# CCLI SongSelect Partner API — Doc Pointer

## Where to get the docs

**Postman documentation** (only public source, no PDF/OpenAPI mirror):
https://documenter.getpostman.com/view/604633/TzseGkmA

The page is JS-rendered. Two ways to read it:
1. Open in a browser (Chrome/Firefox), wait for the Postman documenter to render.
2. Click the "Run in Postman" button top-right to import the full collection + environment into a Postman workspace, then inspect every endpoint, params, headers, sample requests/responses.

The collection name is **"SongSelect Partner API"** under owner id `604633`.

## Status (read first!)

> **NOTICE: CCLI has retired the SongSelect API Partner Program and is no longer accepting new API partners.**

Existing partners keep working. New access requires contacting CCLI directly (`partners@ccli.com` / regional CCLI office) to request reinstatement or special arrangement.

## Key facts (from the docs)

- **Auth**: OpenID Connect / OAuth 2.0, **Authorization Code with PKCE**, refresh tokens supported
  - Authorize: `https://identityservices.ccli.com/connect/authorize`
  - Token: `https://identityservices.ccli.com/connect/token`
  - Scope: `openid cclipartnerapi.read offline_access`
- **Subscription Key**: every request needs header `Ocp-Apim-Subscription-Key: <key>` (dev key for testing, prod key for live)
- **Tokens**: access token 1h, refresh token 60-day sliding (one-time use, new refresh returned on each refresh)
- **Rate limits**: 100 calls / 10s short term, 300 calls / 5min long term. `429` returns JSON `{statusCode, message}`.
- **Dev restrictions**: dev client only sees content for users linked to the "SongSelect API <country> Partners" test organization.
- Endpoint reference (search, song detail, lyrics, chord chart, etc.) lives inside the Postman collection; load it to see exact paths/params, which are not summarized in the public preview.
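
The two rate limits above apply at the same time, so a client has to satisfy both windows before each call. A minimal sketch of a sliding-window limiter that tracks several windows at once; the class and function names are hypothetical, only the 100 / 10 s and 300 / 5 min figures come from the docs:

```javascript
// Sliding-window limiter that enforces several windows at once.
// Names are illustrative; only the limits passed in partnerLimiter()
// come from the Partner API notes above.
class MultiWindowLimiter {
  constructor(windows, now = () => Date.now()) {
    this.windows = windows; // [{ limit, ms }]
    this.now = now;         // injectable clock, handy for tests
    this.calls = [];        // timestamps of accepted calls, oldest first
  }

  // Milliseconds to wait before the next call is allowed (0 = go ahead).
  waitMs() {
    const t = this.now();
    const longest = Math.max(...this.windows.map(w => w.ms));
    this.calls = this.calls.filter(ts => t - ts < longest);
    let wait = 0;
    for (const { limit, ms } of this.windows) {
      const inWindow = this.calls.filter(ts => t - ts < ms);
      if (inWindow.length >= limit) {
        // This call must age out before the window has room again.
        const blocker = inWindow[inWindow.length - limit];
        wait = Math.max(wait, blocker + ms - t);
      }
    }
    return wait;
  }

  // Records a call if every window has room; returns whether it was accepted.
  tryAcquire() {
    if (this.waitMs() > 0) return false;
    this.calls.push(this.now());
    return true;
  }
}

const partnerLimiter = () => new MultiWindowLimiter([
  { limit: 100, ms: 10_000 },  // short term: 100 calls / 10 s
  { limit: 300, ms: 300_000 }, // long term: 300 calls / 5 min
]);
```

A caller would check `waitMs()` before each request and sleep for that long when it is non-zero. A `429` from the server should still be handled, since the server's accounting may differ from the local one.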

## Credentials needed before coding

1. CCLI Partner ClientId + ClientSecret
2. Development Subscription Key (Ocp-Apim-Subscription-Key)
3. Production Subscription Key (later)
4. A CCLI user account linked to the Partner test organization (for dev refresh-token bootstrap)

Store in `.env`:
```
CCLI_PARTNER_CLIENT_ID=
CCLI_PARTNER_CLIENT_SECRET=
CCLI_PARTNER_SUBSCRIPTION_KEY_DEV=
CCLI_PARTNER_SUBSCRIPTION_KEY_PROD=
CCLI_PARTNER_REDIRECT_URI=https://pp-planer.ddev.site/oauth/ccli/callback
```

## Bootstrap flow for a new agent

1. Load Postman collection from URL above → list every endpoint with its path, params, sample response.
2. Mirror existing `ChurchToolsService` pattern (`app/Services/ChurchToolsService.php`): closure-injectable fetcher, `logApiCall`, `classifyError`, German error messages, `ApiRequestLog` row per call.
3. Implement OAuth2 PKCE handshake → persist refresh token (encrypted) in a `ccli_tokens` table. Auto-refresh on 401.
4. Always send `Ocp-Apim-Subscription-Key` header alongside `Authorization: Bearer <access_token>`.
5. Respect rate limits (Laravel `RateLimiter::for('ccli', ...)` with 100/10s + 300/5min buckets).
6. Map result to existing schema: `Song.ccli_id`, arrangements + global `Label`s (Strophe 1 / Refrain / Bridge), `SongSlide.text_content`. See `ProImportService::upsertSong` for the upsert template.
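
Steps 3 and 4 can be sketched as two small helpers: one builds the authorize redirect for the Authorization Code + PKCE flow, the other builds the headers every API call needs. The function names are hypothetical; the query parameter names are the standard OAuth 2.0 / PKCE ones, on the assumption that CCLI's identity server follows them, and the code challenge is assumed to be computed elsewhere:

```javascript
const AUTHORIZE_URL = 'https://identityservices.ccli.com/connect/authorize';

// Builds the browser redirect that starts the Authorization Code + PKCE flow.
// `codeChallenge` is the base64url-encoded SHA-256 of the code verifier,
// computed by the caller; `state` is the caller's anti-CSRF value.
function buildAuthorizeUrl({ clientId, redirectUri, state, codeChallenge }) {
  const url = new URL(AUTHORIZE_URL);
  url.search = new URLSearchParams({
    response_type: 'code',
    client_id: clientId,
    redirect_uri: redirectUri,
    scope: 'openid cclipartnerapi.read offline_access',
    state,
    code_challenge: codeChallenge,
    code_challenge_method: 'S256',
  }).toString();
  return url.toString();
}

// Headers sent on every Partner API request (step 4 above).
function partnerHeaders(accessToken, subscriptionKey) {
  return {
    Authorization: `Bearer ${accessToken}`,
    'Ocp-Apim-Subscription-Key': subscriptionKey,
  };
}
```

The exact parameters CCLI expects should be confirmed against the Postman collection before relying on this.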

## Fallback if API access denied

- Manual paste flow → parser splits on `Verse N`, `Chorus`, `Bridge`, `Pre-Chorus`, `Tag`, `Ending` headings.
- `.pro` import already implemented (`POST /api/songs/import-pro`).

---

# Alternative: Headless-browser scraping (NO official API)

Use this when the Partner API is not available (current default for new projects). It drives `songselect.ccli.com` with a real browser session using a normal CCLI SongSelect subscription. Same data the user would download manually, just automated.

## ToS / legal note

CCLI's SongSelect ToS forbids "automated retrieval" without partner agreement. A church-internal tool that only acts on behalf of an authenticated subscriber and respects rate limits is a gray area many open-source projects (OpenLP, FreeShow community fork, `gwonamfromkoradai/SongSelectSave`) operate in. Document the risk in `README` and let the church decide.

## Required credentials

```
CCLI_SONGSELECT_USER=          # CCLI account email
CCLI_SONGSELECT_PASSWORD=      # CCLI account password
CCLI_SONGSELECT_BASE_URL=https://songselect.ccli.com
```

Single shared app account (chosen). Encrypt the password at rest (`Crypt::encryptString`) and never log it.

## Tech stack pick

Three viable headless-browser options for Laravel:

| Tool | Pros | Cons |
|---|---|---|
| **`spatie/browsershot`** (Puppeteer + Chromium via Node) | Already in Laravel ecosystem; simple PHP API; supports cookies, headers, screenshots | Heavyweight; needs Node + Chromium in container |
| **`laravel/dusk`** (ChromeDriver) | Pure Laravel; auth helpers; assertion DSL | Built for testing, awkward for prod scraping |
| **Playwright via Node side-script** (`tests/e2e` already uses it) | Best automation API; persistent storage state; identical to existing E2E setup | Crosses PHP↔Node boundary (CLI exec or queue worker) |

**Recommendation: Playwright.** It is already a dev dep, and `tests/e2e/auth.setup.ts` proves the pattern. Run it as a queue job that shells out to a Node script and returns JSON.

DDEV needs Chromium installed. Add the following to `.ddev/web-build/Dockerfile` (the `Dockerfile.example` next to it is only a template and is not built):
```dockerfile
RUN apt-get update && apt-get install -y chromium fonts-liberation
RUN npx --yes playwright install --with-deps chromium
```

## Endpoints / DOM contract (observed)

These are not an "API": they are URL + selector contracts that can change. Re-verify quarterly.

### 1. Login
- URL: `https://profile.ccli.com/account/signin?appContext=SongSelect`
- Form fields: `input[name="EmailAddress"]`, `input[name="Password"]`, `button[type="submit"]`
- Success: redirect to `https://songselect.ccli.com/`
- Persist cookies (`profile.ccli.com`, `songselect.ccli.com`) in `storage/app/ccli/state.json` (Playwright `storageState`). Re-login when cookies expire.

### 2. Search by keyword
- URL: `https://songselect.ccli.com/search/results?Keyword={url-encoded-query}`
- Result rows: `.song-result` (or current class; verify with DevTools)
- Fields per row: `.song-title a` (link + title), `.song-authors` (authors), `.song-ccli-number` or attribute `data-id` (CCLI #)
- Pagination: `?Keyword=...&CurrentPage=2`
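
The search URL and its pagination parameter can be assembled with a small helper. A sketch, assuming the `Keyword` and `CurrentPage` parameters behave as observed above; the function name is hypothetical:

```javascript
const BASE_URL = 'https://songselect.ccli.com';

// Builds the keyword search URL; page 1 omits CurrentPage, as in the base form.
function buildSearchUrl(query, page = 1) {
  const params = new URLSearchParams({ Keyword: query });
  if (page > 1) params.set('CurrentPage', String(page));
  return `${BASE_URL}/search/results?${params}`;
}
```

`URLSearchParams` encodes spaces as `+` rather than `%20`. Both are valid in a query string, but it is worth checking that SongSelect treats them the same.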

### 3. Search by CCLI number
- URL: `https://songselect.ccli.com/Songs/{ccliId}` → redirects to canonical song page

### 4. Song detail
- URL: `https://songselect.ccli.com/Songs/{ccliId}/{slug}`
- Metadata in `<dl>` or schema.org JSON-LD `<script type="application/ld+json">` (preferred, stable):
  - `name` → title
  - `author[].name` → authors
  - `copyrightYear`, `copyrightHolder`
- Themes / publishers in side panel.
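
Once the JSON-LD script has been parsed, the fields listed above can be flattened into the record the importer needs. A sketch with a hypothetical function name; it tolerates `author` being a single object or an array, and `copyrightHolder` being a string or an object, because schema.org allows both (which shape SongSelect emits is not confirmed here):

```javascript
// Flattens a parsed JSON-LD object into the fields used by the importer.
function mapJsonLd(ld) {
  const toArray = v => (v == null ? [] : Array.isArray(v) ? v : [v]);
  // schema.org values may be plain strings or objects with a `name`.
  const nameOf = v => (typeof v === 'string' ? v : v?.name ?? null);
  return {
    title: ld.name ?? null,
    authors: toArray(ld.author).map(nameOf).filter(Boolean),
    copyrightYear: ld.copyrightYear != null ? Number(ld.copyrightYear) : null,
    copyrightHolder: nameOf(ld.copyrightHolder),
  };
}
```

Missing fields come back as `null` (or an empty author list), so the caller can decide whether to fall back to the `<dl>` markup.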

### 5. Lyrics download (the "parts" the user wants)
- URL: `https://songselect.ccli.com/Songs/{ccliId}/{slug}/viewlyrics`
- Trigger: click `#lyricsDownloadButton` (gives `.txt`) OR fetch hidden link `a[data-download-format="txt"]`
- The `.txt` payload is **structured by part**, e.g.:
  ```
  Verse 1
  Amazing grace, how sweet the sound
  ...

  Chorus
  My chains are gone...

  Verse 2
  ...

  Bridge
  ...

  CCLI Song # 22025
  © Public Domain
  CCLI License # 12345
  ```
- Headers to detect (regex): `^(Verse \d+|Chorus( \d+)?|Pre-Chorus|Bridge( \d+)?|Tag|Ending|Intro|Interlude|Refrain|Coda)\s*$`

### 6. ChordPro download (optional, if account has chord access)
- URL: `https://songselect.ccli.com/Songs/{ccliId}/{slug}/chordpro` → click `.chordpro-download`
- Format is industry-standard ChordPro, which is easier to parse than HTML.

## Mapping to existing schema

```
SongSelect part header      → global Label name
─────────────────────────────────────────────
Verse N                     → Strophe N
Chorus / Refrain            → Refrain
Pre-Chorus                  → Pre-Refrain
Bridge                      → Bridge
Tag / Ending / Coda         → Outro
Intro / Interlude           → Intro / Zwischenspiel
```

Lookup labels case-insensitive (`SongService::createDefaultGroups` already does `LOWER(name)`); create new global label if no match.

Persistence template (mirror `ProImportService::upsertSong`):
1. `Song::withTrashed()->firstOrNew(['ccli_id' => $ccliId])`, then `restore()` if the row was soft-deleted (a plain `firstOrNew` does not see soft-deleted rows)
2. Update title / author / copyright_text / copyright_year / publisher
3. Wipe existing arrangements for clean re-import (or skip if user opted "merge")
4. Create one `SongArrangement(name='Normal', is_default=true)`
5. For each parsed part → find/create `Label`, create `SongSlide(label_id, order, text_content)`, attach via `SongArrangementLabel(order)`

## Service skeleton

```php
// app/Services/SongSelectScraperService.php
final class SongSelectScraperService
{
    public function __construct(
        private readonly SongImportService $importer,
    ) {}

    public function search(string $query): Collection { /* runs node script: search */ }

    public function fetchByCcliId(int $ccliId): array { /* runs node script: detail+lyrics */ }

    public function importToDb(int $ccliId): Song
    {
        $payload = $this->fetchByCcliId($ccliId);
        return $this->importer->upsertFromSongSelect($payload); // mirrors ProImportService
    }
}
```

Run scraper inside a queue job (`ScrapeSongSelectJob`); never block an HTTP request. Frontend polls or uses Inertia partial reload.

## Node side-script (Playwright)

`scripts/songselect-fetch.mjs`:
```js
import { chromium } from 'playwright';
import fs from 'node:fs';

const [, , action, arg] = process.argv; // e.g. 'search' 'amazing grace' OR 'detail' 22025
const STATE = 'storage/app/ccli/state.json';

const browser = await chromium.launch({ headless: true });
try {
  // Reuse the saved session if one exists.
  const ctx = fs.existsSync(STATE)
    ? await browser.newContext({ storageState: STATE })
    : await browser.newContext();
  const page = await ctx.newPage();

  // Auto-login if cookies are missing or expired.
  await page.goto('https://songselect.ccli.com/');
  // .first() avoids a strict-mode error when several "Sign In" links match.
  if (await page.locator('text=Sign In').first().isVisible().catch(() => false)) {
    await page.goto('https://profile.ccli.com/account/signin?appContext=SongSelect');
    await page.fill('input[name="EmailAddress"]', process.env.CCLI_SONGSELECT_USER);
    await page.fill('input[name="Password"]', process.env.CCLI_SONGSELECT_PASSWORD);
    await page.click('button[type="submit"]');
    await page.waitForURL('**/songselect.ccli.com/**');
    await ctx.storageState({ path: STATE });
  }

  let result;
  if (action === 'search') {
    await page.goto(`https://songselect.ccli.com/search/results?Keyword=${encodeURIComponent(arg)}`);
    result = await page.$$eval('.song-result', rows => rows.map(r => ({
      ccli_id: r.dataset.id ?? r.querySelector('.song-ccli-number')?.textContent?.trim(),
      title: r.querySelector('.song-title')?.textContent?.trim(),
      authors: r.querySelector('.song-authors')?.textContent?.trim(),
      url: r.querySelector('a')?.href,
    })));
  } else if (action === 'detail') {
    // /Songs/{id} redirects to the canonical /Songs/{id}/{slug} page.
    await page.goto(`https://songselect.ccli.com/Songs/${arg}`);
    const url = page.url();
    const meta = await page.$eval('script[type="application/ld+json"]', s => JSON.parse(s.textContent));
    await page.goto(url.replace(/\/?$/, '/viewlyrics'));
    const lyrics = await page.locator('pre, .lyrics-content').first().innerText();
    result = { ccli_id: arg, ...meta, lyrics };
  } else {
    throw new Error(`Unknown action: ${action}`);
  }

  console.log(JSON.stringify(result));
} finally {
  // Always release the browser, even when a step above throws.
  await browser.close();
}
```

PHP side calls via `Symfony\Component\Process\Process` and decodes JSON.

## Lyrics → parts parser (PHP)

```php
final class SongSelectLyricsParser
{
    private const HEADER = '/^(Verse \d+|Chorus(?: \d+)?|Pre-Chorus|Bridge(?: \d+)?|Tag|Ending|Intro|Interlude|Refrain|Coda)\s*$/i';
    private const LABEL_MAP = [
        'verse'      => 'Strophe',       // suffix the number
        'chorus'     => 'Refrain',
        'refrain'    => 'Refrain',
        'pre-chorus' => 'Pre-Refrain',
        'bridge'     => 'Bridge',
        'tag'        => 'Outro',
        'ending'     => 'Outro',
        'coda'       => 'Outro',
        'intro'      => 'Intro',
        'interlude'  => 'Zwischenspiel',
    ];

    /** @return array<int, array{label: string, text: string}> */
    public function parse(string $raw): array { /* split on HEADER, map via LABEL_MAP */ }
}
```
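
The stubbed `parse()` above could work along the following lines. This is a sketch in JavaScript (so it can run next to the Node side-script) rather than the PHP implementation. It makes two assumptions beyond the stub: parsing stops at the `CCLI Song #` footer line seen in the sample payload, and a trailing number is carried over for any numbered header (`Chorus 2` becomes `Refrain 2`), not only for verses.

```javascript
const HEADER = /^(Verse \d+|Chorus(?: \d+)?|Pre-Chorus|Bridge(?: \d+)?|Tag|Ending|Intro|Interlude|Refrain|Coda)\s*$/i;
const FOOTER = /^CCLI Song #/i; // assumed footer marker, taken from the sample payload
const LABEL_MAP = {
  verse: 'Strophe', chorus: 'Refrain', refrain: 'Refrain',
  'pre-chorus': 'Pre-Refrain', bridge: 'Bridge',
  tag: 'Outro', ending: 'Outro', coda: 'Outro',
  intro: 'Intro', interlude: 'Zwischenspiel',
};

// Splits the raw .txt payload into [{ label, text }] in document order.
function parseLyrics(raw) {
  const parts = [];
  let current = null;
  for (const line of raw.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (FOOTER.test(trimmed)) break; // copyright footer: lyrics are over
    const m = trimmed.match(HEADER);
    if (m) {
      // Separate "Verse 1" into name "Verse" and number "1".
      const [, name, num] = m[1].match(/^(.*?)(?: (\d+))?$/);
      const label = LABEL_MAP[name.toLowerCase()] + (num ? ` ${num}` : '');
      current = { label, lines: [] };
      parts.push(current);
    } else if (current) {
      current.lines.push(trimmed); // text before the first header is ignored
    }
  }
  return parts.map(p => ({ label: p.label, text: p.lines.join('\n').trim() }));
}
```

Blank lines inside a part are kept, so a later step can still split a long verse into several slides.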

## Rate limiting & politeness

- Cap to **30 requests/minute** per app instance (`RateLimiter::for('ccli-scrape', fn () => Limit::perMinute(30))`).
- One concurrent scrape job (`ScrapeSongSelectJob` with `WithoutOverlapping` middleware).
- Cache result for 30 days (`songs.ccli_id` already keyed). User can force-refresh via "Re-import" button.
- Random jitter 500-1500ms between page loads.
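
The jitter rule can live in the Node side-script. A minimal sketch, with hypothetical helper names; the random source is injectable so the delay can be tested deterministically:

```javascript
// Random delay in [minMs, maxMs]; `rng` defaults to Math.random.
function jitterMs(minMs = 500, maxMs = 1500, rng = Math.random) {
  return Math.round(minMs + rng() * (maxMs - minMs));
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Visits URLs one at a time, pausing a jittered interval between loads.
// `visit` is a hypothetical async callback, e.g. url => page.goto(url).
async function visitPolitely(urls, visit, delay = () => sleep(jitterMs())) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await delay(); // no pause before the first load
    results.push(await visit(urls[i]));
  }
  return results;
}
```

In the `detail` action this would wrap the song page and the `/viewlyrics` page, so the two loads are not fired back to back.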

## UI integration

1. **`Songs/Index.vue`**: top-bar search input "CCLI Lookup" → `POST /api/ccli/search { q }` → modal with results → "Import" button per row.
2. **`SongAgendaItem.vue`** (unmatched row): new button "SongSelect suchen" next to existing Request/Assign → opens same modal pre-filled with CTS song name.
3. **Preview modal before save**: show parsed parts grouped by detected Label, allow drag-reassign / rename, then confirm import.
4. All German text, Du-form: "Suche bei CCLI…", "Importieren", "Als Strophe 1 zuweisen", etc.

## Failure modes & detection

| Symptom | Cause | Action |
|---|---|---|
| Redirect to `/account/signin` mid-session | Cookie expired | Re-run login flow, retry once |
| Empty `.song-result` list | DOM changed OR query 0 hits | Save HTML snapshot to `storage/logs/ccli/` for inspection |
| HTTP 429 / "Too many requests" page | Rate limit hit | Back off 5min, alert admin |
| Captcha (`recaptcha` iframe) | CCLI flagged automation | Stop, surface admin notice, fall back to manual paste |
| Login fails | Wrong creds OR account suspended | German error to admin |

Log every scrape into `api_request_logs` (existing table) with `service='songselect'` so the existing log UI shows them alongside CTS calls.
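
The symptom table can be turned into a small classifier that the side-script runs after each page load, so the PHP side receives a machine-readable failure code to log. A sketch only: the function name, the shape of the `obs` object and the returned codes are all hypothetical.

```javascript
// Maps an observed page state to one of the failure modes in the table.
// obs: { url, status, html, action, rowCount } (hypothetical shape).
function classifyFailure(obs) {
  const html = obs.html ?? '';
  if (/recaptcha/i.test(html)) return 'captcha';
  if (obs.status === 429 || /too many requests/i.test(html)) return 'rate_limited';
  if (obs.url.includes('/account/signin')) return 'session_expired';
  if (obs.action === 'search' && obs.rowCount === 0) return 'empty_results';
  return null; // nothing recognisable went wrong
}
```

The "Login fails" row is not covered here, because it is only detectable inside the login step itself, for example when the sign-in URL is still showing after the form has been submitted.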

## Testing

- Unit-test the parser with fixtures in `tests/Fixtures/songselect/*.txt`.
- Mock the Playwright invocation in service tests via constructor closure (mirror `ChurchToolsService` pattern).
- E2E test against a sandbox public-domain song (e.g. CCLI #22025 "Amazing Grace"), gated by `CCLI_SONGSELECT_USER` env, skip if missing.

## Bootstrap checklist for a new agent

1. Confirm CCLI subscription credentials are in `.env`.
2. Add Chromium to DDEV web container.
3. Create `scripts/songselect-fetch.mjs`.
4. Create `app/Services/SongSelectScraperService.php` + `SongSelectLyricsParser.php` + `SongImportService::upsertFromSongSelect()` (refactor common parts out of `ProImportService`).
5. Create `ScrapeSongSelectJob` (queued, `WithoutOverlapping`).
6. Add routes `POST /api/ccli/search`, `POST /api/ccli/import/{ccliId}`.
7. Add Vue search modal + integrate into `Songs/Index.vue` + `SongAgendaItem.vue`.
8. Write parser unit tests + service feature test (mock Process).
9. Document the ToS gray area in README.

---

# Reference: How OpenLP imports from CCLI

Source: `openlp/plugins/songs/lib/songselect.py` on https://gitlab.com/openlp/openlp (GPL).

**Approach: embedded Qt WebEngine (= real Chromium) + JS injection**

OpenLP does NOT do headless HTTP scraping. It opens a `QWebEngineView` (PySide6 Qt Chromium) inside the desktop app on `https://profile.ccli.com/account/signin?appContext=SongSelect&returnUrl=https%3a%2f%2fsongselect.ccli.com%2f`. The user signs in **manually** in that embedded browser (so they solve any captcha themselves). After login the same webview holds the authenticated cookies.

OpenLP then drives the page via `webview.page().runJavaScript(...)` to:

1. Detect current page by URL (`Login` / `Home` / `Search` / `Song` / `Other`).
2. Navigate by setting `document.location = "<url>"`.
3. Pre-fill login fields:
   ```js
   document.getElementById("EmailAddress").value = "<email>";
   document.getElementById("Password").value = "<password>";
   ```
   (User still clicks Sign-In manually so Turnstile sees a real interaction.)
4. **Fetch any URL with the page's session cookies** by injecting:
   ```js
   var openlp_page_data = null;
   fetch("<url>")
     .then(r => r.text())
     .then(t => { openlp_page_data = t; });
   ```
   then polls `openlp_page_data != null` and reads the result back into Python. This is the clever bit: they bypass cookie-export entirely, using the already-authenticated browser context as the HTTP client.
5. Parse HTML → song dict → write into the OpenLP DB via SQLAlchemy (`Song`, `Author`, `Topic`, `SongXML` verses with `VerseType.tags`).

URL constants in OpenLP:
```python
BASE_URL = 'https://songselect.ccli.com'
LOGIN_PAGE = 'https://profile.ccli.com/account/signin?appContext=SongSelect&returnUrl=https%3a%2f%2fsongselect.ccli.com%2f'
LOGIN_URL = 'https://profile.ccli.com'
LOGOUT_URL = BASE_URL + '/account/logout'
SEARCH_URL = BASE_URL + '/search/results'
SONG_PAGE = BASE_URL + '/Songs/'
CCLI_NUMBER_REGEX = r'.*?Songs\/([0-9]+).*'
```
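
The same `CCLI_NUMBER_REGEX` is useful in the Node side-script, for example to read the CCLI number back from the canonical URL after the `/Songs/{ccliId}` redirect. A sketch of a JavaScript port; the function name is hypothetical:

```javascript
// JavaScript port of OpenLP's CCLI_NUMBER_REGEX.
const CCLI_NUMBER_REGEX = /.*?Songs\/([0-9]+).*/;

// Returns the CCLI number embedded in a song URL, or null if there is none.
function extractCcliNumber(url) {
  const m = CCLI_NUMBER_REGEX.exec(url);
  return m ? Number(m[1]) : null;
}
```

Comparing the extracted number with the requested one is a cheap sanity check that a redirect landed on the right song.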

**Lesson for a Laravel server-side port**: OpenLP succeeds because it ships a full GUI Chromium and pushes the captcha problem onto the user. A server-side scraper has to solve the same captcha non-interactively (see next section).

# Cloudflare Turnstile on CCLI login (verified 2026-05)

Confirmed by fetching `https://profile.ccli.com/account/signin?appContext=SongSelect`:

```html
<script src="https://challenges.cloudflare.com/turnstile/v0/api.js"></script>
<div class="cf-turnstile sr-only"
     data-sitekey="0x4AAAAAAA1USwfe0YamenZA"
     data-appearance="interaction-only"
     data-callback="enableSubmit" inert></div>
```

- **Mode**: `interaction-only` (Managed/Invisible: silent unless trust score drops, then escalates to checkbox click)
- **Sitekey**: `0x4AAAAAAA1USwfe0YamenZA`
- **Submit button is disabled until Turnstile callback fires**, then a hidden `cf-turnstile-response` input is added to the POST body
- Form also includes ASP.NET `__RequestVerificationToken` (CSRF); it must be scraped from the GET response and sent back
- CCLI also injects **Cloudflare Bot Management JSD** (`/cdn-cgi/challenge-platform/scripts/jsd/main.js`), which adds passive fingerprinting on every page
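
For monitoring, it helps to detect these markers in the sign-in HTML so the scraper notices when CCLI changes the form. A sketch with a hypothetical function name; it uses regular expressions for brevity, which is an illustration-only shortcut (a real HTML parser would be more robust), and it assumes attributes are double-quoted as in the snippet above:

```javascript
// Extracts the CSRF token and Turnstile sitekey from the sign-in page HTML.
function inspectLoginHtml(html) {
  // Reads a double-quoted attribute from a single tag string.
  const attr = (tag, name) => {
    const m = tag && tag.match(new RegExp(`${name}="([^"]*)"`));
    return m ? m[1] : null;
  };
  const csrfTag = (html.match(/<input[^>]*name="__RequestVerificationToken"[^>]*>/i) || [])[0];
  const turnstileTag = (html.match(/<div[^>]*class="[^"]*\bcf-turnstile\b[^"]*"[^>]*>/i) || [])[0];
  return {
    csrfToken: attr(csrfTag, 'value'),
    turnstileSitekey: attr(turnstileTag, 'data-sitekey'),
    hasTurnstile: Boolean(turnstileTag),
  };
}
```

Having the CSRF token and sitekey is not enough to log in: the `cf-turnstile-response` value still has to be minted by the Turnstile script running in a browser.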

## Can Turnstile be bypassed WITHOUT a real Chrome?

**Short answer: No.** Turnstile requires a JavaScript runtime + canvas + WebGL + AudioContext + matching TLS/JA3 fingerprint to mint a valid token. A real browser engine must run somewhere: locally, in a queue worker, or in the cloud.

The realistic option matrix:

| Approach | "Real Chrome" needed? | Cost | Reliability for CCLI | Notes |
|---|---|---|---|---|
| **Pure HTTP** (Guzzle / curl / requests) | none | free | **Will not work** | Cannot execute the Turnstile JS that mints the token. Hard wall. |
| **`curl-impersonate` / `curl_cffi`** (TLS-fingerprint spoofing) | none | free | **Will not work alone** | Solves JA3 fingerprint but still no JS engine for the Turnstile widget. Useful only AFTER a session cookie exists. |
| **Patched headless Chromium** (Playwright + `playwright-stealth`, `puppeteer-extra-plugin-stealth`, `nodriver`, `patchright`) | yes (local) | free | **Medium** for `interaction-only` mode | Stealth plugins hide `navigator.webdriver`, fix canvas/WebGL leaks. Often passes Turnstile silently. Breaks under residential-IP requirement or escalation to interactive. |
| **`undetected-chromedriver` + SeleniumBase UC Mode** | yes (local) | free | **Medium-High** | Has built-in `uc_gui_click_captcha()` that uses pyautogui to click the checkbox if Turnstile escalates. Python-only. |
| **Camoufox** (patched Firefox, fingerprint injection at C++ level) | yes (local) | free | **Medium-High** | Different signature from Chromium-based detection profiles; useful when stealth-Chromium gets flagged. |
| **CAPTCHA-solving service** (2Captcha, CapSolver, NextCaptcha, Anti-Captcha) | none locally; service runs browsers | ≈$1.45/1k tokens | **Low for CCLI specifically** | They return a Turnstile token bound to the sitekey + your IP. CCLI also fingerprints the browser env + JSD beacon, so token alone often fails to authenticate. Token TTL ≈ 5min, single-use. |
| **Cloud browser API** (Scrapfly ASP, Browserless, Bright Data Scraping Browser, Scrapeless, ZenRows, Oxylabs Web Unblocker) | yes (remote) | ≈$5-50/1k pages | **High** | Real Chromium + residential proxy + automatic challenge solving in one call. The only "no local Chrome" option that actually works at scale. |
| **Manual one-time login + persisted cookies** (OpenLP model) | yes (one-time, in user's own browser) | free | **High** | User logs in once via popup/embedded view, app stores `.AspNet.ApplicationCookie` + Cloudflare `cf_clearance` cookies, reuses them for HTTP scraping until they expire (typically 30 days; `cf_clearance` is shorter ≈ 1 hour but auto-refreshes if you keep the same browser fingerprint via `curl-impersonate`). |

**`cf_clearance` cookie pitfall**: even with a valid `.AspNet.ApplicationCookie`, Cloudflare checks `cf_clearance` on every request and ties it to the originating browser's TLS+UA fingerprint. Reusing the cookie from raw `curl` will give `403 / cf_chl_*` because the JA3 fingerprint won't match. Use `curl-impersonate-chrome` or `curl_cffi` (`curl_cffi.requests` with `impersonate="chrome120"`) so the TLS handshake matches the browser that minted the cookie.

## Recommended architecture for pp-planer

Hybrid that mirrors OpenLP's user-driven login but scrapes server-side:

1. **Admin panel "CCLI Session" page** (does not work as described, kept for the record)
   - "Sign in to CCLI" button opens a popup window pointed at `https://profile.ccli.com/account/signin?appContext=SongSelect&returnUrl=https://pp-planer.ddev.site/api/ccli/oauth-callback`.
   - User logs in normally. Their own browser handles Turnstile (usually silent for residential IPs).
   - On the redirect back to our callback, JS can only read `document.cookie` for cookies on **our** domain. The CCLI session cookies belong to `ccli.com` and are not readable from our origin, so the session cannot be captured this way. Use option 2 instead.

2. **Better: bundled headless browser inside a queue worker**
   - Use Playwright (already a dev dep) + `playwright-extra` + `puppeteer-extra-plugin-stealth` (the stealth plugin that `playwright-extra` loads) in headed mode for first login, headless for re-use.
   - Persist `storageState` to `storage/app/ccli/state.json` (encrypted at rest).
   - First-time setup: admin runs `php artisan ccli:login` → opens a non-headless Playwright browser on the server's display (or via VNC/X11 forwarding in DDEV) → admin types credentials and solves any escalated Turnstile checkbox.
   - All subsequent fetches use saved cookies in headless mode. Re-prompt admin when cookies expire.

3. **For ongoing fetches**: once authenticated, requests can drop down to plain HTTP with a Chrome TLS (JA3) fingerprint, `curl_cffi`-style. Symfony HttpClient cannot spoof the TLS fingerprint on its own, so shell out to a `curl-impersonate` binary from PHP (or call it from a small Node script). This is much faster than re-launching a browser per request.

4. **Fallback if Turnstile escalates beyond stealth limits**: route through a cloud browser (Scrapfly ASP `asp=true` flag handles it). Make it pluggable behind `SongSelectClient` interface.
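
The pluggable fallback in step 4 amounts to trying a list of clients in order. A sketch of that idea on the Node side; `fetchWithFallback`, the `name` property and the `fetchSong` method are hypothetical stand-ins for whatever the `SongSelectClient` interface ends up exposing:

```javascript
// Tries each client in order and returns the first successful payload.
// A client is any object with a `name` and an async `fetchSong(ccliId)`.
async function fetchWithFallback(clients, ccliId) {
  const errors = [];
  for (const client of clients) {
    try {
      return { via: client.name, payload: await client.fetchSong(ccliId) };
    } catch (err) {
      // Remember why this client failed, then move on to the next one.
      errors.push(`${client.name}: ${err.message}`);
    }
  }
  throw new Error(`All SongSelect clients failed (${errors.join('; ')})`);
}
```

The caller would pass the local stealth-Playwright client first and the cloud-browser client second, and log the `via` field so paid fallback usage stays visible.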

## Honest recommendation

For a church-internal tool used by a handful of staff, scraping at all is overkill. Realistic ranking:

1. **Manual paste flow** + lyric parser → 2 days of work, zero external deps, zero ToS risk.
2. **`.pro` import** (already done): staff can download `.pro` files from SongSelect manually and drop them in the existing upload area.
3. **OpenLP-style embedded webview**: only works for desktop; doesn't fit a Laravel web app.
4. **Server-side stealth Playwright + persisted cookies**: works, but ~1-2 weeks of fragile glue code, breaks every CCLI redesign or Cloudflare ruleset bump.
5. **Cloud browser API (Scrapfly etc.)**: most reliable, costs €€, still ToS-gray.

If automation is mandatory: option 4 with option 5 as fallback when the local browser fails.