Getting Started¶

Requirements¶

Python 3.10+
Chromium available through Playwright
Network access to:
target sites you want to solve against
your configured OpenAI-compatible model endpoint

Installation¶

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
playwright install --with-deps chromium

Environment variables¶

Variable	Description	Default
`CLIENT_KEY`	Client auth key used as `clientKey`	unset
`CAPTCHA_BASE_URL`	OpenAI-compatible API base URL	`https://your-openai-compatible-endpoint/v1`
`CAPTCHA_API_KEY`	API key for your model provider	unset
`CAPTCHA_MODEL`	Strong text model	`gpt-5.4`
`CAPTCHA_MULTIMODAL_MODEL`	Multimodal model	`qwen3.5-2b`
`CAPTCHA_RETRIES`	Retry count	`3`
`CAPTCHA_TIMEOUT`	Model timeout in seconds	`30`
`BROWSER_HEADLESS`	Run Chromium headless	`true`
`BROWSER_TIMEOUT`	Browser timeout in seconds	`30`
`SERVER_HOST`	Bind host	`0.0.0.0`
`SERVER_PORT`	Bind port	`8000`

Start the service¶

export CLIENT_KEY="your-client-key"
export CAPTCHA_BASE_URL="https://your-openai-compatible-endpoint/v1"
export CAPTCHA_API_KEY="your-api-key"
export CAPTCHA_MODEL="gpt-5.4"
export CAPTCHA_MULTIMODAL_MODEL="qwen3.5-2b"
python main.py

Verify startup¶

Root endpoint¶

curl http://localhost:8000/

Health endpoint¶

curl http://localhost:8000/api/v1/health

The health response should include the registered task types and current runtime model settings.

Local and self-hosted model support¶

The image recognition path is built around OpenAI-compatible APIs. In practice, this means you can point CAPTCHA_BASE_URL at a hosted provider or a self-hosted/local multimodal gateway, as long as it exposes compatible chat-completions semantics and supports image input.

The project intentionally documents this in generic compatibility terms rather than claiming full validation for every provider stack.