Getting Started¶
Requirements¶
- Python 3.10+
- Chromium available through Playwright
- Network access to:
- target sites you want to solve against
- your configured OpenAI-compatible model endpoint
Installation¶
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
playwright install --with-deps chromium
Environment variables¶
| Variable | Description | Default |
|---|---|---|
CLIENT_KEY |
Client auth key used as clientKey |
unset |
CAPTCHA_BASE_URL |
OpenAI-compatible API base URL | https://your-openai-compatible-endpoint/v1 |
CAPTCHA_API_KEY |
API key for your model provider | unset |
CAPTCHA_MODEL |
Strong text model | gpt-5.4 |
CAPTCHA_MULTIMODAL_MODEL |
Multimodal model | qwen3.5-2b |
CAPTCHA_RETRIES |
Retry count | 3 |
CAPTCHA_TIMEOUT |
Model timeout in seconds | 30 |
BROWSER_HEADLESS |
Run Chromium headless | true |
BROWSER_TIMEOUT |
Browser timeout in seconds | 30 |
SERVER_HOST |
Bind host | 0.0.0.0 |
SERVER_PORT |
Bind port | 8000 |
Start the service¶
export CLIENT_KEY="your-client-key"
export CAPTCHA_BASE_URL="https://your-openai-compatible-endpoint/v1"
export CAPTCHA_API_KEY="your-api-key"
export CAPTCHA_MODEL="gpt-5.4"
export CAPTCHA_MULTIMODAL_MODEL="qwen3.5-2b"
python main.py
Verify startup¶
Root endpoint¶
curl http://localhost:8000/
Health endpoint¶
curl http://localhost:8000/api/v1/health
The health response should include the registered task types and current runtime model settings.
Local and self-hosted model support¶
The image recognition path is built around OpenAI-compatible APIs. In practice, this means you can point CAPTCHA_BASE_URL at a hosted provider or a self-hosted/local multimodal gateway, as long as it exposes compatible chat-completions semantics and supports image input.
The project intentionally documents this in generic compatibility terms rather than claiming full validation for every provider stack.