Skip to content

Getting Started

Requirements

  • Python 3.10+
  • Chromium available through Playwright
  • Network access to:
  • target sites you want to solve against
  • your configured OpenAI-compatible model endpoint

Installation

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
playwright install --with-deps chromium

Environment variables

Variable Description Default
CLIENT_KEY Client auth key used as clientKey unset
CAPTCHA_BASE_URL OpenAI-compatible API base URL https://your-openai-compatible-endpoint/v1
CAPTCHA_API_KEY API key for your model provider unset
CAPTCHA_MODEL Strong text model gpt-5.4
CAPTCHA_MULTIMODAL_MODEL Multimodal model qwen3.5-2b
CAPTCHA_RETRIES Retry count 3
CAPTCHA_TIMEOUT Model timeout in seconds 30
BROWSER_HEADLESS Run Chromium headless true
BROWSER_TIMEOUT Browser timeout in seconds 30
SERVER_HOST Bind host 0.0.0.0
SERVER_PORT Bind port 8000

Start the service

export CLIENT_KEY="your-client-key"
export CAPTCHA_BASE_URL="https://your-openai-compatible-endpoint/v1"
export CAPTCHA_API_KEY="your-api-key"
export CAPTCHA_MODEL="gpt-5.4"
export CAPTCHA_MULTIMODAL_MODEL="qwen3.5-2b"
python main.py

Verify startup

Root endpoint

curl http://localhost:8000/

Health endpoint

curl http://localhost:8000/api/v1/health

The health response should include the registered task types and current runtime model settings.

Local and self-hosted model support

The image recognition path is built around OpenAI-compatible APIs. In practice, this means you can point CAPTCHA_BASE_URL at a hosted provider or a self-hosted/local multimodal gateway, as long as it exposes compatible chat-completions semantics and supports image input.

The project intentionally documents this in generic compatibility terms rather than claiming full validation for every provider stack.