Local Model Deployment¶
OhMyCaptcha supports running image recognition and classification tasks on a locally hosted model served via SGLang, vLLM, or any OpenAI-compatible inference server.
This guide covers deploying Qwen3.5-2B locally with SGLang.
Architecture: Local vs Cloud¶
OhMyCaptcha uses two model backends:
| Backend | Role | Env vars | Default |
|---|---|---|---|
| Local model | Image recognition & classification (high-throughput, self-hosted) | LOCAL_BASE_URL, LOCAL_API_KEY, LOCAL_MODEL |
http://localhost:30000/v1, EMPTY, Qwen/Qwen3.5-2B |
| Cloud model | Audio transcription & complex reasoning (powerful remote API) | CLOUD_BASE_URL, CLOUD_API_KEY, CLOUD_MODEL |
External endpoint, your key, gpt-5.4 |
┌────────────────────────────────────────────────────────────┐
│ OhMyCaptcha │
│ │
│ Browser tasks ──► Playwright (reCAPTCHA, Turnstile) │
│ │
│ Image tasks ───► Local Model (SGLang / vLLM) │
│ └─ Qwen3.5-2B on localhost:30000 │
│ │
│ Audio tasks ───► Cloud Model (remote API) │
│ └─ gpt-5.4 via external endpoint │
└────────────────────────────────────────────────────────────┘
Prerequisites¶
- Python 3.10+
- NVIDIA GPU with CUDA support (recommended: 8GB+ VRAM for Qwen3.5-2B)
pippackage manager
Step 1: Install SGLang¶
pip install "sglang[all]>=0.4.6.post1"
Step 2: Launch the model server¶
From Hugging Face¶
python -m sglang.launch_server \
--model-path Qwen/Qwen3.5-2B \
--host 0.0.0.0 \
--port 30000
From ModelScope (recommended in China)¶
export SGLANG_USE_MODELSCOPE=true
python -m sglang.launch_server \
--model-path Qwen/Qwen3.5-2B \
--host 0.0.0.0 \
--port 30000
With multiple GPUs¶
python -m sglang.launch_server \
--model-path Qwen/Qwen3.5-2B \
--host 0.0.0.0 \
--port 30000 \
--tensor-parallel-size 2
Once started, the server exposes an OpenAI-compatible API at http://localhost:30000/v1.
Step 3: Verify the model server¶
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3.5-2B",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 32
}'
You should receive a valid JSON response with model output.
Step 4: Configure OhMyCaptcha¶
Set the local model env vars to point at your SGLang server:
# Local model (self-hosted via SGLang)
export LOCAL_BASE_URL="http://localhost:30000/v1"
export LOCAL_API_KEY="EMPTY"
export LOCAL_MODEL="Qwen/Qwen3.5-2B"
# Cloud model (remote API for audio transcription etc.)
export CLOUD_BASE_URL="https://your-api-endpoint/v1"
export CLOUD_API_KEY="sk-your-key"
export CLOUD_MODEL="gpt-5.4"
# Other config
export CLIENT_KEY="your-client-key"
export BROWSER_HEADLESS=true
Step 5: Start OhMyCaptcha¶
python main.py
The health endpoint shows both model backends:
curl http://localhost:8000/api/v1/health
{
"status": "ok",
"supported_task_types": ["RecaptchaV3TaskProxyless", "..."],
"browser_headless": true,
"cloud_model": "gpt-5.4",
"local_model": "Qwen/Qwen3.5-2B"
}
Alternative: vLLM¶
vLLM can serve the same model with an identical API:
pip install vllm
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3.5-2B \
--host 0.0.0.0 \
--port 30000
No changes to the OhMyCaptcha configuration are needed — both SGLang and vLLM expose /v1/chat/completions.
Backward compatibility¶
The legacy environment variables (CAPTCHA_BASE_URL, CAPTCHA_API_KEY, CAPTCHA_MODEL, CAPTCHA_MULTIMODAL_MODEL) are still supported as fallbacks. If you set CAPTCHA_BASE_URL without setting CLOUD_BASE_URL, the old value will be used. The new LOCAL_* and CLOUD_* variables take precedence when set.
Recommended models¶
| Model | Size | Use case | VRAM |
|---|---|---|---|
Qwen/Qwen3.5-2B |
2B | Image recognition & classification | ~5 GB |
Qwen/Qwen3.5-7B |
7B | Higher accuracy classification | ~15 GB |
Qwen/Qwen3.5-2B-FP8 |
2B (quantized) | Lower VRAM requirement | ~3 GB |