Image Classification Usage¶
Image classification tasks send one or more captcha images to an OpenAI-compatible vision model and return the indices of matching cells or a boolean answer. No browser automation is involved — these are pure vision model API calls.
Supported task types¶
| Task type | Description |
|---|---|
HCaptchaClassification |
hCaptcha 3x3 grid — returns matching cell indices |
ReCaptchaV2Classification |
reCAPTCHA v2 3x3 / 4x4 grid — returns matching cell indices |
FunCaptchaClassification |
FunCaptcha 2x3 grid — returns the correct cell index |
AwsClassification |
AWS CAPTCHA image selection |
Solution fields¶
| Task type | Solution field | Example |
|---|---|---|
HCaptchaClassification |
objects or answer |
[0, 2, 5] or true |
ReCaptchaV2Classification |
objects |
[0, 3, 6] |
FunCaptchaClassification |
objects |
[4] |
AwsClassification |
objects |
[1] |
HCaptchaClassification¶
Request shape¶
{
"clientKey": "your-client-key",
"task": {
"type": "HCaptchaClassification",
"queries": ["<base64-image-1>", "<base64-image-2>", "<base64-image-3>"],
"question": "Please click each image containing a bicycle"
}
}
The queries field accepts a list of base64-encoded images (one per grid cell). The question field is the challenge prompt displayed to the user.
Response¶
{
"errorId": 0,
"status": "ready",
"solution": {
"objects": [1, 4]
}
}
ReCaptchaV2Classification¶
Request shape¶
{
"clientKey": "your-client-key",
"task": {
"type": "ReCaptchaV2Classification",
"image": "<base64-encoded-grid-image>",
"question": "Select all images with traffic lights"
}
}
The image field is a single base64-encoded image of the full reCAPTCHA grid (3×3 = 9 cells or 4×4 = 16 cells). Cells are numbered 0–8 (or 0–15), left-to-right, top-to-bottom.
Response¶
{
"errorId": 0,
"status": "ready",
"solution": {
"objects": [0, 3, 6]
}
}
FunCaptchaClassification¶
Request shape¶
{
"clientKey": "your-client-key",
"task": {
"type": "FunCaptchaClassification",
"image": "<base64-encoded-grid-image>",
"question": "Pick the image that shows a boat facing left"
}
}
The grid is typically 2×3 (6 cells). Usually one answer is expected.
Response¶
{
"errorId": 0,
"status": "ready",
"solution": {
"objects": [3]
}
}
AwsClassification¶
Request shape¶
{
"clientKey": "your-client-key",
"task": {
"type": "AwsClassification",
"image": "<base64-encoded-image>",
"question": "Select the image that matches"
}
}
Response¶
{
"errorId": 0,
"status": "ready",
"solution": {
"objects": [1]
}
}
Create and poll (generic example)¶
# Step 1: create task
TASK_ID=$(curl -s -X POST http://localhost:8000/createTask \
-H "Content-Type: application/json" \
-d '{
"clientKey": "your-client-key",
"task": {
"type": "ReCaptchaV2Classification",
"image": "'$(base64 -w0 captcha.png)'",
"question": "Select all images with traffic lights"
}
}' | python -c "import sys,json; print(json.load(sys.stdin)['taskId'])")
# Step 2: poll result
curl -s -X POST http://localhost:8000/getTaskResult \
-H "Content-Type: application/json" \
-d "{\"clientKey\":\"your-client-key\",\"taskId\":\"$TASK_ID\"}"
Operational notes¶
- All classification tasks are synchronous from the model's perspective — the
asyncio.create_taskwrapper means the HTTP response is immediate, but the actual model call happens in the background. - Model accuracy depends entirely on the vision model configured via
CAPTCHA_MULTIMODAL_MODEL(default:qwen3.5-2b). - For best results with classification, the
CAPTCHA_MODEL(gpt-5.4) can be substituted by settingCAPTCHA_MULTIMODAL_MODEL=gpt-5.4. - Images should not be pre-resized — the solver handles normalization internally.