Can Qwen3.6-35B-A3B on an RTX 3060 Replace Google Vision for Receipt-to-JSON Extraction?
I built a LINE app once.
I liked the idea of sending commands to a messaging app instead of creating a web or mobile frontend.
The app accepted a few commands that triggered scripts on my server. If I sent it a photo of a receipt, it sent the image to Google Vision, extracted the data, parsed it into JSON, and stored it in SQLite.
It worked.
Then the prepaid credit I had loaded for Google Vision ran out.
Why not use my homelab instead of paying Google?

I tried OCR and some specialized VLs, but nothing really useful.
That changed when I tried a 12GB-target GGUF quant of Qwen3.6-35B-A3B.
Not the full default Qwen3.6-35B-A3B weights.
An experimental quantization targeting a specific VRAM budget: fit the largest practical Qwen3.6-35B-A3B variant into a 12GB GPU.
That is exactly my machine setup.
Setup
| Component | Setup |
|---|---|
| GPU | RTX 3060 12GB |
| RAM | 64GB |
| Runtime | llama.cpp |
| Model file | Qwen3.6-35B-A3B-12Gb-2.6763bpw.gguf |
| Vision projector | mmproj-BF16.gguf |
| Context | 8192 |
| Thinking | disabled |
| Input | Paperless-ngx receipt image |
Did it work?
Receipt picture taken from my Redmi Note 14 Pro 5G and uploaded using Paperless-ngx

| Field | Expected | Qwen output | Result |
|---|---|---|---|
| Store | maruetsu 目黒店 | maruetsu 目黒店 | ✅ |
| Date | 2026-06-12 | 2026-06-12 | ✅ |
| Subtotal | ¥335 | ¥335 | ✅ |
| Tax | ¥26 | ¥26 | ✅ |
| Total | ¥361 | ¥361 | ✅ |
| Items | 4 | 4 | ✅ |
Yes, it worked.
For the fields I care about most in personal bookkeeping - date, store, subtotal, tax, and total - the results were good enough to make the project useful again.
In my own small set of about 30 receipts from different stores, those fields were consistently correct.
Item Breakdown
I also wanted item breakdowns, because my database has a receipts table and a related receipt_items table.
That is where things got more subtle.
Japanese receipts represent quantity in a few different ways, and that can create schema ambiguity.

| Field | Expected | Qwen output | Result |
|---|---|---|---|
| Store | まいばすけっと 西麻布店 | まいばすけっと 西麻布店 | ✅ |
| Date | 2026-06-21 | 2026-06-21 | ✅ |
| Subtotal | ¥896 | ¥896 | ✅ |
| Tax | ¥71 | ¥71 | ✅ |
| Total | ¥967 | ¥967 | ✅ |
| Items | 4 | 4 | ✅ |
| Quantity case | 2個X 単279 |
package_quantity: 2 |
⚠️ |
For this line:
オイコスPROリッチバ 558※ (2個X 単279)
Qwen extracted:
{
"quantity": 1,
"package_quantity": 2,
"unit_price": 279,
"total_price": 558
}
It captured the important quantity signal, but the field mapping still needed schema clarification.
How fast was it?
| Metric | Result |
|---|---|
| GPU | RTX 3060 12GB |
| Average latency | 31.75s/document |
| Peak VRAM | 11.06 GiB |
Maybe not fast enough for a chatbot.
But for a receipt process that runs overnight, 30 seconds per receipt is fine.
Future improvement: A receipt DB
Every time I process a receipt, I want to keep:
receipt image
model output JSON
final corrected JSON, if I had to fix it
That way, if I change the prompt, try a different model, update llama.cpp, or switch quantization, I can rerun the same receipts and see whether the pipeline actually got better.
Once I collect enough examples from the same stores, I could also use store-specific patterns.
For example, if a certain store always prints points, payment method, or item quantities in the same place, the system could learn to extract those fields more reliably.
TL;DR
- A 12GB-target GGUF quant of Qwen3.6-35B-A3B was good enough to replace Google Vision in my personal receipt-to-JSON pipeline.
- I tested it on around 30 receipts from supermarkets, drug stores, restaurants, and convenience stores in Japan.
- The important bookkeeping fields - store, date, subtotal, tax, and total - were consistently correct in my tests.
- My benchmark averaged about 31.75 seconds per receipt on an RTX 3060 12GB.
- Item-level extraction mostly worked, but quantity fields still need careful schema design.
- For my use case, that tradeoff is worth it: I can process receipts locally instead of paying Google every time.