Can Qwen3.6-35B-A3B on an RTX 3060 Replace Google Vision for Receipt-to-JSON Extraction?

Jun 26, 2026

I built a LINE app once.

I liked the idea of sending commands to a messaging app instead of creating a web or mobile frontend.

The app accepted a few commands that triggered scripts on my server. If I sent it a photo of a receipt, it sent the image to Google Vision, extracted the data, parsed it into JSON, and stored it in SQLite.

It worked.

Then the prepaid credit I had loaded for Google Vision ran out.

Why not use my homelab instead of paying Google?

LINE receipt bot flow

I tried OCR and some specialized VLs, but nothing really useful.

That changed when I tried a 12GB-target GGUF quant of Qwen3.6-35B-A3B.

Not the full default Qwen3.6-35B-A3B weights.

An experimental quantization targeting a specific VRAM budget: fit the largest practical Qwen3.6-35B-A3B variant into a 12GB GPU.

That is exactly my machine setup.

Setup

Component	Setup
GPU	RTX 3060 12GB
RAM	64GB
Runtime	llama.cpp
Model file	Qwen3.6-35B-A3B-12Gb-2.6763bpw.gguf
Vision projector	mmproj-BF16.gguf
Context	8192
Thinking	disabled
Input	Paperless-ngx receipt image

Did it work?

Receipt picture taken from my Redmi Note 14 Pro 5G and uploaded using Paperless-ngx

Maruetsu receipt — all header fields extracted correctly

Field	Expected	Qwen output	Result
Store	maruetsu 目黒店	maruetsu 目黒店	✅
Date	2026-06-12	2026-06-12	✅
Subtotal	¥335	¥335	✅
Tax	¥26	¥26	✅
Total	¥361	¥361	✅
Items	4	4	✅

Yes, it worked.

For the fields I care about most in personal bookkeeping - date, store, subtotal, tax, and total - the results were good enough to make the project useful again.

In my own small set of about 30 receipts from different stores, those fields were consistently correct.

Item Breakdown

I also wanted item breakdowns, because my database has a receipts table and a related receipt_items table.

That is where things got more subtle.

Japanese receipts represent quantity in a few different ways, and that can create schema ambiguity.

My Basket receipt — quantity field ambiguity

Field	Expected	Qwen output	Result
Store	まいばすけっと西麻布店	まいばすけっと西麻布店	✅
Date	2026-06-21	2026-06-21	✅
Subtotal	¥896	¥896	✅
Tax	¥71	¥71	✅
Total	¥967	¥967	✅
Items	4	4	✅
Quantity case	`2個X 単279`	`package_quantity: 2`	⚠️

For this line:

オイコスPROリッチバ 558※ (2個X 単279)

Qwen extracted:

{
	"quantity": 1,
  "package_quantity": 2,
  "unit_price": 279,
  "total_price": 558
}

It captured the important quantity signal, but the field mapping still needed schema clarification.

How fast was it?

Metric	Result
GPU	RTX 3060 12GB
Average latency	31.75s/document
Peak VRAM	11.06 GiB

Maybe not fast enough for a chatbot.

But for a receipt process that runs overnight, 30 seconds per receipt is fine.

Future improvement: A receipt DB

Every time I process a receipt, I want to keep:

receipt image
model output JSON
final corrected JSON, if I had to fix it

That way, if I change the prompt, try a different model, update llama.cpp, or switch quantization, I can rerun the same receipts and see whether the pipeline actually got better.

Once I collect enough examples from the same stores, I could also use store-specific patterns.

For example, if a certain store always prints points, payment method, or item quantities in the same place, the system could learn to extract those fields more reliably.

TL;DR

A 12GB-target GGUF quant of Qwen3.6-35B-A3B was good enough to replace Google Vision in my personal receipt-to-JSON pipeline.
I tested it on around 30 receipts from supermarkets, drug stores, restaurants, and convenience stores in Japan.
The important bookkeeping fields - store, date, subtotal, tax, and total - were consistently correct in my tests.
My benchmark averaged about 31.75 seconds per receipt on an RTX 3060 12GB.
Item-level extraction mostly worked, but quantity fields still need careful schema design.
For my use case, that tradeoff is worth it: I can process receipts locally instead of paying Google every time.