← Back

Can Qwen3.6-35B-A3B on an RTX 3060 Replace Google Vision for Receipt-to-JSON Extraction?

I built a LINE app once.

I liked the idea of sending commands to a messaging app instead of creating a web or mobile frontend.

The app accepted a few commands that triggered scripts on my server. If I sent it a photo of a receipt, it sent the image to Google Vision, extracted the data, parsed it into JSON, and stored it in SQLite.

It worked.

Then the prepaid credit I had loaded for Google Vision ran out.

Why not use my homelab instead of paying Google?

LINE receipt bot flow

I tried OCR and some specialized VLs, but nothing really useful.

That changed when I tried a 12GB-target GGUF quant of Qwen3.6-35B-A3B.

Not the full default Qwen3.6-35B-A3B weights.

An experimental quantization targeting a specific VRAM budget: fit the largest practical Qwen3.6-35B-A3B variant into a 12GB GPU.

That is exactly my machine setup.

Setup

Component Setup
GPU RTX 3060 12GB
RAM 64GB
Runtime llama.cpp
Model file Qwen3.6-35B-A3B-12Gb-2.6763bpw.gguf
Vision projector mmproj-BF16.gguf
Context 8192
Thinking disabled
Input Paperless-ngx receipt image

Did it work?

Receipt picture taken from my Redmi Note 14 Pro 5G and uploaded using Paperless-ngx

Maruetsu receipt — all header fields extracted correctly

Field Expected Qwen output Result
Store maruetsu 目黒店 maruetsu 目黒店
Date 2026-06-12 2026-06-12
Subtotal ¥335 ¥335
Tax ¥26 ¥26
Total ¥361 ¥361
Items 4 4

Yes, it worked.

For the fields I care about most in personal bookkeeping - date, store, subtotal, tax, and total - the results were good enough to make the project useful again.

In my own small set of about 30 receipts from different stores, those fields were consistently correct.

Item Breakdown

I also wanted item breakdowns, because my database has a receipts table and a related receipt_items table.

That is where things got more subtle.

Japanese receipts represent quantity in a few different ways, and that can create schema ambiguity.

My Basket receipt — quantity field ambiguity

Field Expected Qwen output Result
Store まいばすけっと 西麻布店 まいばすけっと 西麻布店
Date 2026-06-21 2026-06-21
Subtotal ¥896 ¥896
Tax ¥71 ¥71
Total ¥967 ¥967
Items 4 4
Quantity case 2個X 単279 package_quantity: 2 ⚠️

For this line:

オイコスPROリッチバ 558※ (2X279)

Qwen extracted:

{
	"quantity": 1,
  "package_quantity": 2,
  "unit_price": 279,
  "total_price": 558
}

It captured the important quantity signal, but the field mapping still needed schema clarification.

How fast was it?

Metric Result
GPU RTX 3060 12GB
Average latency 31.75s/document
Peak VRAM 11.06 GiB

Maybe not fast enough for a chatbot.

But for a receipt process that runs overnight, 30 seconds per receipt is fine.

Future improvement: A receipt DB

Every time I process a receipt, I want to keep:

receipt image
model output JSON
final corrected JSON, if I had to fix it

That way, if I change the prompt, try a different model, update llama.cpp, or switch quantization, I can rerun the same receipts and see whether the pipeline actually got better.

Once I collect enough examples from the same stores, I could also use store-specific patterns.

For example, if a certain store always prints points, payment method, or item quantities in the same place, the system could learn to extract those fields more reliably.

TL;DR

  • A 12GB-target GGUF quant of Qwen3.6-35B-A3B was good enough to replace Google Vision in my personal receipt-to-JSON pipeline.
  • I tested it on around 30 receipts from supermarkets, drug stores, restaurants, and convenience stores in Japan.
  • The important bookkeeping fields - store, date, subtotal, tax, and total - were consistently correct in my tests.
  • My benchmark averaged about 31.75 seconds per receipt on an RTX 3060 12GB.
  • Item-level extraction mostly worked, but quantity fields still need careful schema design.
  • For my use case, that tradeoff is worth it: I can process receipts locally instead of paying Google every time.