Why we switched to Amazon Nova Micro from Google Gemini Flash Lite

October 12, 2025 · 4 min read

Builders of the messaging-first family assistant

We ran lab tests to see how the most affordable models from Google, Amazon, and OpenAI hold up for TextConcierge’s SMS workflows.

Each model answered the same four prompts: a math check, intent classification, calendar JSON extraction, and a concise family update.

Here’s how they stack up when latency, accuracy, and per-run cost all matter.

TL;DR

Top accuracy: Amazon Nova Micro and OpenAI GPT-5 nano each scored 1.00 on all four tasks.
Fastest responses: Amazon Nova Micro finished in 0.37s on average.
Lowest cost: Google Gemini 2.5 Flash Lite averaged about $0.00002 per prompt (≈0.002¢) across the four-prompt batch.
Our top choice: For TextConcierge, accuracy and speed matter most, and Amazon Nova Micro leads on both. That is why we are switching to Nova Micro.

Model	Avg Accuracy	Avg Latency (s)	Avg Cost ($)
Google Gemini 2.5 Flash Lite	0.92	0.50	$0.00002
Amazon Nova Micro	1.00	0.37	$0.00003
OpenAI GPT-5 nano	1.00	6.08	$0.00062
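
The accuracy and latency averages above can be rechecked from the per-scenario tables further down. Here is a minimal sketch; the `RESULTS` structure and `summarize` helper are illustrative names, not part of our benchmark harness. Cost is left out because the per-scenario costs are already rounded to five decimals, so their mean will not always reproduce the summary column.

```python
from statistics import mean

# Per-scenario (score, latency in seconds) pairs copied from the scenario tables,
# in order: math check, intent classification, JSON extraction, SMS summary.
RESULTS = {
    "Google Gemini 2.5 Flash Lite": [(1.00, 0.43), (1.00, 0.42), (1.00, 0.59), (0.67, 0.55)],
    "Amazon Nova Micro": [(1.00, 0.39), (1.00, 0.30), (1.00, 0.38), (1.00, 0.40)],
    "OpenAI GPT-5 nano": [(1.00, 2.39), (1.00, 1.80), (1.00, 14.43), (1.00, 5.71)],
}


def summarize(results):
    """Return {model: (avg_accuracy, avg_latency)}, each rounded to two decimals."""
    summary = {}
    for model, rows in results.items():
        scores = [score for score, _ in rows]
        latencies = [latency for _, latency in rows]
        summary[model] = (round(mean(scores), 2), round(mean(latencies), 2))
    return summary


for model, (accuracy, latency) in summarize(RESULTS).items():
    print(f"{model}: accuracy={accuracy:.2f}, latency={latency:.2f}s")
```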

LLM comparison chart

Benchmark Scenarios

Scenario 1: Quick Math Check

A deterministic multiplication prompt to make sure each model can produce the exact numeric answer without drift.

Question asked: Answer the following strictly as digits with no extra text: 17 * 12 = ?

Expected output: 204

Model	Score	Latency (s)	Cost ($)	Notes
Amazon Nova Micro	1.00	0.39	$0.00001	Observed '204'
Google Gemini 2.5 Flash Lite	1.00	0.43	$0.00001	Observed '204'
OpenAI GPT-5 nano	1.00	2.39	$0.00006	Observed '204'
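
A strict check like this one can be scored with a plain string comparison. The sketch below is an assumption about how such a scorer might look, not the harness we ran; `exact_match_score` is a hypothetical helper that tolerates surrounding whitespace and nothing else.

```python
def exact_match_score(observed: str, expected: str) -> float:
    """Return 1.0 when the reply equals the expected string, else 0.0.

    Leading and trailing whitespace is ignored, since models often
    append a newline; any other extra text counts as a miss.
    """
    return 1.0 if observed.strip() == expected else 0.0


print(exact_match_score("204\n", "204"))
print(exact_match_score("The answer is 204", "204"))
```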

Scenario 2: Intent Classification

Classify whether an inbound SMS should activate calendar handling, returning only calendar or other.

Question asked:

Classify the following SMS as 'calendar' or 'other'. Reply with exactly one word. Message: 'Can you add soccer practice for Ava on Friday at 5pm?'

Expected output:

calendar

Model	Score	Latency (s)	Cost ($)	Notes
Amazon Nova Micro	1.00	0.30	$0.00001	Observed 'calendar'
Google Gemini 2.5 Flash Lite	1.00	0.42	$0.00001	Observed 'calendar'
OpenAI GPT-5 nano	1.00	1.80	$0.00006	Observed 'calendar'
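
In an SMS pipeline, the one-word reply still has to be turned into a routing decision. Below is a minimal sketch of a defensive parser; `parse_intent` and `VALID_INTENTS` are hypothetical names, and defaulting to `other` on unexpected output is a design assumption rather than something we benchmarked.

```python
VALID_INTENTS = {"calendar", "other"}


def parse_intent(reply: str) -> str:
    """Map a model reply to 'calendar' or 'other'.

    Strips whitespace, surrounding quotes, and a trailing period, then
    lowercases. Anything unrecognized falls back to 'other' so that a
    malformed reply never triggers calendar handling by accident.
    """
    label = reply.strip().strip(".'\"").lower()
    return label if label in VALID_INTENTS else "other"


print(parse_intent(" Calendar.\n"))
print(parse_intent("I think this is about scheduling"))
```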

Scenario 3: Calendar JSON Extraction

Extract four structured fields (title, date, time, location) from an informal SMS and return strict JSON.

Question asked:

Today is 2025-10-12. Extract event details from this SMS and respond ONLY with JSON using keys title, date, time, and location. If a field is unknown, set it to null. Message: 'Family dinner next Saturday at 6:30pm at Grandma's house.'

Expected output:

{
  "title": "Family dinner",
  "date": "2025-10-18",
  "time": "18:30",
  "location": "Grandma's house"
}
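
The expected date follows from simple weekday arithmetic: 2025-10-12 is a Sunday, so the next Saturday is six days later. The sketch below shows that calculation with the standard library. It assumes "next Saturday" means the first Saturday strictly after today; some speakers mean the Saturday of the following week, which this helper does not handle.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday() numbers Monday as 0 and Sunday as 6


def next_weekday(today: date, target_weekday: int) -> date:
    """Return the first date strictly after `today` that falls on target_weekday."""
    days_ahead = (target_weekday - today.weekday()) % 7
    if days_ahead == 0:
        # Today is already the target weekday, so jump a full week ahead.
        days_ahead = 7
    return today + timedelta(days=days_ahead)


print(next_weekday(date(2025, 10, 12), SATURDAY).isoformat())
```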

Model	Score	Latency (s)	Cost ($)	Notes
Amazon Nova Micro	1.00	0.38	$0.00005	All fields matched.
Google Gemini 2.5 Flash Lite	1.00	0.59	$0.00005	All fields matched.
OpenAI GPT-5 nano	1.00	14.43	$0.00174	All fields matched.

Google Gemini 2.5 Flash Lite

{ "title": "Family dinner", "date": "2025-10-18", "time": "18:30:00", "location": "Grandma's house" }

Amazon Nova Micro

{ "title": "Family dinner", "date": "2025-10-18", "time": "18:30", "location": "Grandma's house" }

OpenAI GPT-5 nano

{ "title": "Family dinner", "date": "2025-10-18", "time": "6:30pm", "location": "Grandma's house" }
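
The three replies format the time differently (18:30:00, 18:30, and 6:30pm), yet all are scored as matches. One way a scorer could treat them as equivalent is to normalize the time before comparing fields. The sketch below illustrates that idea; it is an assumption about the comparison, and `normalize_time`, `fields_match`, and `TIME_FORMATS` are hypothetical names.

```python
import json
from datetime import datetime

EXPECTED = {
    "title": "Family dinner",
    "date": "2025-10-18",
    "time": "18:30",
    "location": "Grandma's house",
}

# Formats seen in the replies above: 24-hour, 24-hour with seconds, 12-hour with am/pm.
TIME_FORMATS = ("%H:%M", "%H:%M:%S", "%I:%M%p")


def normalize_time(value):
    """Convert a recognized time string to HH:MM; return other values unchanged."""
    if not isinstance(value, str):
        return value
    text = value.strip().replace(" ", "")
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%H:%M")
        except ValueError:
            continue
    return value  # unrecognized formats are left as-is and will fail the comparison


def fields_match(reply_json: str, expected: dict) -> bool:
    """Parse the reply as JSON and compare every expected key after time normalization."""
    observed = json.loads(reply_json)
    observed["time"] = normalize_time(observed.get("time"))
    return all(observed.get(key) == expected[key] for key in expected)


reply = json.dumps({**EXPECTED, "time": "6:30pm"})
print(fields_match(reply, EXPECTED))
```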

Scenario 4: SMS Summary Tone

Compose a short, family-friendly SMS recap that covers three updates in under 240 characters.

Question asked:

Summarize these updates for a family SMS in under 240 characters: Liam has piano recital Saturday 3pm; groceries arriving Friday 10am; date night babysitter booked Saturday 6pm.

Expected output (example):

Liam's piano recital Saturday 3pm; groceries arriving Friday 10am; date night babysitter booked Saturday 6pm.

Model	Score	Latency (s)	Cost ($)	Notes
Amazon Nova Micro	1.00	0.40	$0.00003	Matched: piano recital, groceries, babysitter
Google Gemini 2.5 Flash Lite	0.67	0.55	$0.00003	Matched: piano recital, groceries; Missing: babysitter
OpenAI GPT-5 nano	1.00	5.71	$0.00060	Matched: piano recital, groceries, babysitter

Google Gemini 2.5 Flash Lite

Liam's piano recital is Sat 3pm. Groceries arrive Fri 10am. Date night sitter booked for Sat 6pm.

Amazon Nova Micro

Liam's piano recital Sat 3pm, groceries arrive Fri 10am, babysitter for date night Sat 6pm.

OpenAI GPT-5 nano

Liam's piano recital: Sat 3pm. Groceries arrive Fri 10am. Date-night babysitter booked for Sat 6pm.
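
The Notes column suggests the summaries were scored by keyword coverage. Flash Lite wrote "sitter" rather than "babysitter", which would account for its 0.67. The sketch below reproduces those scores under that assumption; `keyword_coverage` and the keyword list are inferred from the table, not taken from the harness itself.

```python
REQUIRED_KEYWORDS = ("piano recital", "groceries", "babysitter")
MAX_CHARS = 240

REPLIES = {
    "Google Gemini 2.5 Flash Lite": "Liam's piano recital is Sat 3pm. Groceries arrive Fri 10am. Date night sitter booked for Sat 6pm.",
    "Amazon Nova Micro": "Liam's piano recital Sat 3pm, groceries arrive Fri 10am, babysitter for date night Sat 6pm.",
    "OpenAI GPT-5 nano": "Liam's piano recital: Sat 3pm. Groceries arrive Fri 10am. Date-night babysitter booked for Sat 6pm.",
}


def keyword_coverage(summary: str, keywords=REQUIRED_KEYWORDS) -> float:
    """Return the fraction of keywords found in the summary (case-insensitive)."""
    text = summary.lower()
    matched = [keyword for keyword in keywords if keyword in text]
    return round(len(matched) / len(keywords), 2)


for model, reply in REPLIES.items():
    within_limit = len(reply) < MAX_CHARS
    print(f"{model}: score={keyword_coverage(reply):.2f}, under_limit={within_limit}")
```

A literal substring match is strict: a reply that conveys the same fact with a synonym loses credit, which is worth keeping in mind when reading the 0.67.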

Conclusion

For TextConcierge’s SMS agent, accuracy and speed matter most. Across four representative tasks, Amazon Nova Micro delivered the best balance: perfect accuracy, the fastest responses, and still negligible per‑request cost. OpenAI GPT‑5 nano matched accuracy but was far slower and pricier, while Google Gemini 2.5 Flash Lite stayed cheapest but trailed on latency and missed one summary detail.

Based on these results, we’re moving our SMS flows—classification, structured extraction, quick edits, and short summaries—to Nova Micro. We’ll keep Flash Lite as a low‑cost fallback.
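
A primary-plus-fallback setup can be as simple as trying each provider in order. The sketch below is illustrative only: `complete_with_fallback` is a hypothetical helper, and each provider is assumed to be a plain function that takes a prompt and returns text or raises on failure. It does not use any real vendor SDK.

```python
from typing import Callable, Sequence


def complete_with_fallback(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Call each provider in order and return the first successful reply.

    Raises RuntimeError if every provider fails, chaining the last error seen.
    """
    last_error = None
    for call_model in providers:
        try:
            return call_model(prompt)
        except Exception as error:  # broad on purpose: any failure moves to the next provider
            last_error = error
    raise RuntimeError("all providers failed") from last_error


def flaky_primary(prompt: str) -> str:
    # Stand-in for a primary model call that times out.
    raise TimeoutError("primary model timed out")


def cheap_fallback(prompt: str) -> str:
    # Stand-in for a fallback model call.
    return f"fallback reply to: {prompt}"


print(complete_with_fallback("Add soccer practice Friday 5pm", [flaky_primary, cheap_fallback]))
```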

Want to see the impact in your family thread? Get Started and we’ll set you up with faster replies and simpler shared calendar management in minutes.