Why we switched to Amazon Nova Micro from Google Gemini Flash Lite

TextConcierge Team
Builders of the messaging-first family assistant

We ran lab tests to see how the most affordable models from Google, Amazon, and OpenAI hold up for TextConcierge’s SMS workflows.

Each model answered the same four prompts: a math check, intent classification, calendar JSON extraction, and a concise family update.

Here’s how they stack up when latency, accuracy, and per-run cost all matter.

TL;DR

  • Top accuracy: Amazon Nova Micro and OpenAI GPT-5 nano each scored 1.00 on all four tasks.
  • Fastest responses: Amazon Nova Micro averaged 0.37s per prompt.
  • Lowest cost: Google Gemini 2.5 Flash Lite averaged about $0.00002 per prompt (≈0.002¢).
  • Our top choice: For TextConcierge, accuracy and speed matter most. Amazon Nova Micro ties for the best accuracy and is the fastest of the three, which is why we are switching to Nova Micro.

| Model | Avg Accuracy | Avg Latency (s) | Avg Cost ($) |
| --- | --- | --- | --- |
| Google Gemini 2.5 Flash Lite | 0.92 | 0.50 | $0.00002 |
| Amazon Nova Micro | 1.00 | 0.37 | $0.00003 |
| OpenAI GPT-5 nano | 1.00 | 6.08 | $0.00062 |
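The accuracy and latency averages in this table can be reproduced from the per-scenario numbers reported in the scenario sections. The sketch below is only an illustration of that arithmetic: the `RESULTS` structure and the `summarize` helper are our own names, not part of the benchmark harness, and cost is left out because the per-scenario costs shown in the post are rounded.

```python
from statistics import mean

# Per-scenario (score, latency in seconds) pairs copied from the four scenario tables.
RESULTS = {
    "Amazon Nova Micro": [(1.00, 0.39), (1.00, 0.30), (1.00, 0.38), (1.00, 0.40)],
    "Google Gemini 2.5 Flash Lite": [(1.00, 0.43), (1.00, 0.42), (1.00, 0.59), (0.67, 0.55)],
    "OpenAI GPT-5 nano": [(1.00, 2.39), (1.00, 1.80), (1.00, 14.43), (1.00, 5.71)],
}


def summarize(rows):
    """Return (average score, average latency), each rounded to two decimals."""
    scores, latencies = zip(*rows)
    return round(mean(scores), 2), round(mean(latencies), 2)


for model, rows in RESULTS.items():
    avg_score, avg_latency = summarize(rows)
    print(f"{model}: accuracy={avg_score:.2f}, latency={avg_latency:.2f}s")
```

Each average is a plain mean over the four prompts, so one slow scenario (such as GPT-5 nano's 14.43s JSON extraction) pulls the latency figure up noticeably.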

[Figure: LLM comparison chart]

Benchmark Scenarios

Scenario 1: Quick Math Check

A deterministic multiplication prompt to make sure each model can produce the exact numeric answer without drift.

Question asked: Answer the following strictly as digits with no extra text: 17 * 12 = ?

Expected output: 204

| Model | Score | Latency (s) | Cost ($) | Notes |
| --- | --- | --- | --- | --- |
| Amazon Nova Micro | 1.00 | 0.39 | $0.00001 | Observed '204' |
| Google Gemini 2.5 Flash Lite | 1.00 | 0.43 | $0.00001 | Observed '204' |
| OpenAI GPT-5 nano | 1.00 | 2.39 | $0.00006 | Observed '204' |
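A prompt like this lends itself to exact-match scoring. The post does not show the scorer, so the following is a minimal sketch under that assumption; `score_exact` is a hypothetical helper name.

```python
def score_exact(observed: str, expected: str) -> float:
    """Return 1.0 when the reply equals the expected text after trimming whitespace, else 0.0."""
    return 1.0 if observed.strip() == expected.strip() else 0.0


# 17 * 12 = 17 * 10 + 17 * 2 = 170 + 34 = 204
expected = str(17 * 12)

print(score_exact("204", expected))            # a bare-digits reply
print(score_exact("17 * 12 = 204", expected))  # extra text around the answer
```

Trimming whitespace but nothing else keeps the check strict: a reply that wraps the number in a sentence fails, which is the behavior the "strictly as digits" instruction asks for.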

Scenario 2: Intent Classification

Classify whether an inbound SMS should activate calendar handling, returning only calendar or other.

Question asked:

Classify the following SMS as 'calendar' or 'other'. Reply with exactly one word. Message: 'Can you add soccer practice for Ava on Friday at 5pm?'

Expected output:

calendar

| Model | Score | Latency (s) | Cost ($) | Notes |
| --- | --- | --- | --- | --- |
| Amazon Nova Micro | 1.00 | 0.30 | $0.00001 | Observed 'calendar' |
| Google Gemini 2.5 Flash Lite | 1.00 | 0.42 | $0.00001 | Observed 'calendar' |
| OpenAI GPT-5 nano | 1.00 | 1.80 | $0.00006 | Observed 'calendar' |
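Downstream code still has to turn the one-word reply into a routing decision. Here is a small sketch of how that parsing might look; `parse_intent` and the "fall back to other" policy are assumptions for illustration, not something the benchmark specifies.

```python
import string

ALLOWED_LABELS = {"calendar", "other"}


def parse_intent(reply: str) -> str:
    """Normalize a one-word model reply to 'calendar' or 'other'.

    Unexpected replies fall back to 'other' so that a malformed answer never
    triggers calendar handling (an assumed policy, not one stated in the post).
    """
    label = reply.strip(string.punctuation + string.whitespace).lower()
    return label if label in ALLOWED_LABELS else "other"


print(parse_intent("calendar"))
print(parse_intent(" Calendar.\n"))
print(parse_intent("This looks like a calendar request"))
```

Stripping surrounding punctuation and case tolerates harmless formatting differences, while anything longer than a single allowed word is treated as a non-answer.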

Scenario 3: Calendar JSON Extraction

Extract four structured fields (title, date, time, location) from an informal SMS and return strict JSON.

Question asked:

Today is 2025-10-12. Extract event details from this SMS and respond ONLY with JSON using keys title, date, time, and location. If a field is unknown, set it to null. Message: 'Family dinner next Saturday at 6:30pm at Grandma's house.'

Expected output:

{
"title": "Family dinner",
"date": "2025-10-18",
"time": "18:30",
"location": "Grandma's house"
}

| Model | Score | Latency (s) | Cost ($) | Notes |
| --- | --- | --- | --- | --- |
| Amazon Nova Micro | 1.00 | 0.38 | $0.00005 | All fields matched. |
| Google Gemini 2.5 Flash Lite | 1.00 | 0.59 | $0.00005 | All fields matched. |
| OpenAI GPT-5 nano | 1.00 | 14.43 | $0.00174 | All fields matched. |
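The expected date follows from the anchor in the prompt: 2025-10-12 is a Sunday, so the first Saturday after it is 2025-10-18. The sketch below shows that calculation; `upcoming_weekday` is a hypothetical helper, and it assumes "next Saturday" means the first Saturday strictly after today, which is the reading the expected output uses.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday() numbers Monday as 0 and Sunday as 6


def upcoming_weekday(today: date, target_weekday: int) -> date:
    """Return the first date strictly after `today` that falls on `target_weekday`."""
    days_ahead = (target_weekday - today.weekday()) % 7
    if days_ahead == 0:
        days_ahead = 7  # today is already that weekday, so take next week's occurrence
    return today + timedelta(days=days_ahead)


today = date(2025, 10, 12)
print(today.weekday())                                # weekday index of the anchor date
print(upcoming_weekday(today, SATURDAY).isoformat())  # resolved event date
```

A deterministic check like this is useful as a reference value when scoring the `date` field, since relative dates are a common source of off-by-a-week errors.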

Google Gemini 2.5 Flash Lite

{ "title": "Family dinner", "date": "2025-10-18", "time": "18:30:00", "location": "Grandma's house" }

Amazon Nova Micro

{ "title": "Family dinner", "date": "2025-10-18", "time": "18:30", "location": "Grandma's house" }

OpenAI GPT-5 nano

{ "title": "Family dinner", "date": "2025-10-18", "time": "6:30pm", "location": "Grandma's house" }
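The three replies format the time differently ("18:30:00", "18:30", and "6:30pm"), yet all are reported as matching. The post does not describe the scorer, so the sketch below shows one way that could work: normalize the time field before comparing. `normalize_time`, `score_event`, and the list of accepted formats are assumptions made for this example.

```python
import json
from datetime import datetime

# Time layouts seen in the replies above; an assumed list, extend as needed.
TIME_FORMATS = ("%H:%M:%S", "%H:%M", "%I:%M%p")


def normalize_time(value):
    """Convert a supported time string to 24-hour HH:MM; return anything else unchanged."""
    if not isinstance(value, str):
        return value
    text = value.strip().replace(" ", "")
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%H:%M")
        except ValueError:
            continue
    return value


def score_event(reply: str, expected: dict) -> float:
    """Return the fraction of expected fields matched by a JSON reply (0.0 if unparseable)."""
    try:
        observed = json.loads(reply)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(observed, dict):
        return 0.0
    hits = 0
    for key, want in expected.items():
        got = observed.get(key)
        if key == "time":
            got, want = normalize_time(got), normalize_time(want)
        if got == want:
            hits += 1
    return hits / len(expected)


EXPECTED = {"title": "Family dinner", "date": "2025-10-18",
            "time": "18:30", "location": "Grandma's house"}

for time_text in ("18:30:00", "18:30", "6:30pm"):
    reply = json.dumps({**EXPECTED, "time": time_text})
    print(time_text, score_event(reply, EXPECTED))
```

If the consumer of this JSON expects a strict `HH:MM` string, the same normalization step could be applied before the event is written to the calendar rather than only at scoring time.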

Scenario 4: SMS Summary Tone

Compose a short, family-friendly SMS recap that covers three updates in under 240 characters.

Question asked:

Summarize these updates for a family SMS in under 240 characters: Liam has piano recital Saturday 3pm; groceries arriving Friday 10am; date night babysitter booked Saturday 6pm.

Expected output (example):

Liam's piano recital Saturday 3pm; groceries arriving Friday 10am; date night babysitter booked Saturday 6pm.

| Model | Score | Latency (s) | Cost ($) | Notes |
| --- | --- | --- | --- | --- |
| Amazon Nova Micro | 1.00 | 0.40 | $0.00003 | Matched: piano recital, groceries, babysitter |
| Google Gemini 2.5 Flash Lite | 0.67 | 0.55 | $0.00003 | Matched: piano recital, groceries; Missing: babysitter |
| OpenAI GPT-5 nano | 1.00 | 5.71 | $0.00060 | Matched: piano recital, groceries, babysitter |

Google Gemini 2.5 Flash Lite

Liam's piano recital is Sat 3pm. Groceries arrive Fri 10am. Date night sitter booked for Sat 6pm.

Amazon Nova Micro

Liam's piano recital Sat 3pm, groceries arrive Fri 10am, babysitter for date night Sat 6pm.

OpenAI GPT-5 nano

Liam's piano recital: Sat 3pm. Groceries arrive Fri 10am. Date-night babysitter booked for Sat 6pm.
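The Notes column suggests the summaries were scored by keyword coverage. Assuming a simple case-insensitive substring check (our guess, not a documented detail of the harness), the sketch below reproduces the reported scores and shows why "sitter" does not count as "babysitter". `score_summary` and its constants are hypothetical names.

```python
KEYWORDS = ("piano recital", "groceries", "babysitter")
MAX_CHARS = 240


def score_summary(reply: str, keywords=KEYWORDS, max_chars=MAX_CHARS) -> float:
    """Return the share of keywords found in the reply, or 0.0 if it is not under max_chars."""
    if len(reply) >= max_chars:
        return 0.0
    text = reply.lower()
    matched = sum(1 for keyword in keywords if keyword in text)
    return round(matched / len(keywords), 2)


gemini = ("Liam's piano recital is Sat 3pm. Groceries arrive Fri 10am. "
          "Date night sitter booked for Sat 6pm.")
nova = ("Liam's piano recital Sat 3pm, groceries arrive Fri 10am, "
        "babysitter for date night Sat 6pm.")

print(score_summary(gemini))  # two of three keywords present
print(score_summary(nova))    # all three keywords present
```

A literal keyword match is easy to audit but penalizes reasonable paraphrases, so a score below 1.00 here does not necessarily mean the information was dropped from the message.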

Conclusion

For TextConcierge’s SMS agent, accuracy and speed matter most. Across four representative tasks, Amazon Nova Micro delivered the best balance: perfect accuracy, the fastest responses (0.37s on average), and a negligible per‑request cost. OpenAI GPT‑5 nano matched its accuracy but was far slower (6.08s on average) and pricier. Google Gemini 2.5 Flash Lite stayed cheapest but was slightly slower than Nova Micro (0.50s on average) and lost points on the summary task, where its reply said "sitter" and the babysitter item was scored as missing.

Based on these results, we’re moving our SMS flows (classification, structured extraction, quick edits, and short summaries) to Nova Micro. We’ll keep Flash Lite as a low‑cost fallback.
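A primary-plus-fallback setup can be as simple as trying each model in order. The sketch below illustrates that pattern with stub functions standing in for the real model clients; `complete_with_fallback`, `primary_stub`, and `fallback_stub` are hypothetical names, and a production version would catch narrower error types and add timeouts.

```python
from typing import Callable, Sequence


def complete_with_fallback(prompt: str, models: Sequence[Callable[[str], str]]) -> str:
    """Try each model callable in order and return the first non-empty reply."""
    last_error = None
    for call_model in models:
        try:
            reply = call_model(prompt)
        except Exception as error:  # illustrative only; real clients raise specific errors
            last_error = error
            continue
        if reply and reply.strip():
            return reply
    raise RuntimeError("all models failed or returned empty replies") from last_error


def primary_stub(prompt: str) -> str:
    """Stand-in for the primary model client; simulates an outage."""
    raise TimeoutError("simulated outage")


def fallback_stub(prompt: str) -> str:
    """Stand-in for the fallback model client."""
    return "calendar"


print(complete_with_fallback("Classify this SMS as 'calendar' or 'other'.",
                             [primary_stub, fallback_stub]))
```

Passing the models as an ordered list keeps the routing policy in one place, so changing the primary or adding a second fallback is a one-line change.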

Want to see the impact in your family thread? Get Started and we’ll set you up with faster replies and simpler shared calendar management in minutes.