Think mode & the lineup: which Grok for which job.
xAI ships models faster than anyone explains them, so most users either always use the default (leaving quality on the table) or always max out (paying in time and limits). The fix is the same one we teach on every track: a simple match between task type and model — plus knowing when Think mode is worth its wait.
01 The lineup, decoded
| Model | Personality | Reach for it when |
|---|---|---|
| Grok 4 | The flagship daily driver | Default for everything: drafting, X workflows, general questions, Imagine prompting. |
| Grok 4.20 Reasoning | The careful one — thinks first, lowest hallucination rate, strongest tool calling | Analysis, multi-step problems, anything where being wrong costs more than being slow. |
| Grok 4.3 | The newest and sharpest; consumer access gated to Heavy ($300/month), cheaper via API | If you're on Heavy or building on the API. Most users: don't chase it. |
| Free-tier model | An older, limited workhorse | Trying Grok out. Fine for the Lesson 1 exercises, not for real work. |
02 The sorting rule that survives every release
"If this answer is subtly wrong, what does it cost me?" Low cost (a draft you'll edit, a brainstorm, a meme) → fast default model. High cost (numbers you'll act on, an analysis you'll forward, a decision) → reasoning model, Think mode on. That's the whole framework — the same one from our which-Claude lesson, because it's true for every AI family.
03 Think mode: when the wait pays
What it actually does
Think mode makes Grok work through a chain of reasoning before answering — visibly slower, measurably more reliable on hard problems. It shines on: math and logic, multi-constraint planning ("schedule these 6 things with these 4 rules"), debugging an argument or a spreadsheet formula, and reading dense material for what it implies rather than says.
Where it's wasted
Simple lookups, casual drafting, X pulse checks, image prompts. Reasoning overhead on easy questions doesn't add accuracy — it adds wait. (Sound familiar? Same lesson as "don't use a Mythos-class model to summarize an email.")
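The two lists above amount to a lookup by task type. Here is a minimal Python sketch of that lookup. The task labels are hypothetical, invented to mirror the lists, and are not names that Grok or xAI use.

```python
# Hypothetical task labels mirroring the "shines on" and "wasted" lists above.
THINK_PAYS = {
    "math",
    "logic",
    "multi_constraint_planning",
    "debugging",
    "dense_reading",
}
THINK_WASTED = {
    "simple_lookup",
    "casual_drafting",
    "x_pulse_check",
    "image_prompt",
}


def use_think_mode(task: str) -> bool:
    """Return True when the task is one where reasoning pays for its wait."""
    if task in THINK_PAYS:
        return True
    if task in THINK_WASTED:
        return False
    # The lists do not cover this task, so refuse to guess.
    raise ValueError(f"unclassified task: {task!r}")
```

For anything the lists don't cover, the sketch raises instead of guessing. That is the point where the wrong-answer-cost question takes over.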
The test that proves it to you
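Pick a question with several competing constraints and ask it twice: once on the default model, once with Think mode on. The default model gives you a fluent answer. Think mode usually gives you the one that noticed the conflicting constraint. That difference is what you're paying the wait for.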
Big Brain, on SuperGrok and up, is the heavier-compute variant of the same idea — more thinking budget for genuinely hard problems. Use it like Think mode's overdrive: rarely, deliberately, on the questions where being right matters most.
04 Daily-driver doctrine
- Default to Grok 4 for the flow of the day.
- Escalate to Reasoning/Think when the wrong-answer cost is real — and say "think step by step through the constraints" to point the effort.
- Don't buy Heavy for 4.3 unless your work measurably hits the ceiling of the standard models. (It almost certainly doesn't yet — and the API is the cheaper test if you're curious.)
- Re-run important answers once on the reasoning model as a check. Two agreeing models ≠ proof, but disagreement is the cheapest red flag in AI.
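The doctrine above can be sketched as a small routine: use the default for low-cost prompts, escalate and cross-check when the cost is real. This is an illustration under stated assumptions, not a real client. `ask_default` and `ask_reasoning` are hypothetical callables standing in for however you reach each model, and the exact-match comparison only makes sense for short answers such as a number or a yes/no.

```python
from typing import Callable

# Hypothetical stand-in for "send this prompt to a model and return its text".
Ask = Callable[[str], str]


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(answer.lower().split())


def answer_with_check(
    prompt: str,
    high_cost: bool,
    ask_default: Ask,
    ask_reasoning: Ask,
) -> dict:
    """Route by wrong-answer cost; cross-check only when the cost is real."""
    if not high_cost:
        # Low cost: the fast default is enough, no second run.
        return {"answer": ask_default(prompt), "checked": False, "disagree": False}

    quick = ask_default(prompt)
    # Point the reasoning effort, as the doctrine suggests.
    hinted = prompt + "\n\nThink step by step through the constraints."
    careful = ask_reasoning(hinted)

    # Agreement is not proof, but disagreement is a cheap red flag.
    disagree = normalize(quick) != normalize(careful)
    return {
        "answer": careful,
        "second_opinion": quick,
        "checked": True,
        "disagree": disagree,
    }
```

When `disagree` comes back true, treat it as a prompt to look closer, not as a verdict on which model is right.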
Calibrate yourself
Take one real decision on your plate this week and run the two-way test above. Notice not just which answer is better, but how it's better — that's you learning to feel where the reasoning premium pays. From now on, model choice is a reflex, not a guess.
What you can do now
- Decode the lineup: Grok 4 (default), 4.20 Reasoning (careful), 4.3 (Heavy/API), free tier (trial)
- Sort tasks with the one question: what does a subtly wrong answer cost?
- Use Think mode where reasoning pays — and skip it where it's just wait
- Reserve Big Brain for the rare genuinely hard problem
- Use model disagreement as your cheapest error detector