A new flagship model lands every six weeks. Your team uses one provider for some jobs and another for others, and the choice is folklore. Your vendor’s account manager tells you to use “the smartest model”, which maximises their bill, not your output. There is no matrix on your wall telling you which model fits which marketing job, and the matrix you sketched in January was already stale by March.
Why “the smartest model” is the wrong heuristic
The flagship model wins the benchmark the vendor’s marketing department picks. It maximises three things at once — cost per call, latency, and lock-in to the vendor’s release cadence. None of those is the constraint your team is hitting. Your constraint is the next brief that has to ship, and the model that ships it best is the one that fits the job shape, not the one that tops the leaderboard.
Choose by job shape
The four marketing jobs the Playbook calls out (see the four-jobs post) carry different capability profiles:
- Brief-to-draft on long-form content. Turning a brief into a 1,200-word draft with structural fidelity, voice control, and factual grounding. The capability profile is long-form coherence under instruction. The call is infrequent and the editorial gate is downstream, so cost per call is secondary to draft quality.
- Variant production for paid and lifecycle. Fifty variants that must all stay within a 90-character cap and match a brand-voice spec. The capability profile is steerability under tight constraints at high throughput. Cost per call matters because the call runs fifty times in a row, and latency matters because every slow call stalls the queue.
- Distribution copy compression. Long-form into a thread; a thread into a LinkedIn post. The capability profile is voice preservation under length compression. Latency is secondary; voice fidelity is the binding constraint.
- Audience research synthesis. Twenty customer interviews into a positioning map; a hundred support tickets into a pain-point cluster. The capability profile is a long context window with coherent multi-document reasoning, plus structured output (JSON, tables, schema-bound responses) that survives being pasted into a slide.
A model that wins long-form coherence often loses on high-throughput variant generation, and the reverse. The team that runs everything on one model is paying the worst-case price on three of four jobs.
Why this post does not name a model
A table that says “use Model X for synthesis, Model Y for variants” rots in a quarter. The Playbook’s evidence standards reject any benchmark citation older than thirty-six months, and most model-vs-model benchmarks go stale far sooner than that. What does not rot is the job-shape spec: the description of the job, its inputs, its outputs, and its constraints. Write one for each marketing job your team runs, and re-evaluate models against it every six months or at every major release, whichever comes first.
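If your team keeps its specs in a repo rather than a slide, a job-shape spec can be as small as a record plus two checks. The Python below is a minimal sketch under stated assumptions, not a format the Playbook prescribes: the `JobShapeSpec` fields, the `over_cap` and `needs_reevaluation` helpers, and the `major_release_since` flag are all hypothetical names. “Six months” is read as six calendar months, and whether a release counts as major is a call your team makes, not something the code detects.

```python
from __future__ import annotations

import calendar
from dataclasses import dataclass, field
from datetime import date


@dataclass
class JobShapeSpec:
    """Hypothetical job-shape spec: the job, its inputs, its outputs, its constraints."""
    job: str
    inputs: list[str]
    outputs: list[str]
    max_chars: int | None = None  # hard length cap per output, if the job has one
    constraints: list[str] = field(default_factory=list)  # free-text rules checked by a human


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last valid day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def needs_reevaluation(last_evaluated: date, today: date, major_release_since: bool) -> bool:
    """True at six months after the last evaluation or at a major release, whichever comes first."""
    return major_release_since or today >= add_months(last_evaluated, 6)


def over_cap(spec: JobShapeSpec, outputs: list[str]) -> list[str]:
    """Return the outputs that break the spec's character cap (none if the spec has no cap)."""
    if spec.max_chars is None:
        return []
    return [text for text in outputs if len(text) > spec.max_chars]


if __name__ == "__main__":
    # Illustrative spec for the variant-production job described above.
    variants = JobShapeSpec(
        job="Variant production for paid and lifecycle",
        inputs=["brief", "brand-voice spec"],
        outputs=["50 ad variants"],
        max_chars=90,
        constraints=["matches brand-voice spec"],
    )
    print(len(over_cap(variants, ["Spring sale: 20% off", "x" * 91])))  # 1
    print(needs_reevaluation(date(2024, 1, 15), date(2024, 7, 15), major_release_since=False))  # True
```

The cap check is deliberately mechanical; voice fidelity and factual grounding stay in the free-text `constraints` list because they still need an editor’s judgement.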
What you walk away with
A team that does not chase the leaderboard and does not pay flagship prices on every job. The next release that lands gets evaluated against your specs, not against the vendor’s pitch deck. The matrix on your wall is one you wrote and one your finance team can defend.
If you want the long-form treatment of the four jobs and the capability profiles that match each, the sample chapter is free to read in your browser, and the full Playbook covers the rest.