Already cheap, now make it almost free.
DeepSeek is inexpensive out of the box. At volume, three techniques take the bill down by most of what remains. Here is the playbook.
01Caching — the biggest lever
If your requests share a stable prefix — a long system prompt, a reference document, a few-shot example set — context caching bills those repeated tokens at a fraction of the price. Structure prompts so the stable part comes first and changes least, and you cache the expensive bulk.
02Routing — Flash by default, Pro on demand
Send everything to Flash and escalate to Pro only when needed — either by task type or by a confidence check (if Flash flags uncertainty, retry on Pro). Most production traffic is Flash-appropriate; paying Pro prices for all of it is the most common waste.
03Batching — fewer, bigger calls
Group independent items into batched requests where possible, and reuse cached prefixes across them. Fewer round trips, more cache hits, lower overhead. For non-urgent jobs, batch overnight.
Cache the stable context, route the easy 90% to Flash, batch the independent work, and reserve Pro for the genuinely hard calls. Together these routinely cut a real workload's spend by the large majority.