One smart model, many small ones: how I slashed my Fable usage
A delegation tip from the Claude Code team changed my workflow.
Anthropic's Fable 5 model is available again, and plenty of users are upset about the tighter limits that came with it. Here's why AI coding is evolving yet again.
Reduced quota
Launched on 9 June, Fable 5 is a new flagship model that Anthropic says is especially suited to software engineering, long-horizon reasoning, and other complex tasks.
I tried it when it first came out, and it was phenomenal, though it did soak up 2.5 million tokens in 30 minutes flat.
Then, a few days after its release, the model was abruptly pulled under a US government export-control directive. On 1 July, it was reinstated, albeit with additional restrictions.
Until 7 July, less than a week away, Fable 5 usage can draw on up to 50% of a Claude plan's weekly limit. After that, it is available through API credits only. That is far less than the original allowance of two weeks at 100%, and many users are understandably unhappy.

Mixing AI models
The greatly reduced usage limits gave me pause. Rather than jumping straight into a software refactor I'd planned, I decided to hold off until I had time to think it through. I'm glad I did.
This morning, I came across an Ethan Mollick LinkedIn post about using frontier models for delegation. He cited a blog post by noted software engineer Simon Willison, who shared a tip he'd learned from the Claude Code team.
In a nutshell, there are two parts. The first is to let the smarter model, such as Fable, use its own judgement on what to do. The second is to tell Fable to hand smaller tasks to other models, again using its judgement on which one to pick.
That second part is working incredibly well for me. I've managed to do some really complex hardening work, more than 1.5 hours of continuous work across multiple subagents so far, and the needle has barely moved on my Fable usage. Fable designs, directs, and verifies. Sonnet writes the code from the detailed specifications. Haiku handles the simple code edits.
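As a rough illustration of that division of labour, here is a minimal Python sketch. The tier names come from my setup above; everything else (the Task fields, the file-count threshold, and the pick_model function) is hypothetical and invented for illustration. It is not how Claude Code or Fable actually picks a subagent model, since in practice the model uses its own judgement rather than a fixed rule.

```python
from dataclasses import dataclass

# Tier labels mirror the roles described above. They are plain labels
# for this sketch, not real API model identifiers.
ORCHESTRATOR = "fable"   # designs, directs, verifies
IMPLEMENTER = "sonnet"   # writes code from detailed specifications
EDITOR = "haiku"         # simple code edits


@dataclass
class Task:
    description: str
    needs_design: bool = False   # architecture or verification work
    files_touched: int = 1       # crude size proxy (an assumption)


def pick_model(task: Task) -> str:
    """Return the cheapest tier that plausibly fits the task (hypothetical rule)."""
    if task.needs_design:
        return ORCHESTRATOR
    if task.files_touched > 1:
        return IMPLEMENTER
    return EDITOR


tasks = [
    Task("Plan the hardening refactor", needs_design=True),
    Task("Implement input validation per spec", files_touched=4),
    Task("Rename a config key"),
]

# Map each task to the tier that would handle it.
plan = {t.description: pick_model(t) for t in tasks}
```

The point is only the shape: the expensive model is reserved for judgement calls, and everything else falls to the cheapest tier that can handle it.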
Next stage of AI
I think the days of tokenmaxxing are effectively over.
As AI models grow even more powerful and the amount of useful work they can do increases, costs will rise correspondingly. That means using them more judiciously to keep costs down.
And it makes sense, too. Getting it done right the first time beats trying to solve problems by sheer number of attempts. Mistakes also cause code sprawl, which in turn leads to further mistakes.
Here's the simple instruction I adopted: "For all coding tasks use your judgement to decide an appropriate lower power model and run that in a subagent." The question now is whether the rest of us adjust our habits fast enough.