Your AI Agent Already Has a Subconscious. Give It a Cheaper One.
I split my persistent AI agent into a cheap watcher and an expensive thinker. Background costs dropped from $54/month to $5.
If you've been following the persistent agent space you've probably heard of sleep-time agents. The idea that an AI can process and consolidate memories while the user's away, kind of like your brain does when you sleep. Letta shipped this a while back and it's genuinely useful. Your agent reflects on conversations, organizes what it learned, prunes what's stale. Basically a subconscious for your AI.
I've been running this on my homelab for months. My agent Meridian monitors 154 Docker containers, scans logs, handles infrastructure, all that. And the sleep-time memory stuff works great. But I ran into a different problem.
The conscious mind was doing all the boring work too.
Every 30 minutes my agent wakes up, checks containers, looks at disk and memory, scans for errors. Most of the time everything's fine. Log it, done. But that check-in was running on Opus, a frontier model, because that's what Meridian runs on. I was paying top dollar for the AI equivalent of glancing at a dashboard and going "yep, looks good."
So I gave it a cheaper subconscious.
Split the agent in two. A small fast model (MiniMax M2.5, about $1.50/MTok) handles all the background monitoring. The heartbeats, the log scans, the routine checks. Opus only wakes up when I talk to it directly or when the cheap model spots something that actually needs thinking about.
The trick that makes it work is what I call silent mode. The cheap agent's output goes nowhere by default. If it wants to reach me it has to deliberately take action. Send a Matrix message, write to a wiki, escalate to the primary agent. It can't accidentally spam me with "all clear" 48 times a day. It only talks when it has something worth saying.
Background monitoring costs went from $54/month to $5. Not a typo.
And the cheap model catches real stuff. Late February, 3 AM, it found a crypto miner running inside one of my containers. Someone had exploited an RCE in my reverse proxy. The agent spotted it in the logs, killed the process, blocked the IP, patched the vulnerability. Ten minutes start to finish, hours before I woke up. Doesn't take Opus to parse docker ps and notice something's wrong.
This is what I think most people building persistent agents are missing. The sleep-time pattern was the first step, giving your agent a subconscious that organizes memories. The triage pattern is the evolution. A subconscious that actually does things. Watches. Decides. Acts when it needs to and stays quiet when it doesn't.
Your expensive model should be thinking. Your cheap model should be watching. That's really all there is to it.
Built on Letta with LettaBot. Single homelab server, 154 containers.