I run a factory. Software, not steel, but it’s a factory. Twelve workflows. Each one has its own crew of agents, content going from nothing to published every day. Two Claude Max subscriptions feeding the whole thing.
I’ve always been good about inspecting the output. LangFuse traces on everything that ships. Evals on the content. If a recipe publishes wrong or a post goes out with hallucinated facts, I know about it. Months of this.
Only the output, though.
I never looked at the machines.
The timing mattered here. The walls have been closing in.
My consumption creeps toward the limits every week. OpenAI killed Sora because they needed the compute elsewhere. Anthropic’s been cutting limits during peak hours. Today, April 3rd, 2026, Anthropic told subscribers that starting tomorrow, third-party harnesses don’t run on the subscription. Pay as you go or walk.
The providers can see the math. There are not going to be enough tokens to go around.
I’m not careless with mine. I don’t run MCP servers I don’t need. I curate my skills. I manage my context window the way you manage a carry-on bag on a budget airline. Every byte earns its seat or it doesn’t fly.
But I had never once looked inside the sessions. The telemetry was pointed at the products. Never at the production line.
Claude Code has built-in OTEL telemetry. A community bridge routes those signals to LangFuse. I’d been putting off the setup because I’m on Windows and everything takes twice as long on Windows. Did it anyway. An afternoon. Got it wired.
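For anyone wiring this up themselves, the shape of it is a handful of environment variables. These names come from the official monitoring docs at time of writing, but verify them before copying; the endpoint is whatever your bridge or collector listens on, not a value Claude Code provides:

```shell
# Claude Code telemetry is off by default; these turn it on and point the
# OTLP exporters at a local collector/bridge.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317  # wherever the bridge listens
```

On Windows, the same variables go through PowerShell (`$env:CLAUDE_CODE_ENABLE_TELEMETRY = "1"` and so on) or the system environment settings.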
Opened my first trace.
There’s a thing that happens when you open one of these for the first time. The spans fall down the left side of the screen. Green. Green. Green. Tool calls succeeding one after another. Software doing its job.
Then you see the yellow.
Two WARNING badges on Read operations. Sitting in all that green like two cigarette burns on a clean white tablecloth. Everything else looked fine.
I asked the agent about the warnings.
“Expected behavior,” it told me. “Not real failures. The Read tool hit its token limit on large files. I re-read in chunks. The workflow recovered correctly each time.”
Nothing to see here. No cockroaches. We don’t have any cockroaches.
I think the most important skill you can develop running agents is a bullshit detector. Mine went off.
“You read that whole file,” I said. “Ten thousand tokens. You were looking for one line. One field that says status: planning. You loaded the entire thing to find one line.”
The agent paused.
Then it got it.
The file was YAML. Six hundred fifty lines tracking every collection in the pipeline. The agent needed the next item marked planning and it loaded the whole thing. Cover to cover. Ten thousand tokens pulled into the context window like water into a sponge.
A Grep call would have returned the line number. Fifty tokens. A targeted Read around that match, maybe three hundred. Instead the agent dragged ten thousand tokens in and carried them through every turn that followed.
I want you to hear this part. Tokens loaded early in a conversation don’t cost you once. They sit in the context window. They ride along with every API call for the rest of the session. Ten thousand unnecessary tokens at turn two means ten thousand extra on every turn after that, all the way to twenty. It multiplies. Early bloat is the most expensive kind there is.
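The arithmetic is easy to check yourself. A sketch, with illustrative turn counts rather than numbers pulled from my traces:

```python
# A token loaded into context on an early turn is re-sent as input
# with every subsequent API call in the session.

def carried_cost(extra_tokens: int, loaded_at_turn: int, total_turns: int) -> int:
    """Total extra input tokens billed across the session."""
    # The bloat rides along on the turn it was loaded and every turn after.
    turns_carried = total_turns - loaded_at_turn + 1
    return extra_tokens * turns_carried

# 10,000 unnecessary tokens loaded at turn 2 of a 20-turn session:
print(carried_cost(10_000, 2, 20))   # 190,000 extra input tokens
# The same mistake made at turn 18 costs a fraction of that:
print(carried_cost(10_000, 18, 20))  # 30,000
```

Same mistake, same file, wildly different bill depending on when it happens.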
There was a second cockroach right next to the first. The duplicate check had been reading twelve YAML files front to back to pull one field from each. One field. Twelve files. A single Grep across the folder would have returned every value in one call.
The fix took fifteen minutes. Two edits to a slash command.
Grep before you read. Search, then look. Don’t read to search.
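The pattern the agent should have followed is the same one you'd write by hand. A minimal stdlib sketch standing in for the Grep and Read tools, with an illustrative tracking file (names and field are made up for the demo):

```python
import re
import tempfile
from pathlib import Path

def find_then_read(path: Path, pattern: str, context: int = 3) -> str:
    """Search first, then read only a small window around the match."""
    regex = re.compile(pattern)
    lines = path.read_text().splitlines()
    for i, line in enumerate(lines):
        if regex.search(line):
            lo = max(0, i - context)
            return "\n".join(lines[lo:i + context + 1])
    return ""  # no match: nothing pulled into context

# Demo: a 650-line tracking file where exactly one entry is still "planning".
with tempfile.TemporaryDirectory() as d:
    tracker = Path(d, "collections.yaml")
    entries = [f"collection_{n}:\n  status: published" for n in range(325)]
    entries[7] = "collection_7:\n  status: planning"
    tracker.write_text("\n".join(entries))

    snippet = find_then_read(tracker, r"status:\s*planning")
    print(snippet.count("\n") + 1)  # 7 -- a handful of lines, not 650
```

The duplicate-check fix is the same move across a folder: one search over twelve files returns twelve matching lines, instead of twelve full reads.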
But the waste is just the cockroach.
What got me is that the workflow completed. Output was correct. No errors anywhere. I would have run that command dozens more times without knowing.
The agent said everything was fine. And it was fine — if you only look at the output. Product passed inspection every time.
The factory floor was covered in cockroaches.
That was one workflow. One out of twelve. One small corner of the operation. I looked at one thing and found waste in thirty seconds.
I haven’t looked at the other eleven yet.
The telemetry stays wired now. OTEL variables always active. When I want to observe, I start the bridge. When I’m done, I stop it. Every session traceable. Every tool call visible.
We all do the things we can see. We trim context windows and cut MCP tools we don’t need and write tighter prompts. Those things help.
But there’s a level underneath all of that. Bare metal. What the agents actually do when nobody’s watching. The files they read front to back when a search would do. The tokens they burn because nobody said stop and the output came out right and who is going to question the process when the product looks fine.
You can’t see that from the output side. You have to go down to the floor and turn on the lights.
I think someday an agent will do this job. Watch the traces, spot the patterns, do the math on the waste. But today the agent is the one who told me the cockroaches were expected behavior.
Today you need a human on the factory floor with a flashlight. Reading the gauges. Asking the questions the agents won’t ask about themselves.
Two hundred seventy-seven thousand wasted tokens. Thirty seconds of looking. One workflow out of twelve.
The lights are staying on.
Resources
- Setup gist — OTEL config, bridge wiring, start and stop commands
- Full repo — Automated setup script, agent skill that teaches your agents to query their own traces, and CLI patterns
- Claude Code: Monitoring usage — Official OTEL telemetry docs
- lainra/claude-code-telemetry — The community bridge