Eager Agents and the Sandbox That Isn't

I've been building tools and skills for agents lately. The hard part isn't the prose in a SKILL.md file. It's that when the model hits a snag, it almost never stops, raises a hand, and waits for you to steer. It plows ahead in whatever direction feels productive in the moment.

That's a problem if you're trying to design something with a real workflow behind it.

The CLI in your head

Picture a CLI with a clear sequence: step one, step two, step three. If step two fails, you don't want the agent to MacGyver a new step three out of duct tape and paperclips. You want it to say "step two failed, here's the error" and ask how you want to proceed.

Humans know that. Agents mostly don't. They treat "something went wrong" as a prompt to keep moving, not as a stop sign.

So skills end up being a lot of give and take. You write instructions, you watch the agent run, it goes sideways, you tighten the language or add guardrails, you run it again. It's testing-heavy in a way that feels less like shipping a library and more like training a very fast intern who doesn't always know when to ask.

No sandbox for the sandbox

Here's where my main frustration landed. Skills are new for basically everyone right now, which means a lot of us want to screen record a walkthrough or show a skill in context. The environment doesn't cooperate.

Open the wrong panel while you're recording and you've suddenly framed your email from the settings screen. Your chat history is right there in the shot half the time. There isn't a clean "presenter mode" that hides identity and history while still showing the work, and at this stage of the ecosystem we kind of need one.

I haven't felt the same block in Cursor. My history and recent files stay relatively tucked away, and if I open the project in a fresh VM the slate is cleaner anyway. Claude's agent workspace is a different beast: the UI puts more of your life on screen by default, which makes "just record a quick demo" feel like a privacy review.

I'm not sure every competing system has the same issue. I only know this one made me think twice before hitting record.

Eager fingers on the npm button

Loop back to the eagerness problem. When an agent gets stuck, I've watched it reach for heavyweight npm packages as if "it's on npm" were the same as "it's safe." The registry is full of useful stuff and also full of junk and worse. Humans should be picky. Agents should be pickier, not faster to trust.

Claude runs code in a sandboxed execution VM, which sounds reassuring until you learn how the filesystem is laid out. The sandbox is shared across agent conversations in a sessions-style folder. Agents may not be able to scribble over each other's work, but they can still see what's there. Combine that visibility with an agent that gets a bee in its bonnet and tries to install something sketchy because a skill or a tangent suggested it, and the threat model stops being purely theoretical.

I'm not predicting doom. I'm saying the optimism stack doesn't line up: eager behavior, broad read access in a shared space, and a package ecosystem that rewards shipping first. Something in that list has to give, and I'd rather it be the eagerness.

tl;dr: teach agents to pause and ask. Give us a real presenter mode for skill demos. And treat "install from npm" like crossing a street: look both ways, every time. The sandbox helps, but it isn't a substitute for judgment.