It is 11pm. The agency owner has 312 unread emails, a client asking why their campaign is paused, a supplier chasing a PO, and three cold pitches pretending to be replies. She has been told by four different vendors this quarter that an AI agent will handle her inbox. She tried one. It replied to a journalist with a pricing quote meant for a lead. She turned it off the next morning.
This post is the playbook we give every client before we connect a chat or email agent to a real inbox. Three rules. If your agent does not satisfy all three, keep it in draft mode and do not let it send.
Rule one: a scope the agent can prove it is inside
Most inbox-agent failures are scope failures. The agent is asked to "handle support email" and nobody has defined what that means. So it answers a legal question from opposing counsel. It quotes a refund policy that was updated last quarter. It agrees to a meeting on a date the founder is on a plane.
The fix is boring and it works: the agent must classify the message into one of a short, closed list of intents before it is allowed to draft a reply. Not "is this a support email?" — that is a tautology. The intents are the concrete things you are willing to have answered without a human in the loop.
For a typical SMB inbox we start with five: order_status, invoice_question, meeting_request, general_info, out_of_scope. Everything that does not fit one of the first four lands in out_of_scope and gets forwarded to a human. The agent never guesses. If confidence is below a threshold, the intent is out_of_scope by default.
INTENTS = {
    "order_status",
    "invoice_question",
    "meeting_request",
    "general_info",
    "out_of_scope",
}

def classify(message: str, llm) -> tuple[str, float]:
    # llm is any client object exposing classify(); swap in your own wrapper.
    result = llm.classify(
        message,
        labels=sorted(INTENTS),
        system="Pick exactly one label. If unsure, pick out_of_scope.",
    )
    # Closed set: a label the model invented is routed to a human.
    intent = result.label if result.label in INTENTS else "out_of_scope"
    # Low confidence is a routing decision, not a drafting decision.
    if result.confidence < 0.75:
        intent = "out_of_scope"
    return intent, result.confidence
Two things matter here. First, the label set is closed: the agent cannot invent a sixth intent at 3am. Second, low confidence is not a drafting problem, it is a routing problem. You do not want a cautiously worded reply to a message you did not understand. You want a human to see it.
A chat agent without a closed intent list is not triaging your inbox, it is gambling with your replies.
Why not one big prompt
You will be tempted to skip the classifier and just give the model a long system prompt that says "only answer questions about X, Y, Z, otherwise escalate." We have tried it. It drifts. The model helpfully answers a question that was 80% inside scope and 20% outside, and the 20% is the part that gets you sued. A separate classifier step is a decision boundary you can log, audit, and tune. A single prompt is vibes.
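To make "log, audit, and tune" concrete, here is a minimal sketch of recording every routing decision as a structured log line. The RoutingDecision record, the decide and to_log_line helpers, and the field names are our own illustration, not part of any library; the 0.75 cut-off is the one used in the classifier above.

```python
import json
import time
from dataclasses import dataclass, asdict

CONFIDENCE_THRESHOLD = 0.75  # same cut-off as the classifier above


@dataclass
class RoutingDecision:
    """Hypothetical audit record: one per classified message."""
    message_id: str
    raw_label: str      # what the model returned
    confidence: float
    final_intent: str   # what the router actually acted on
    reason: str         # why final_intent may differ from raw_label
    decided_at: float


def decide(message_id, raw_label, confidence, allowed, now=None):
    """Apply the closed-set and confidence rules, keeping the reason."""
    if raw_label not in allowed:
        final, reason = "out_of_scope", "unknown_label"
    elif confidence < CONFIDENCE_THRESHOLD:
        final, reason = "out_of_scope", "low_confidence"
    else:
        final, reason = raw_label, "accepted"
    stamp = time.time() if now is None else now
    return RoutingDecision(message_id, raw_label, confidence, final, reason, stamp)


def to_log_line(decision):
    """Serialize one decision as a single JSON line for the audit log."""
    return json.dumps(asdict(decision), sort_keys=True)
```

Because the raw label and the final intent are stored side by side, you can later count how often the threshold overrode the model, which is the number you need when tuning it.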
Rule two: an escalation path that is the default, not the fallback
The second rule reverses how most teams think about human-in-the-loop. The agent does not reply and then escalate if something goes wrong. The agent drafts, and a human approves, until the agent has earned the right to send on its own — per intent, not in aggregate.
We run every new agent in three phases:
- Shadow. The agent drafts into a Slack channel. No reply is sent. A human reads the draft alongside the real message and either edits-and-sends from their normal mail client, or ignores the draft entirely. This runs for at least 100 messages per intent.
- Suggest. The agent drafts directly into the mail client as an unsent draft. A human opens it, edits, sends. We track the edit distance between the agent's draft and what the human actually sent. When median edit distance for an intent drops below a threshold (we use 15% of characters), that intent graduates.
- Send. For graduated intents only, the agent sends. Every send still posts to a review channel with a one-click "this was wrong" button that pages the on-call human and demotes the intent back to Suggest.
Two intents can be in different phases at the same time. order_status might be in Send because it is a templated lookup against your order table. meeting_request might still be in Suggest because calendars are hard: nobody has been fired over a late reply, but plenty have been fired over a double-booked client lunch.
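Per-intent phases can be tracked with very little state. The sketch below is one possible shape, assuming an in-memory registry; the Phase enum and PhaseRegistry class are hypothetical names, and a real deployment would persist this somewhere durable.

```python
from enum import Enum


class Phase(Enum):
    SHADOW = 1
    SUGGEST = 2
    SEND = 3


class PhaseRegistry:
    """Hypothetical in-memory record of which phase each intent is in."""

    def __init__(self, intents):
        # Every intent starts in Shadow, regardless of the others.
        self._phase = {intent: Phase.SHADOW for intent in intents}

    def phase_of(self, intent):
        return self._phase[intent]

    def promote(self, intent):
        """Move one intent up a phase; Send is the ceiling."""
        current = self._phase[intent]
        if current is not Phase.SEND:
            self._phase[intent] = Phase(current.value + 1)
        return self._phase[intent]

    def report_wrong(self, intent):
        """The "this was wrong" button: a sending intent drops to Suggest."""
        if self._phase[intent] is Phase.SEND:
            self._phase[intent] = Phase.SUGGEST
        return self._phase[intent]

    def graduated_intents(self):
        """Intents currently allowed to send without review."""
        return {i for i, p in self._phase.items() if p is Phase.SEND}
```

Promotion and demotion only ever touch the intent named in the call, which is the whole point: a bad meeting_request reply should never cost order_status its place in Send.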
Do not graduate an intent based on how the agent's replies feel to you. Graduate it based on measured edit distance on real messages over a real window. Vibes ship bugs to clients.
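Measuring this takes a few lines. The sketch below computes character-level Levenshtein distance between the agent's draft and the message the human actually sent, then checks the median against the 15% threshold. Two assumptions are ours: the distance is divided by the length of the longer text, and the minimum sample of 100 pairs is borrowed from the Shadow phase. The function names are illustrative.

```python
from statistics import median


def levenshtein(a, b):
    """Character-level edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def edit_ratio(draft, sent):
    """Edit distance as a fraction of the longer text (our assumption)."""
    longest = max(len(draft), len(sent))
    return 0.0 if longest == 0 else levenshtein(draft, sent) / longest


def ready_to_graduate(pairs, threshold=0.15, min_samples=100):
    """pairs is a list of (agent_draft, human_sent) strings for one intent."""
    if len(pairs) < min_samples:
        return False  # not enough real messages to judge
    return median(edit_ratio(d, s) for d, s in pairs) < threshold
```

The median matters more than the mean here: one reply the human rewrote from scratch should not hide the fact that the other ninety-nine went out nearly untouched, and vice versa.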
Rule three: a write-lock on everything that touches the outside world
The third rule is the one teams skip because it feels like paranoia until the day it is not. An agent that can read your inbox is a research tool. An agent that can write — send email, call an API, update a CRM, move money — is a liability. Treat those two capabilities as separate systems with separate credentials.
In practice this means the agent never holds the sending credential. A narrow service does. The agent calls that service with a structured request, and the service enforces the rules the agent cannot be trusted to enforce on itself: rate limits per recipient, domain allowlists, a hard cap on sends per hour, a block on anything that looks like a new external domain in the first 24 hours of a conversation.
def send_reply(draft: Draft, agent_token: str) -> SendResult:
    # The agent has agent_token. It does NOT have SMTP creds.
    if not allowlist.contains(draft.to_domain):
        return SendResult.rejected("domain_not_allowlisted")
    if rate_limiter.exceeded(draft.to_address):
        return SendResult.rejected("rate_limited")
    if draft.intent not in graduated_intents():
        return SendResult.rejected("intent_not_graduated")
    if contains_payment_instruction(draft.body):
        return SendResult.rejected("payment_language_blocked")
    return mail_gateway.send(draft)  # gateway holds the real creds
The pattern is simple: a credential proxy between the agent and the thing that can cause damage. The agent asks, the proxy decides. The agent never holds the key. Whether you write this as 80 lines of Python or adopt one of the emerging agent-gateway tools, the principle does not change.
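As an illustration of how small the proxy's rules can be, here is a rough sketch of what could sit behind rate_limiter.exceeded in the snippet above: a sliding-window limiter with a per-recipient limit and a hard cap on total sends. The class name, the record method, and the one-hour default window are assumptions for the example, not a description of any particular gateway product.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical limiter: per-recipient limit plus a global hard cap."""

    def __init__(self, max_per_recipient, max_total,
                 window_seconds=3600.0, clock=time.monotonic):
        self.max_per_recipient = max_per_recipient
        self.max_total = max_total
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._by_recipient = defaultdict(deque)
        self._all = deque()

    def _trim(self, stamps, now):
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()

    def exceeded(self, to_address):
        """True if one more send to this address would break a limit."""
        now = self._clock()
        per_recipient = self._by_recipient[to_address.lower()]
        self._trim(per_recipient, now)
        self._trim(self._all, now)
        return (len(per_recipient) >= self.max_per_recipient
                or len(self._all) >= self.max_total)

    def record(self, to_address):
        """Call after a send has actually gone out."""
        now = self._clock()
        self._by_recipient[to_address.lower()].append(now)
        self._all.append(now)
```

The limiter lives in the gateway process, so a compromised or confused agent cannot reset its own counters.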
The payment-language check in that snippet is not theoretical. Any agent replying to invoice questions will eventually be asked to confirm new bank details, and "confirm" is exactly the word you do not want it to say. Block the vocabulary at the gateway. Let a human handle it.
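A first version of contains_payment_instruction can be a plain pattern match. The sketch below is a starting point under our own assumptions: the pattern list is illustrative and deliberately over-blocks, since a false positive only costs a human a minute. Tune it against your own rejected drafts.

```python
import re

# Illustrative vocabulary only; extend it from real blocked drafts.
_PAYMENT_PATTERNS = [
    r"\bbank\s+(?:details|account)\b",
    r"\b(?:iban|swift|bic|sort\s+code|routing\s+number)\b",
    r"\baccount\s+number\b",
    r"\bwire\s+transfer\b",
    r"\b(?:new|updated|changed)\s+payment\s+(?:details|instructions)\b",
]
_PAYMENT_RE = re.compile("|".join(_PAYMENT_PATTERNS), re.IGNORECASE)


def contains_payment_instruction(body):
    """True if the draft body mentions anything on the payment blocklist."""
    return _PAYMENT_RE.search(body) is not None
```

A draft such as "I can confirm the new bank details you sent" is rejected at the gateway, while "Your order shipped on Tuesday" passes through.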
What these three rules buy you
Put the three together — closed-set classifier, per-intent graduation, write-proxy with hard rules — and you end up with an agent that is useful on day one and boring by month three. Boring is the goal. A chat agent that replies to 60% of your inbox in a way nobody notices is worth ten agents that reply to 100% in a way somebody screenshots.
The OWASP working group on LLM security has been converging on similar ground in its Top 10 for LLM Applications. Prompt injection, excessive agency, and insecure output handling are all variations of the same failure mode: the agent was allowed to act before it was constrained. NIST AI 600-1, the generative AI profile of its AI Risk Management Framework, makes a similar argument in more formal language: confinement before capability. The three rules above are one practical shape of those constraints for an inbox.
A five-minute audit you can run today
If you have a chat or email agent live right now, open three tabs.
Tab one: the agent's system prompt. Is there a closed list of intents it is allowed to handle, or is there a paragraph that starts "You are a helpful assistant that"? If the second, you have no scope.
Tab two: the last 50 messages the agent sent. For each one, ask whether a human read the draft before it went out. If the answer is "the agent has been running itself for weeks," you skipped the graduation phase.
Tab three: the code path that actually calls your mail provider. Is the sending credential in the same process as the LLM call? If yes, you have no write-lock. Any prompt injection that reaches the agent reaches your outbox.
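If you want a quick signal for tab three, one rough check is to list which credential-looking environment variables the agent process can see. The sketch below is only a heuristic under our own assumptions: the hint strings are guesses at common variable naming, and credentials loaded from files or secret managers will not show up.

```python
import os

# Assumed naming hints; adjust to whatever your mail provider's config uses.
SENDING_CREDENTIAL_HINTS = ("SMTP", "SENDGRID", "MAILGUN", "SES_", "POSTMARK")


def visible_sending_credentials(environ=None):
    """Names of env vars in this process that look like mail credentials."""
    env = os.environ if environ is None else environ
    return sorted(
        name for name in env
        if any(hint in name.upper() for hint in SENDING_CREDENTIAL_HINTS)
    )
```

Run it inside the process that makes the LLM call. An empty list does not prove you have a write-lock, but a non-empty one is a strong hint that you do not.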
When we built the triage agent for a Rotterdam logistics client, the thing we ran into was exactly rule three: the first version held its own SMTP credential and a single malformed forwarded email convinced it to reply to an entire mailing list. We rebuilt it behind a gateway the same week, and that gateway is now the template for every AI agent we ship. Three rules. In that order.




