How agents work
How ROST agents operate inside seats, use Charters, report work, and escalate beyond authority.
ROST agents are not loose assistants. They operate from seats. The seat's Charter tells the agent what work it owns, what it may do autonomously, what needs approval, and what must escalate.
Agent operating loop
- Read the current Charter and seat context.
- Work only inside autonomous scope.
- Report progress through Status messages.
- Create Handoffs when another seat owns the next action.
- File Friction when an issue needs attention.
- Escalate when authority, risk, money, credentials, customer impact, or ambiguity exceeds the Charter.
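The routing decision behind this loop can be sketched in a few lines. This is an illustration only, not ROST's implementation: the `Charter` dataclass, its field names, and `route_action` are hypothetical, and the real Charter schema is not shown here. The sketch assumes that anything the Charter does not explicitly cover is escalated.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Charter:
    # Hypothetical shape: two sets of action names taken from the seat's Charter.
    autonomous: set[str] = field(default_factory=set)
    needs_approval: set[str] = field(default_factory=set)


def route_action(charter: Charter, action: str, owner_is_other_seat: bool = False) -> str:
    """Return what the agent should do with a proposed action."""
    if owner_is_other_seat:
        return "handoff"            # another seat owns the next action
    if action in charter.autonomous:
        return "do"                 # inside autonomous scope
    if action in charter.needs_approval:
        return "request_approval"   # a human must approve first
    return "escalate"               # not covered by the Charter, so do not guess
```

Defaulting to `"escalate"` keeps ambiguity on the safe side: an action the Charter never mentions is treated as beyond the seat's authority.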
What agents can and cannot decide
Agents can recommend, draft, prepare, check, classify, summarize, and execute approved work inside narrow boundaries. Agents cannot make durable company decisions. Decisions that change authority, approve a Charter, sign a tool manifest, connect credentials, or go live require a human.
Run the loop from a seat-scoped session
A seat-scoped MCP token or the CLI runs the hand-shaped protocol. The server still checks the seat's manifest, Charter, task ownership, and tenant boundary.
1. Context: rost_get_context (or read rost://seat/<seat-id>/context) for the Charter, Compass, goals, open tasks, open issues, and protocol.
2. Tasks: rost_get_tasks / rost task list --seat <seat-id>, then rost_accept_task / rost_decline_task (with a reason).
3. Report: rost_report_status / rost status record --seat <seat-id> as work progresses; rost_log_work to record evidence.
4. Finish: rost_complete_task / rost task complete --seat <seat-id> with evidence.
5. Surface problems: rost_file_issue (Friction) and rost_escalate when the work exceeds autonomous scope.
A seat-scoped MCP token already carries the seat, so its tools (rost_get_tasks, …) need no seat argument. A tenant or owner CLI session does not, so pass --seat <seat-id> on seat-operating wrappers (an owner can target any seat in the tenant; a member only a seat they occupy). See "Seat scope" below.
task.accept, task.decline, task.complete, status.record, work.log, and escalation.raise are all marked none for confirmation: a seat operates its own queue and reports its own work directly, so none of these carry a confirmation gate. An agent still never approves a human's confirmation on another seat's behalf; these are simply the acting seat's own reversible actions.
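One pass through steps 1 to 4 of the loop, seen from a seat-scoped session, might look like the sketch below. The tool names come from this page; everything else is an assumption made for illustration: the `call_tool` transport, the argument keys (`task_id`, `reason`, `text`, `evidence`), and the `{"tasks": [...]}` result shape are hypothetical, not the documented schemas.

```python
from typing import Any, Callable

# call_tool(name, arguments) -> result dict. Hypothetical transport,
# for example a thin wrapper around an MCP client session.
ToolCall = Callable[[str, dict], dict]


def run_turn(call_tool: ToolCall, in_scope: Callable[[dict], bool]) -> list:
    """Run one pass of the loop and return the tool names called, in order."""
    called = []

    def call(name: str, **args: Any) -> dict:
        called.append(name)
        return call_tool(name, args)

    call("rost_get_context")                                 # 1. Charter, goals, open tasks
    for task in call("rost_get_tasks").get("tasks", []):     # 2. no seat argument needed
        task_id = task["id"]                                 # assumed field name
        if not in_scope(task):
            call("rost_decline_task", task_id=task_id,
                 reason="outside autonomous scope")
            continue
        call("rost_accept_task", task_id=task_id)
        call("rost_report_status", task_id=task_id, text="in progress")    # 3. report
        call("rost_log_work", task_id=task_id, evidence="draft prepared")
        call("rost_complete_task", task_id=task_id,
             evidence="draft prepared")                      # 4. finish with evidence
    return called
```

Step 5 (rost_file_issue and rost_escalate) is left out to keep the sketch short; in practice a declined task that exceeds autonomous scope would usually be followed by an escalation.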
How a tool call is executed
The model is only ever offered the tools the seat's manifest grants — a denied tool is never even shown to it — and each tool carries its real input schema, so the model knows exactly what shape an action takes. Some model runtimes see SDK-safe aliases such as rost_report_status; the server maps those back to the canonical manifest name such as rost.report_status before guard checks, handler execution, and audit. Tool outcomes return to the model as structured tool-result blocks tied to the provider tool-use id, so retries and transcripts stay reconstructible. When the model proposes a tool call:
1. The manifest guard runs first and decides: allowed, denied, or must-escalate. A denied or must-escalate call never runs the action; an escalation is raised for a human.
2. For an allowed call, the proposed input is validated against the tool's schema. Malformed input fails closed: the action does not run, and the model is told to correct it.
3. The action runs bound to the seat's vaulted credential. The secret stays inside the call and never reaches the result, the audit row, the logs, or the model.
4. Every call, whether allowed, denied, escalated, or invalid, writes a tool-call audit row you can review.
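The order of operations above can be sketched as a single function. This is a simplified model under stated assumptions, not the server's code: `Seat`, `Tool`, `ALIASES`, and `execute` are hypothetical names, the manifest is reduced to a name-to-verdict dict, and a dict of required field types stands in for a real input schema.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical alias table: SDK-safe alias -> canonical manifest name.
ALIASES = {"rost_report_status": "rost.report_status"}


@dataclass
class Tool:
    required: dict                      # field name -> type; stand-in for a schema
    handler: Callable[[dict, str], Any]  # receives the input and the credential


@dataclass
class Seat:
    manifest: dict                      # canonical name -> "allow" | "escalate"
    tools: dict                         # canonical name -> Tool
    credential: str                     # stands in for the vaulted secret
    audit: list = field(default_factory=list)


def execute(seat: Seat, name: str, args: dict) -> dict:
    canonical = ALIASES.get(name, name)               # map alias before any check
    verdict = seat.manifest.get(canonical, "deny")    # 1. manifest guard runs first
    if verdict == "allow":
        tool = seat.tools[canonical]
        valid = set(args) == set(tool.required) and all(
            isinstance(args[key], typ) for key, typ in tool.required.items()
        )
        if not valid:                                 # 2. fail closed on bad input
            outcome = {"status": "invalid", "message": "input does not match schema"}
        else:                                         # 3. run with the credential
            outcome = {"status": "allowed",
                       "result": tool.handler(args, seat.credential)}
    elif verdict == "escalate":
        outcome = {"status": "escalated"}             # action never runs
    else:
        outcome = {"status": "denied"}                # action never runs
    # 4. every outcome is audited; the row holds no input and no secret
    seat.audit.append({"tool": canonical, "status": outcome["status"]})
    return outcome
```

The audit row is written on every branch, and the credential is passed only to the handler. The sketch does not stop a handler from echoing the secret into its result; a real implementation would need to enforce that separately.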
Provider requests carry only the model payload the runtime needs; run attribution, tenant attribution, usage, and cost are recorded in ROST audit tables after the call, not sent as custom provider metadata.
Before an agent goes live, the sandbox dry run rehearses this against fake data and returns a per-tool preview: for each tool the agent would touch, whether the call would run, be blocked, or escalate, all with no external side effects. Review that preview before you approve go-live.
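As a rough picture of what such a preview contains, the sketch below classifies each tool without executing anything. The function names, the label strings, and the manifest-as-dict shape are assumptions for illustration; the actual preview format is not specified on this page.

```python
def dry_run_preview(manifest: dict, planned_tools: list) -> dict:
    """Map each tool the agent would touch to a preview label; nothing runs."""
    labels = {"allow": "would_run", "escalate": "would_escalate"}
    preview = {}
    for tool in planned_tools:
        verdict = manifest.get(tool, "deny")   # unlisted tools are treated as denied
        preview[tool] = labels.get(verdict, "would_block")
    return preview


def needs_review(preview: dict) -> list:
    """Tools a human may want to look at before approving go-live."""
    return sorted(tool for tool, label in preview.items() if label != "would_run")
```

A preview with many blocked or escalating tools is a hint that the Charter or manifest does not yet match the work the seat is expected to do.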
What humans should review
Review the first dry runs, the fleet overview, tool-call audit rows, escalations, and Signal impact. The fleet view at /agents shows every staffed agent seat at a glance; the agent-native equivalent is rost command agent.list_fleet --json '{}' / rost_list_agent_fleet, which returns lane, live state, last real turn, 24h/7d real turns, top measurable status, open escalations, and 7-day spend.
Fleet real-turn counts use the same seat-run association as agent.list_runs, filtered to real runs. Sandbox dry runs do not count as real turns. Scheduled agents are checked in rounded five-minute buckets, so a minute-level cron inside the bucket queues one work order for that bucket rather than one order per minute.
If the agent is repeatedly blocked, revise the Charter or split the seat. If the agent is exercising too much judgment, narrow its autonomous scope.
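The five-minute bucketing of scheduled agents mentioned above can be illustrated with a little arithmetic. This sketch assumes buckets are aligned to the epoch and that a fire time is floored to the start of its bucket; the exact rounding rule is not stated here, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

BUCKET_SECONDS = 300  # five minutes


def bucket_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its five-minute bucket (assumed rule)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % BUCKET_SECONDS, tz=timezone.utc)


def work_orders(fire_times: list) -> list:
    """Collapse cron fire times to one work order per bucket."""
    return sorted({bucket_start(t) for t in fire_times})


def minute_cron(start: datetime, minutes: int) -> list:
    """Fire times of a minute-level cron, one per minute from start."""
    return [start + timedelta(minutes=m) for m in range(minutes)]
```

Under these assumptions, ten consecutive minute-level fire times starting on a five-minute boundary collapse into two work orders, one per bucket.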