Compliance test controller

The compliance test controller is a dev/staging-only affordance, not a production-time concept. AAO grading does NOT require or use it. The AAO compliance heartbeat drives storyboards against the seller’s registered production URL with account.sandbox: true on every request, and the seller’s prod stack is responsible for honoring the flag; no controller endpoint is needed.

Sellers MAY implement the controller in their dev or staging environment to support their own integration testing: walking lifecycle state machines deterministically, seeding fixtures, and forcing transitions that would otherwise require waiting for real time. That’s its purpose. It MUST NOT be exposed on production deployments (see Sandbox gating below).

Confused about how the controller relates to AAO Verified (Sandbox)? See #4379 for the framing decision: (Sandbox) attests “real production endpoint correctly handles sandbox-flagged traffic across the full storyboard suite.” The controller is the developer-side affordance for your testing, not the AAO-side grading mechanism.

AdCP defines lifecycle state machines for accounts, creatives, media buys, SI sessions, and delivery reporting. Many transitions in these state machines are seller-initiated — creative approval, account suspension, budget depletion, delivery accrual. A storyboard runner can only exercise buyer-initiated flows, leaving seller-initiated transitions untested. The compliance test controller is an optional tool sellers expose in their dev/staging environment to support deterministic local testing. It allows a runner to trigger seller-side state transitions on demand, enabling end-to-end lifecycle verification during development.

Motivation

Without a test controller, compliance testing is observational: fire an action, read back whatever state exists, move on. This catches schema violations but not behavioral ones.

Track	Observational (today)	Deterministic (with controller)
Creative	Sync → observe initial status	Walk `processing` → `approved` → `archived`; force `rejected` with reason
Account	Read existing statuses	Force `suspended` → verify operation gates → reactivate
SI sessions	Initiate → message → terminate	Force `terminated` with timeout reason → verify `SESSION_NOT_FOUND` on next call
Reporting	Call `get_media_buy_delivery` → hope data exists	Simulate delivery → verify rollups
Budgeting	Create buy with budget → read back	Simulate spend to threshold → verify alerts and `payment_required`
Media buy	Create → pause → resume	Force seller-initiated `rejected` → verify terminal state

Sandbox gating

Sellers MUST NOT expose comply_test_controller on production deployments, to anyone, on any surface. The tool MUST be absent from tools/list (MCP) and from the agent card’s skills[] (A2A); the compliance_testing block MUST be absent from get_adcp_capabilities; dispatch MUST return the transport’s standard unknown-tool error (e.g., JSON-RPC -32601 Method not found for MCP, the unknown-skill rejection for A2A), indistinguishable from the same-transport response of a seller that does not implement the tool. A production deployment that exposes the tool on any of these surfaces is non-conformant regardless of whether dispatch is gated.

The canonical pattern is two deployments: one production (no controller wired), one sandbox/staging (controller wired for all comers). Sellers expose comply_test_controller only on sandbox/staging deployments; any principal that can authenticate to such a deployment can call it.

Sellers MAY instead run a single deployment with mixed sandbox/live principals and project the tool per-principal, gating on the resolved account’s mode. This is an implementation pattern, not the canonical model. Sellers picking this pattern MUST gate all three surfaces consistently: tools/list (or skills[]), the compliance_testing capability block, and dispatch. Partial projection (e.g., gating tools/list but leaving the compliance_testing block visible to live principals, or returning FORBIDDEN rather than unknown-tool to a live principal who probes by name) is non-conformant; it reopens the discovery side channel that deployment-scoping closes. FORBIDDEN is reserved for the in-sandbox case where the caller is authorized to call the controller but params reference a non-sandbox account.

Sandbox gating is enforced per-request on the account reference, not just at tool registration time. The mechanism for provisioning sandbox credentials and for separating production from sandbox/staging deployments is seller-specific and out of scope for this spec. Sellers MUST document their sandbox access mechanism so storyboard runners can connect appropriately.

The storyboard runner MUST treat the presence of comply_test_controller in tools/list (or skills[]) or the presence of the compliance_testing block in get_adcp_capabilities on a connection it believes is production as a hard conformance failure.
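
The runner-side rule in the last paragraph can be sketched as a small check over the two discovery surfaces. This is a minimal illustration, not part of the spec: the function name is hypothetical, and it assumes the runner has already fetched the tool (or skill) names and the get_adcp_capabilities response as a dict with compliance_testing as a top-level key.

```python
from __future__ import annotations

from typing import Any

CONTROLLER_TOOL = "comply_test_controller"


def production_exposure_violations(tool_names: list[str],
                                   capabilities: dict[str, Any]) -> list[str]:
    """Return the discovery surfaces on which a production connection leaks the controller."""
    violations = []
    if CONTROLLER_TOOL in tool_names:
        violations.append("tools/list or skills[] advertises comply_test_controller")
    # Assumption: the capability block is a top-level key of the response.
    if "compliance_testing" in capabilities:
        violations.append("get_adcp_capabilities carries a compliance_testing block")
    return violations


leaky = production_exposure_violations(
    ["get_products", "comply_test_controller"], {"compliance_testing": {}}
)
clean = production_exposure_violations(["get_products", "create_media_buy"], {})
```

Any non-empty result on a connection the runner believes is production would be reported as a hard conformance failure. The dispatch surface (the unknown-tool error) needs a live call and is not covered by this sketch.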

Tool definition

Schemas: comply-test-controller-request.json | comply-test-controller-response.json

Sellers that implement the compliance test controller MUST:

Only expose the tool in sandbox mode (see sandbox gating above)
Enforce the same state transition rules as production — invalid transitions MUST return errors
Reflect forced state changes in subsequent reads (list_creatives, get_media_buys, etc.)

{
  "name": "comply_test_controller",
  "description": "Triggers seller-side state transitions for compliance testing. Sandbox only.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "scenario": {
        "type": "string",
        "enum": [
          "list_scenarios",
          "force_creative_status",
          "force_account_status",
          "force_media_buy_status",
          "force_create_media_buy_arm",
          "force_task_completion",
          "force_session_status",
          "simulate_delivery",
          "simulate_budget_spend",
          "seed_product",
          "seed_pricing_option",
          "seed_creative",
          "seed_plan",
          "seed_media_buy"
        ],
        "description": "The seller-side transition or fixture-seed to trigger."
      },
      "params": {
        "type": "object",
        "description": "Scenario-specific parameters. Omit for list_scenarios. force_creative_status: {creative_id, status, rejection_reason?}. force_account_status: {account_id, status}. force_media_buy_status: {media_buy_id, status, rejection_reason?}. force_create_media_buy_arm: {arm, task_id?, message?} — task_id required when arm = submitted. force_task_completion: {task_id, result}. force_session_status: {session_id, status, termination_reason?}. simulate_delivery: {media_buy_id, impressions?, clicks?, reported_spend?, conversions?}. simulate_budget_spend: {account_id|media_buy_id, spend_percentage}. seed_product: {product_id, fixture?}. seed_pricing_option: {product_id, pricing_option_id, fixture?}. seed_creative: {creative_id, fixture?}. seed_plan: {plan_id, fixture?}. seed_media_buy: {media_buy_id, fixture?}."
      }
    },
    "required": ["scenario"]
  }
}

The params description inlines param shapes for each scenario because MCP clients (including LLMs) read descriptions, not conditional schema branches. For formal validation schemas suitable for SDK code generation, see the per-scenario definitions below.
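
Because the flat inputSchema only requires scenario, the conditional requirements from the per-scenario tables have to be checked separately. The sketch below shows one way a runner or seller might do that before dispatch; the function name and return convention are hypothetical, and only the conditional rules stated in this document are covered.

```python
from __future__ import annotations


def missing_conditional_params(scenario: str, params: dict) -> list[str]:
    """List conditionally required params that are absent for this scenario."""
    missing = []
    if scenario in ("force_creative_status", "force_media_buy_status"):
        # rejection_reason is required when status = rejected
        if params.get("status") == "rejected" and "rejection_reason" not in params:
            missing.append("rejection_reason")
    elif scenario == "force_create_media_buy_arm":
        # task_id is required when arm = submitted
        if params.get("arm") == "submitted" and "task_id" not in params:
            missing.append("task_id")
    elif scenario == "force_session_status":
        # termination_reason is required when status = terminated
        if params.get("status") == "terminated" and "termination_reason" not in params:
            missing.append("termination_reason")
    elif scenario == "simulate_budget_spend":
        # at least one of account_id or media_buy_id is required
        if "account_id" not in params and "media_buy_id" not in params:
            missing.append("account_id|media_buy_id")
    return missing
```

A seller would typically map a non-empty result to INVALID_PARAMS.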

Scenarios

`force_creative_status`

Transitions a creative to the specified status. The seller MUST enforce valid transitions per the creative lifecycle state machine.

Params:

Field	Type	Required	Description
`creative_id`	string	Yes	Creative to transition
`status`	`processing` \| `approved` \| `rejected` \| `pending_review` \| `archived`	Yes	Target status
`rejection_reason`	string	When `status` = `rejected`	Reason for rejection

Example:

{
  "scenario": "force_creative_status",
  "params": {
    "creative_id": "cr-123",
    "status": "rejected",
    "rejection_reason": "Brand safety policy violation"
  }
}

`force_account_status`

Transitions an account to the specified status. The seller MUST enforce the account lifecycle rules: terminal states (rejected, closed) cannot be exited.

Params:

Field	Type	Required	Description
`account_id`	string	Yes	Account to transition
`status`	`active` \| `pending_approval` \| `rejected` \| `payment_required` \| `suspended` \| `closed`	Yes	Target status

Example:

{
  "scenario": "force_account_status",
  "params": {
    "account_id": "acct-456",
    "status": "payment_required"
  }
}

`force_media_buy_status`

Transitions a media buy to the specified status. The seller MUST enforce the media buy lifecycle: rejected is only valid from pending_creatives or pending_start.

Params:

Field	Type	Required	Description
`media_buy_id`	string	Yes	Media buy to transition
`status`	`pending_creatives` \| `pending_start` \| `active` \| `paused` \| `completed` \| `rejected` \| `canceled`	Yes	Target status
`rejection_reason`	string	When `status` = `rejected`	Reason for rejection

Example:

{
  "scenario": "force_media_buy_status",
  "params": {
    "media_buy_id": "mb-789",
    "status": "rejected",
    "rejection_reason": "Policy violation"
  }
}

`force_create_media_buy_arm`

Shapes the next create_media_buy call from the caller’s authenticated sandbox account into a specific response arm. v1 supports two arms: submitted (the async task envelope, no media_buy_id yet) and input-required (the errors-branch). Unlike force_media_buy_status, no entity transitions (there is no media buy yet), so the response carries forced.arm rather than previous_state/current_state.

The submitted-arm wire shape is otherwise implementation-dependent: most sellers route most buys synchronously and no buyer-side request shape reliably triggers async. This scenario lets storyboards pin the arm so a regressed seller (e.g., emitting media_buy_id under status: submitted) cannot pass conformance silently.

Params:

Field	Type	Required	Description
`arm`	`submitted` \| `input-required`	Yes	Target response arm for the next `create_media_buy` call
`task_id`	string	When `arm` = `submitted`	Deterministic task handle (max 128 chars) the seller MUST emit verbatim on the submitted envelope and MUST accept on subsequent `tasks/get` polls. Sandbox task_ids are caller-opaque strings; production task-id format rules do not apply.
`message`	string	No	Human-readable explanation surfaced verbatim on the seller’s `create_media_buy` response. Plain text, max 2000 characters. Buyers consuming the resulting response MUST apply the prompt-injection sanitization documented for `message` on the submitted envelope — this scenario is the natural place for a runner to inject adversarial strings to test that buyer-side sanitization.

Example:

{
  "scenario": "force_create_media_buy_arm",
  "params": {
    "arm": "submitted",
    "task_id": "task_async_signed_io_q2",
    "message": "Awaiting IO signature from sales team; typical turnaround 2–4 hours"
  }
}

Response. A ForcedDirectiveSuccess shape carrying the registered directive:

{
  "success": true,
  "forced": {
    "arm": "submitted",
    "task_id": "task_async_signed_io_q2"
  },
  "message": "Next create_media_buy call will return the submitted arm with task_id task_async_signed_io_q2"
}

forced.task_id is present only when arm: submitted.

Consumption and idempotency. The directive is keyed to the caller’s authenticated sandbox account (account + principal pair) and is consumed by the next create_media_buy call from that account. Subsequent calls without a fresh directive return the seller’s default arm. Buyer-side idempotency_key semantics are unchanged: if the caller replays a create_media_buy request that already consumed a directive, the seller MUST replay the cached response (the request idempotency cache wins) and MUST NOT re-evaluate against the now-empty directive slot. Sellers MUST NOT match a directive against a create_media_buy call from a different account or principal, even within the same transport connection. A second force_create_media_buy_arm call before the directive is consumed overwrites the prior one.
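
The consumption rules can be pictured as a one-slot store keyed by the (account, principal) pair. The following is a minimal in-memory sketch under that reading; the class and method names are hypothetical, and the request idempotency cache (which must be consulted before the slot on replays) is deliberately left out.

```python
from __future__ import annotations


class DirectiveSlots:
    """One pending create_media_buy directive per (account, principal) pair."""

    def __init__(self) -> None:
        self._slots: dict[tuple[str, str], dict] = {}

    def register(self, account_id: str, principal_id: str, directive: dict) -> None:
        # A second registration before consumption overwrites the prior one.
        self._slots[(account_id, principal_id)] = directive

    def consume(self, account_id: str, principal_id: str) -> dict | None:
        # The next create_media_buy from the same pair takes the directive;
        # None means "fall back to the seller's default arm".
        return self._slots.pop((account_id, principal_id), None)


slots = DirectiveSlots()
slots.register("acct-456", "buyer-a", {"arm": "input-required"})
slots.register("acct-456", "buyer-a", {"arm": "submitted", "task_id": "task_async_signed_io_q2"})

other_principal = slots.consume("acct-456", "buyer-b")  # never matches another principal
first_call = slots.consume("acct-456", "buyer-a")       # gets the overwriting directive
second_call = slots.consume("acct-456", "buyer-a")      # slot is now empty
```

Keying on the pair rather than on the transport connection is what prevents a directive from leaking to a different account or principal that shares the connection.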

`force_task_completion`

Resolves a previously-submitted async task to completed with a buyer-supplied result payload. The companion to force_create_media_buy_arm: that scenario drives the seller into the submitted envelope; this one closes the loop by transitioning the task store entry to completed and stamping the registered result. The buyer observes completion via the seller’s push notification to push_notification_config.url (the canonical 3.0 delivery path for completion payloads) and via subsequent tasks/get calls reporting status: "completed". A typed result projection on the polling response is tracked for 3.1 in #3123.

The submitted → completed lifecycle is otherwise non-deterministic: real task completions ride on out-of-band signals (IO countersignature, batch processor cron, governance human review). Storyboards cannot wait. This scenario lets a runner pin the completion deterministically immediately after registering the directive, so the buyer-side polling assertion fires on the same wire shape buyers will observe in production.

Params:

Field	Type	Required	Description
`task_id`	string	Yes	Task to resolve. MUST resolve within the caller’s authenticated sandbox account; sellers MUST return `NOT_FOUND` (not `FORBIDDEN`, per the multi-tenant convention described under Error codes) for `task_id`s belonging to other accounts. Typically captured from the prior `create_media_buy` submitted-envelope response (or registered via `force_create_media_buy_arm`).
`result`	`async-response-data`	Yes	Completion payload to record. Validates against the same `anyOf` union the push-notification webhook and `tasks/get` polling responses use. For `create_media_buy`, this is a `CreateMediaBuyResponse` with `media_buy_id` and `packages`. Sellers MUST emit `INVALID_PARAMS` if `result` does not validate against the response branch for the task’s original method. Sellers MAY reject `result` payloads exceeding 256 KB with `INVALID_PARAMS`; storyboards MUST stay below this.

Example:

{
  "scenario": "force_task_completion",
  "params": {
    "task_id": "task_async_signed_io_q2",
    "result": {
      "media_buy_id": "mb_async_signed_io_q2",
      "status": "active",
      "packages": [
        { "package_id": "pkg-0", "product_id": "async_signed_io_q2", "budget": 30000 }
      ]
    }
  }
}

Response. Returns a state-transition success shape:

{
  "success": true,
  "previous_state": "submitted",
  "current_state": "completed",
  "message": "Task task_async_signed_io_q2 transitioned from submitted to completed"
}

Source state MUST be submitted, working, or input-required; any other source returns INVALID_TRANSITION. Sellers MUST emit NOT_FOUND if task_id is unknown to the caller’s account, and INVALID_TRANSITION if the task is already terminal (completed / failed / canceled). Forcing a task to failed is out of scope for this scenario; the input-required arm of force_create_media_buy_arm covers the buyer-input-needed failure path.

Replay semantics. Replays with identical params before the task is terminal are idempotent no-ops. Replays with diverging params before the task is terminal MUST overwrite the registered result (last-write-wins), the same precedent as force_create_media_buy_arm’s “second call overwrites.” After the task is terminal, every replay returns INVALID_TRANSITION regardless of params.

Cross-protocol obligations.

Push notifications. If the buyer registered push_notification_config.url on the original create_media_buy, forcing completion MUST fire the webhook with the registered result payload (the canonical 3.0 delivery path for completion data). Otherwise the storyboard can only test polling for terminal status, not push delivery of the result.
simulate_delivery / simulate_budget_spend. Once forced to completed with a valid CreateMediaBuyResponse carrying media_buy_id, the resulting media buy MUST be addressable by those scenarios. Round-tripping through force_task_completion is the supported path for storyboards that need a media buy without going through the synchronous flow.

Buyer-side observation. After this scenario runs, the registered result is delivered to the buyer’s push_notification_config.url (3.0 canonical path) with all caller-supplied fields preserved. Sellers MAY augment with seller-controlled fields (e.g., created_at, dsp_* IDs, normalized currency casing) but MUST NOT overwrite caller-supplied values. A subsequent tasks/get(task_id) MUST return status: "completed". The result payload is buyer-controlled in sandbox and round-trips through the seller’s store — buyers receiving it via webhook MUST treat the payload as untrusted seller output (per AdCP convention) regardless of the fact that they originated the bytes. This makes force_task_completion the natural place for a runner to inject adversarial payloads when testing buyer-side sanitization on the webhook delivery path.
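
The "augment but never overwrite" rule amounts to a merge in which caller-supplied keys always win. A minimal sketch, assuming a shallow top-level merge is sufficient (nested objects would need a recursive variant); the function name and the seller-side field values are hypothetical.

```python
def augment_result(caller_result: dict, seller_fields: dict) -> dict:
    """Merge seller-controlled fields into the registered result; caller values win."""
    merged = dict(seller_fields)   # start from seller additions
    merged.update(caller_result)   # caller-supplied values overwrite any collision
    return merged


registered = {"media_buy_id": "mb_async_signed_io_q2", "status": "active"}
seller_side = {"created_at": "2027-04-01T00:00:00Z", "status": "pending_start"}
delivered = augment_result(registered, seller_side)
```

Here the seller's conflicting status is discarded and created_at is added, so the webhook payload preserves every caller-supplied field.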

`force_session_status`

Transitions an SI session to a terminal status. Enables testing timeout and termination scenarios that would otherwise require waiting for real timeouts. The termination_reason param simulates the cause so the storyboard runner can verify sellers report the correct reason in subsequent responses.

Params:

Field	Type	Required	Description
`session_id`	string	Yes	Session to transition
`status`	`complete` \| `terminated`	Yes	Target terminal status
`termination_reason`	string	When `status` = `terminated`	Reason for termination (e.g., `session_timeout`, `host_terminated`, `policy_violation`)

Example:

{
  "scenario": "force_session_status",
  "params": {
    "session_id": "sess-abc",
    "status": "terminated",
    "termination_reason": "session_timeout"
  }
}

`simulate_delivery`

Injects synthetic delivery data for a media buy. Subsequent calls to get_media_buy_delivery MUST reflect this data. Delivery simulation is additive: each call adds to existing delivery totals.

Delivery and budget are independent systems. simulate_delivery records what the ad server would report. simulate_budget_spend records what the billing system would track. A seller’s production system may or may not couple these; the test controller does not assume coupling.

Params:

Field	Type	Required	Description
`media_buy_id`	string	Yes	Media buy to add delivery to
`impressions`	integer	No	Impressions to simulate
`clicks`	integer	No	Clicks to simulate
`reported_spend`	object	No	`{ amount: number, currency: string }` — spend as reported in delivery data, does not affect budget
`conversions`	integer	No	Conversions to simulate

Example:

{
  "scenario": "simulate_delivery",
  "params": {
    "media_buy_id": "mb-789",
    "impressions": 10000,
    "clicks": 150,
    "reported_spend": { "amount": 150.00, "currency": "USD" }
  }
}
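
The additive behavior can be sketched as a per-media-buy ledger that reports both the values injected by the current call and the running totals. This is an illustrative in-memory stand-in, not a seller implementation: the class name is hypothetical, reported_spend is omitted for brevity, and the simulated/cumulative field names follow the simulation response shape defined under Response shape.

```python
from collections import defaultdict


class DeliveryLedger:
    """In-memory stand-in for a sandbox delivery store (hypothetical)."""

    def __init__(self) -> None:
        # media_buy_id -> running totals across all simulate_delivery calls
        self._totals = defaultdict(
            lambda: {"impressions": 0, "clicks": 0, "conversions": 0}
        )

    def simulate_delivery(self, media_buy_id: str, impressions: int = 0,
                          clicks: int = 0, conversions: int = 0) -> dict:
        injected = {"impressions": impressions, "clicks": clicks,
                    "conversions": conversions}
        totals = self._totals[media_buy_id]
        for metric, value in injected.items():
            totals[metric] += value  # additive: never replaces earlier calls
        return {"success": True, "simulated": injected, "cumulative": dict(totals)}


ledger = DeliveryLedger()
ledger.simulate_delivery("mb-789", impressions=15000, clicks=230)
second = ledger.simulate_delivery("mb-789", impressions=10000, clicks=150)
print(second["cumulative"])  # {'impressions': 25000, 'clicks': 380, 'conversions': 0}
```

Because each call accumulates, a runner that retries simulate_delivery after a transient failure should compare cumulative against its expected totals before retrying.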

`simulate_budget_spend`

Simulates budget consumption to a specified percentage. Enables testing budget threshold alerts and payment_required transitions without waiting for real spend. This is the only scenario that affects account-level financial state. After calling simulate_budget_spend, the seller MUST reflect the simulated consumption in get_account_financials. Specifically:

total_spend (or equivalent) MUST reflect the simulated amount
remaining_budget (or equivalent) MUST be reduced accordingly
Budget utilization percentages MUST match spend_percentage

Params:

Field	Type	Required	Description
`account_id`	string	No	Account (for account-level budget)
`media_buy_id`	string	No	Media buy (for buy-level budget)
`spend_percentage`	number	Yes	Spend to this % of budget (0–100)

At least one of account_id or media_buy_id is required. The target entity MUST have a non-zero budget configured; the controller SHOULD return INVALID_PARAMS if it does not.

Example:

{
  "scenario": "simulate_budget_spend",
  "params": {
    "media_buy_id": "mb-789",
    "spend_percentage": 95
  }
}
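
The arithmetic behind this scenario is a straight percentage of the configured budget, set as an absolute level rather than accumulated. A minimal sketch follows, with a hypothetical function name; rejecting an out-of-range percentage with INVALID_PARAMS is an assumption here, and the error payload shows only the fields this document defines.

```python
def simulate_budget_spend(budget_amount: float, currency: str,
                          spend_percentage: float) -> dict:
    """Compute the spend level implied by spend_percentage of the budget."""
    if budget_amount <= 0:
        # The target entity must have a non-zero budget configured.
        return {"success": False, "error": "INVALID_PARAMS",
                "error_detail": "No non-zero budget configured"}
    if not 0 <= spend_percentage <= 100:
        # Assumption: out-of-range percentages count as malformed params.
        return {"success": False, "error": "INVALID_PARAMS",
                "error_detail": "spend_percentage must be between 0 and 100"}
    computed = round(budget_amount * spend_percentage / 100, 2)
    return {
        "success": True,
        "simulated": {
            "spend_percentage": spend_percentage,
            "computed_spend": {"amount": computed, "currency": currency},
            "budget": {"amount": budget_amount, "currency": currency},
        },
    }


result = simulate_budget_spend(1000.00, "USD", 95)
no_budget = simulate_budget_spend(0, "USD", 95)
```

With a 1000.00 USD budget and spend_percentage 95, computed_spend comes out at 950.00 USD, and the remaining budget a seller would report is the difference between the two.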

`seed_product`

Creates (or upserts) a product fixture with a caller-supplied product_id so subsequent storyboard steps can reference the product by stable ID. The controller MUST make the seeded product discoverable via get_products under the authenticated account unless the fixture explicitly marks it hidden.

Why this scenario exists. Storyboards hardcode fixture IDs like "test-product" and expect the seller to have a matching product. Without a seed scenario, every implementer rediscovers which IDs the conformance suite expects and has to alias them by hand. seed_product replaces that discovery with an explicit, storyboard-authored contract.

Params:

Field	Type	Required	Description
`product_id`	string	Yes	Stable identifier the storyboard will reference
`fixture`	object	No	Product shape. Minimum useful fields: `delivery_type`, `channels`, `pricing_options[]`, `format_ids[]`. Sellers MAY fill in defaults for omitted fields.

Example:

{
  "scenario": "seed_product",
  "params": {
    "product_id": "test-product",
    "fixture": {
      "delivery_type": "non_guaranteed",
      "channels": ["display"],
      "pricing_options": [
        { "pricing_option_id": "test-pricing", "pricing_model": "cpm", "currency": "USD", "floor_price": 1.0 }
      ],
      "format_ids": [{ "id": "display_300x250" }]
    }
  }
}

`seed_pricing_option`

Adds (or upserts) a pricing option on an existing seeded product. Use this when a storyboard needs a specific pricing option that wasn’t included in the initial seed_product call, or when the option’s attributes need to diverge from the seller’s default.

Params:

Field	Type	Required	Description
`product_id`	string	Yes	Parent product (must already exist — seed it first)
`pricing_option_id`	string	Yes	Stable identifier for the pricing option
`fixture`	object	No	Pricing option shape per the `PricingOption` schema (`pricing_model`, `currency`, `floor_price` for auction-based, `fixed_price` for fixed, etc.)

Example:

{
  "scenario": "seed_pricing_option",
  "params": {
    "product_id": "test-product",
    "pricing_option_id": "default",
    "fixture": {
      "pricing_model": "cpm",
      "floor_price": 5.0,
      "currency": "USD"
    }
  }
}

`seed_creative`

Creates a creative fixture at a specific lifecycle status. Lets governance and delivery storyboards reference a pre-approved creative without round-tripping sync_creatives first.

Params:

Field	Type	Required	Description
`creative_id`	string	Yes	Stable identifier
`fixture`	object	No	Creative shape. Typical fields: `status`, `format_id`, `assets`, `click_through_url`.

Example:

{
  "scenario": "seed_creative",
  "params": {
    "creative_id": "campaign_hero_video",
    "fixture": {
      "status": "approved",
      "format_id": { "id": "video_30s" },
      "assets": [{ "type": "video", "url": "https://example.com/hero.mp4" }]
    }
  }
}

`seed_plan`

Creates a media plan fixture. Used by governance storyboards that assert against a specific plan without running the full briefing + proposal flow first.

Params:

Field	Type	Required	Description
`plan_id`	string	Yes	Stable identifier
`fixture`	object	No	Plan shape. Typical fields: `budget`, `brand`, `flight`, `line_items[]`.

Example:

{
  "scenario": "seed_plan",
  "params": {
    "plan_id": "gov_acme_q2_2027",
    "fixture": {
      "budget": { "total": 30000, "currency": "USD" },
      "brand": { "domain": "acmeoutdoor.example" },
      "flight": { "start": "2027-04-01", "end": "2027-06-30" }
    }
  }
}

`seed_media_buy`

Creates a media buy fixture at a specified lifecycle state, bypassing the create_media_buy flow. Used by storyboards that need to assert governance or delivery behavior against a pre-existing buy.

Params:

Field	Type	Required	Description
`media_buy_id`	string	Yes	Stable identifier
`fixture`	object	No	Media buy shape. Typical fields: `status`, `packages[]`, `budget`, `flight`.

Example:

{
  "scenario": "seed_media_buy",
  "params": {
    "media_buy_id": "mb_acme_q2_2026_auction",
    "fixture": {
      "status": "active",
      "packages": [{ "package_id": "pkg_001", "product_id": "test-product" }]
    }
  }
}

Seeding semantics and ordering

Fixture shape. fixture is kept permissive (additionalProperties: true) so storyboard authors can declare the minimum shape each test needs. Fixtures SHOULD conform to the corresponding domain schema (core/product.json for seed_product, core/pricing-option.json for seed_pricing_option, media-buy/sync-creatives-request.json creative-item shape for seed_creative, core/media-buy.json for seed_media_buy, the plan schema for seed_plan). Sellers MAY reject clearly malformed fixtures with INVALID_PARAMS.
Idempotency on re-seed. A second call with the same primary ID and a fixture equivalent to the first SHOULD succeed and return success: true with previous_state: "existing". A second call with a diverging fixture MUST return INVALID_PARAMS with error_detail explaining which fields diverged — sellers MUST NOT merge or update silently. Storyboards that need to change fixture state mid-run MUST use force_* scenarios, not a re-seed. This keeps the same storyboard deterministic across sellers.
Foreign-key ordering. The runner seeds fixtures in dependency order so sellers receive referenced parents before their children. The dependency DAG:
```
product ──┬─→ pricing_option
          ├─→ plan
          └─→ media_buy
creative ────→ media_buy
plan ────────→ media_buy
```
Concretely: seed_product before seed_pricing_option; seed_product, seed_creative, and seed_plan all before seed_media_buy when the fixture references them. Storyboards that declare a fixtures: block MUST list entries in an order the runner can topologically sort — sellers that receive a seed_pricing_option for a product that does not exist, or a seed_media_buy referencing a creative/product/plan that was not seeded first, MUST return INVALID_PARAMS rather than auto-create the parent.
Sandbox scope. Seeded fixtures exist only for the authenticated sandbox account. NOT_FOUND applies the same way as for force_* — a seller that cannot see the parent product for the caller’s account MUST return NOT_FOUND, not silently fall back to another tenant.
Capability advertisement. Sellers that do not implement a given seed scenario MUST return UNKNOWN_SCENARIO for that scenario name. The runner treats UNKNOWN_SCENARIO on a seed_* as a coverage gap for storyboards whose prerequisites.controller_seeding requires the scenario — those storyboards are graded not_applicable, not failed. This applies to unfamiliar seed_* names as well: a runner may emit a scenario the seller has never seen because the enum is open-for-extension (see below). Sellers and runners MUST respond with UNKNOWN_SCENARIO rather than schema-reject an unrecognized scenario value.
Open-for-extension enum. The scenario enum adds new values over time (new seed scenarios land as specialisms demand them). Runners and sellers MUST accept scenario strings they do not recognize and respond with UNKNOWN_SCENARIO rather than hard-fail schema validation — otherwise every new enum value becomes a breaking change for stale implementations.
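
The foreign-key ordering rule can be sketched on the runner side with the standard library's graphlib module. This is a simplified illustration: it orders at the scenario level (all products before any plan, and so on), which is coarser than per-fixture references but always satisfies them. The dependency map mirrors the DAG above; the function name and the fixture dict layout are hypothetical.

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# Each seed scenario mapped to the seed scenarios that must run before it.
SEED_DEPENDENCIES = {
    "seed_product": set(),
    "seed_creative": set(),
    "seed_pricing_option": {"seed_product"},
    "seed_plan": {"seed_product"},
    "seed_media_buy": {"seed_product", "seed_creative", "seed_plan"},
}


def order_fixtures(fixtures: list[dict]) -> list[dict]:
    """Sort fixtures so parents are seeded before the children that reference them."""
    order = TopologicalSorter(SEED_DEPENDENCIES).static_order()
    rank = {scenario: position for position, scenario in enumerate(order)}
    # sorted() is stable, so fixtures of the same scenario keep their declared order.
    return sorted(fixtures, key=lambda fixture: rank[fixture["scenario"]])


declared = [
    {"scenario": "seed_media_buy", "params": {"media_buy_id": "mb_acme_q2_2026_auction"}},
    {"scenario": "seed_pricing_option", "params": {"product_id": "test-product", "pricing_option_id": "default"}},
    {"scenario": "seed_product", "params": {"product_id": "test-product"}},
]
ordered = [fixture["scenario"] for fixture in order_fixtures(declared)]
```

A scenario name missing from the map raises KeyError in this sketch; a real runner would need a policy for seed scenarios added to the enum later.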

Response shape

State transition responses (`force_*`)

Success:

{
  "success": true,
  "previous_state": "processing",
  "current_state": "approved",
  "message": "Creative cr-123 transitioned from processing to approved"
}

Failure (invalid transition):

{
  "success": false,
  "error": "INVALID_TRANSITION",
  "error_detail": "Cannot transition from archived to processing — archived is terminal",
  "current_state": "archived"
}

Failure (unknown entity):

{
  "success": false,
  "error": "NOT_FOUND",
  "error_detail": "Creative cr-unknown not found",
  "current_state": null
}

Simulation responses (`simulate_*`)

simulate_delivery response:

{
  "success": true,
  "simulated": {
    "impressions": 10000,
    "clicks": 150,
    "reported_spend": { "amount": 150.00, "currency": "USD" }
  },
  "cumulative": {
    "impressions": 25000,
    "clicks": 380,
    "reported_spend": { "amount": 375.00, "currency": "USD" }
  },
  "message": "Delivery simulated for mb-789: 10000 impressions, 150 clicks, $150.00 spend"
}

The simulated field echoes back the values injected by this call. The cumulative field returns running totals across all simulation calls for this media buy, so callers can verify expected state before checking get_media_buy_delivery.

simulate_budget_spend response:

{
  "success": true,
  "simulated": {
    "spend_percentage": 95,
    "computed_spend": { "amount": 950.00, "currency": "USD" },
    "budget": { "amount": 1000.00, "currency": "USD" }
  },
  "message": "Budget for mb-789 set to 95% consumed ($950.00 of $1000.00)"
}

Error codes

Controllers MUST use structured error codes so the storyboard runner can assert on specific failure modes:

Error code	When
`INVALID_TRANSITION`	Requested state-machine transition is not valid (e.g., `archived → processing`, `canceled → paused`)
`INVALID_STATE`	Operation is not permitted for the resource’s current status (e.g., re-seeding a fixture that already exists with a diverging shape)
`NOT_FOUND`	Entity does not exist or caller does not have access (multi-tenant sandboxes SHOULD treat “not yours” as “not found”)
`UNKNOWN_SCENARIO`	Scenario not implemented by this seller
`INVALID_PARAMS`	Missing or malformed params, or precondition not met (e.g., `simulate_budget_spend` on an entity with no budget configured)
`FORBIDDEN`	Production account referenced from a sandbox connection
`INTERNAL_ERROR`	Transient seller-side failure (e.g., sandbox database unavailable). The runner SHOULD retry once before treating as a failure.

Controller-specific enum. The error field on controller responses uses a controller-specific vocabulary defined in comply-test-controller-response.json, distinct from the canonical seller-response error-code.json enum that governs task-level errors. INVALID_TRANSITION is controller-specific (state-machine primitives expose the transition-vs-state distinction that seller-level error codes collapse into INVALID_STATE). Storyboard assertions on controller responses use path: "error" or direct field_value checks, not check: error_code — the shape-agnostic error_code check is for task-response errors (adcp_error / payload errors[]), not the controller’s own response schema.

Idempotency

State transition scenarios (force_*) are idempotent: forcing a status that matches the current state returns success with previous_state equal to current_state. This avoids flaky tests when the runner retries after transient failures. Simulation scenarios (simulate_*) are NOT idempotent — simulate_delivery adds to existing totals, while simulate_budget_spend replaces the current spend level.
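
The force_* idempotency rule can be sketched as a handler that treats "target equals current" as success before consulting the transition table. The table below is illustrative only and is NOT the normative creative lifecycle; the function name and the in-memory store are hypothetical. The response fields follow the state transition shapes shown under Response shape.

```python
# Illustrative transition table; the real creative lifecycle is defined elsewhere in AdCP.
ALLOWED = {
    "processing": {"approved", "rejected", "pending_review"},
    "pending_review": {"approved", "rejected"},
    "approved": {"archived"},
    "rejected": {"processing"},
    "archived": set(),  # terminal
}


def force_status(store: dict, entity_id: str, target: str) -> dict:
    """Apply a forced transition, treating a same-state request as a successful no-op."""
    current = store.get(entity_id)
    if current is None:
        return {"success": False, "error": "NOT_FOUND",
                "error_detail": f"{entity_id} not found", "current_state": None}
    if target != current and target not in ALLOWED[current]:
        return {"success": False, "error": "INVALID_TRANSITION",
                "error_detail": f"Cannot transition from {current} to {target}",
                "current_state": current}
    store[entity_id] = target
    return {"success": True, "previous_state": current, "current_state": target}


creatives = {"cr-123": "processing"}
first = force_status(creatives, "cr-123", "approved")
retry = force_status(creatives, "cr-123", "approved")   # runner retry after a transient failure
force_status(creatives, "cr-123", "archived")
invalid = force_status(creatives, "cr-123", "processing")
```

The retry succeeds with previous_state equal to current_state, while leaving the archived state is still rejected with INVALID_TRANSITION.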

Test surfaces

Where a seller’s state-of-record lives determines how the storyboard test loop closes.

State-local sellers (typically SSPs, creative agents) write to the seller’s DB via the seed_* scenarios above; the seller’s read handlers consume the same store, and the seed→read loop closes naturally.

Upstream-proxy sellers (DSPs proxying to platforms, retail-media networks reading retailer catalogs, signals brokers) cannot close the loop that way because their read handlers reach a system the seller does not control; the TypeScript SDK ships a TestControllerBridge that runs the real adapter call first, then merges seeded fixtures into the response.

Either path earns the wire-format pass that AAO Verified (Spec) attests. Neither path is what (Sandbox) attests; that’s a separate axis covering whether the seller’s production stack honors account.sandbox: true without real-world side effects. The cross-page framing for both implementations of this pattern, the SDK’s _bridge advisory marker, and the runtime-signals disambiguation table all live in the Conformance Specification → Test surfaces and the storyboard loop.

Compliance testing modes

The presence of comply_test_controller in a seller’s tool list determines which mode a compliance tester uses: observational when the tool is not available, deterministic when it is.

Capability discovery

A seller may implement the test controller without supporting every scenario. The storyboard runner SHOULD call comply_test_controller with scenario: "list_scenarios" as the first interaction. Sellers that support this return the list of implemented scenarios:

{
  "success": true,
  "scenarios": [
    "force_creative_status",
    "force_account_status",
    "force_media_buy_status"
  ]
}

Sellers that implement list_scenarios MUST respond with scenario names that appear verbatim in the scenario enum of comply-test-controller-request.json. Custom seller-specific scenario names are not part of the compliance contract; storyboard runners will not dispatch to scenarios outside the canonical enum, so listing them serves no purpose. A seller that supports seed_product MUST respond with the string "seed_product", not "create_test_product" or any other variant.

Sellers that do not implement list_scenarios SHOULD return an error with UNKNOWN_SCENARIO. When this happens, the runner tries each scenario individually and treats UNKNOWN_SCENARIO responses as coverage gaps (not failures). Early implementers who skip list_scenarios are therefore not penalized: the runner discovers supported scenarios through trial.
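The discovery flow, including the probing fallback, might look like the following Python sketch. The call parameter is a hypothetical transport function, the failure shape ({"success": false, "error": ...}) is assumed, and CANONICAL_SCENARIOS is only a subset of names taken from this page. The sketch also assumes a bare probe request is acceptable; a real runner would send valid parameters for each scenario.

```python
from typing import Callable

# Subset of canonical scenario names taken from this page; the full enum
# lives in comply-test-controller-request.json.
CANONICAL_SCENARIOS = [
    "force_creative_status",
    "force_account_status",
    "force_media_buy_status",
    "simulate_delivery",
    "simulate_budget_spend",
    "seed_product",
]


def discover_scenarios(call: Callable[[dict], dict]) -> set[str]:
    """Return the canonical scenarios a controller supports.

    `call` is a hypothetical function that sends one comply_test_controller
    request and returns the parsed response.
    """
    listing = call({"scenario": "list_scenarios"})
    if listing.get("success"):
        # Names outside the canonical enum are ignored, never dispatched.
        return set(listing.get("scenarios", [])) & set(CANONICAL_SCENARIOS)

    # Fallback: probe each scenario; UNKNOWN_SCENARIO marks a coverage gap.
    supported = set()
    for name in CANONICAL_SCENARIOS:
        response = call({"scenario": name})
        if response.get("error") != "UNKNOWN_SCENARIO":
            supported.add(name)
    return supported


# Two stand-in sellers for illustration only.
def seller_with_listing(request: dict) -> dict:
    if request["scenario"] == "list_scenarios":
        return {"success": True,
                "scenarios": ["force_creative_status", "create_test_product"]}
    return {"success": True}


def seller_without_listing(request: dict) -> dict:
    if request["scenario"] in ("force_account_status", "seed_product"):
        return {"success": True}
    return {"success": False, "error": "UNKNOWN_SCENARIO"}


listed = discover_scenarios(seller_with_listing)
probed = discover_scenarios(seller_without_listing)
```

Here the non-canonical name "create_test_product" is dropped from the first seller's list, and the second seller's support is found by trial.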

Observational mode (default)

When comply_test_controller is not available:

The runner executes buyer-initiated flows and validates response schemas
State machine transitions that require seller action are skipped
Advisory observations note what could not be tested

Deterministic mode

When comply_test_controller is available:

The runner walks every reachable state in each lifecycle
Forces edge cases: terminal states, invalid transitions, error codes
Validates that forced state changes are reflected in subsequent reads
Tests operation gates (e.g., create_media_buy blocked when account is suspended)

The runner distinguishes three outcome categories in deterministic mode:

Scenario not supported: the scenario is missing from the list_scenarios response, or the controller returns an UNKNOWN_SCENARIO error. Reported as a coverage gap, not a failure.
Transition correctly rejected: the controller returned INVALID_TRANSITION for an invalid state change. This is a pass.
Unexpected failure: the controller returned an error for a transition that should be valid, or succeeded on a transition that should fail. This is a compliance failure.
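The three categories can be expressed as a small grading function. The Python below is a sketch under stated assumptions: the Outcome enum, the classify helper, and the response shape ({"success", "error"}) are illustrative, and transition_is_valid stands for whatever the relevant lifecycle specification says about the attempted transition.

```python
from enum import Enum


class Outcome(Enum):
    COVERAGE_GAP = "coverage_gap"
    PASS = "pass"
    FAILURE = "failure"


def classify(response: dict, transition_is_valid: bool) -> Outcome:
    """Grade one forced transition against the expected validity."""
    error = response.get("error")
    if error == "UNKNOWN_SCENARIO":
        # Unsupported scenario: reported, but not counted against the seller.
        return Outcome.COVERAGE_GAP
    if transition_is_valid:
        # A valid transition must succeed.
        return Outcome.PASS if response.get("success") else Outcome.FAILURE
    # An invalid transition must be rejected with INVALID_TRANSITION.
    return Outcome.PASS if error == "INVALID_TRANSITION" else Outcome.FAILURE
```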

Example: creative lifecycle in deterministic mode

sync_creatives(creative)
list_creatives() → verify status = "processing"
force_creative_status(creative_id, "pending_review")
force_creative_status(creative_id, "approved")
list_creatives() → verify status = "approved"
force_creative_status(creative_id, "archived")
list_creatives() → verify status = "archived"
sync_creatives(same creative) → verify unarchive (→ approved or pending_review)
force_creative_status(creative_id, "rejected", reason)
list_creatives() → verify rejection_reason persisted
sync_creatives(same creative) → verify resubmission (rejected → processing)
force_creative_status(creative_id, "approved") → expect INVALID_TRANSITION (must go through pending_review)

Example: account operation gates in deterministic mode

sync_accounts(account) → active
force_account_status(account_id, "suspended")
create_media_buy() → expect ACCOUNT_SUSPENDED
get_media_buys() → expect existing buys still readable
force_account_status(account_id, "active")
create_media_buy() → expect success
force_account_status(account_id, "payment_required")
update_media_buy(add packages) → expect ACCOUNT_PAYMENT_REQUIRED
get_media_buys() → expect existing buys still readable
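On the seller side, the gates exercised above amount to a lookup from account status to a blocking error code. The Python sketch below infers its table from this walkthrough only; treating both create_media_buy and update_media_buy as gated in both statuses is an assumption, and the gate helper is hypothetical.

```python
from typing import Optional

# Gate table inferred from the walkthrough above; a real seller derives it
# from the account lifecycle specification.
WRITE_OPERATIONS = {"create_media_buy", "update_media_buy"}
GATE_ERRORS = {
    "suspended": "ACCOUNT_SUSPENDED",
    "payment_required": "ACCOUNT_PAYMENT_REQUIRED",
}


def gate(account_status: str, operation: str) -> Optional[str]:
    """Return the error code that blocks `operation`, or None if allowed."""
    if operation not in WRITE_OPERATIONS:
        return None  # reads such as get_media_buys stay available
    return GATE_ERRORS.get(account_status)
```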

Example: media buy lifecycle in deterministic mode

create_media_buy() → status = "pending_creatives"
force_media_buy_status(media_buy_id, "rejected", reason) → expect success
get_media_buys() → verify status = "rejected", rejection_reason persisted
force_media_buy_status(media_buy_id, "active") → expect INVALID_TRANSITION (rejected is terminal)
create_media_buy() → new buy, status = "pending_creatives"
force_media_buy_status(media_buy_id, "pending_start")
force_media_buy_status(media_buy_id, "active")
force_media_buy_status(media_buy_id, "rejected") → expect INVALID_TRANSITION (rejected only valid from pending_creatives or pending_start)
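The transitions this walkthrough relies on can be written as a table-driven check. The Python below is a sketch covering only the states named above; the media buy lifecycle specification defines the full table, and the function signature (taking the current state directly) is a simplification for illustration.

```python
# Transitions taken only from the walkthrough above; other states and
# onward transitions from "active" are omitted.
ALLOWED = {
    "pending_creatives": {"pending_start", "rejected"},
    "pending_start": {"active", "rejected"},
    "active": set(),
    "rejected": set(),  # terminal
}


def force_media_buy_status(current: str, target: str) -> dict:
    if target == current:
        # Idempotent retry: same state in, same state out.
        return {"success": True, "previous_state": current, "current_state": current}
    if target not in ALLOWED.get(current, set()):
        return {"success": False, "error": "INVALID_TRANSITION"}
    return {"success": True, "previous_state": current, "current_state": target}
```

Keeping the table in one place lets the controller and the production code path share the same rules.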

Example: delivery and budget verification

create_media_buy(budget: $1000)
simulate_delivery(impressions: 10000, reported_spend: $500)
get_media_buy_delivery() → verify delivery reflects simulated data
   (reported_spend is delivery-only; does not affect account budget)
simulate_budget_spend(spend_percentage: 95)
get_account_financials() → verify total_spend reflects 95% ($950, not $500 from delivery)
simulate_budget_spend(spend_percentage: 100)
force_account_status(account_id, "payment_required")
create_media_buy() → expect ACCOUNT_PAYMENT_REQUIRED
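The arithmetic in this walkthrough can be checked with a short Python sketch. It assumes total_spend equals budget × spend_percentage / 100, which matches the $950 figure above; the helper name is hypothetical.

```python
from decimal import Decimal


def expected_total_spend(budget: Decimal, spend_percentage: Decimal) -> Decimal:
    """Spend that simulate_budget_spend should make visible in financials."""
    return budget * spend_percentage / Decimal(100)


budget = Decimal("1000")
reported_spend = Decimal("500")  # from simulate_delivery; delivery-only
total_spend = expected_total_spend(budget, Decimal("95"))
```

The runner compares total_spend with the financials response and treats reported_spend as a separate, delivery-only figure.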

Certification tiers

Tier	Requirement	What it proves
Functional compliance	Pass all storyboards in observational mode	Tools exist, respond correctly, and complete buyer-initiated flows
Stateful compliance	Pass all storyboards in deterministic mode	State machines enforce correct transitions, error codes match spec, operation gates block correctly

Specialism-scoped seed requirements. Stateful compliance also requires that sellers implement the seed_* scenarios covering the specialisms they certify against. The UNKNOWN_SCENARIO → not_applicable grading is for honest coverage reporting on missing surface area, not a blanket opt-out from conformance — a seller certifying sales-non-guaranteed MUST implement at least seed_product and seed_pricing_option; a seller certifying creative-ad-server MUST implement seed_creative; a seller certifying governance-delivery-monitor MUST implement seed_plan (and seed_media_buy where the storyboard requires it). The storyboard authors in static/compliance/source/specialisms/ declare the fixtures their storyboards need; sellers match that list to the specialisms on their cert.
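A seller can check its own coverage against these requirements before certifying. The Python sketch below hardcodes only the three specialisms quoted above; the authoritative fixture lists are declared by the storyboards, and the missing_seeds helper is hypothetical.

```python
# Requirements quoted from the paragraph above. seed_media_buy is left out
# of governance-delivery-monitor because it applies only where a storyboard
# requires it.
REQUIRED_SEEDS = {
    "sales-non-guaranteed": {"seed_product", "seed_pricing_option"},
    "creative-ad-server": {"seed_creative"},
    "governance-delivery-monitor": {"seed_plan"},
}


def missing_seeds(specialisms: list[str], supported: set[str]) -> dict[str, set[str]]:
    """Map each certified specialism to the seed scenarios it still lacks."""
    gaps = {}
    for specialism in specialisms:
        missing = REQUIRED_SEEDS.get(specialism, set()) - supported
        if missing:
            gaps[specialism] = missing
    return gaps


gaps = missing_seeds(
    ["sales-non-guaranteed", "creative-ad-server"],
    {"seed_product", "seed_creative"},
)
```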

Implementation guidance

For sellers

Gate comply_test_controller at the deployment level — it MUST NOT appear in tools/list (or A2A skills[]), MUST NOT be advertised via the compliance_testing capability block, and MUST dispatch to unknown-tool on production deployments. See Sandbox gating for the full rule.
Reuse your production state machine logic — the controller should call the same internal transition functions, not bypass them
Enforce transition rules — if rejected is terminal in production, force_media_buy_status(rejected → active) must fail via the controller too
Reflect changes immediately — after a forced transition, the next list_* or get_* call must return the updated state
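The deployment-level gate in the first point can be sketched as a fail-closed allow-list. In the Python below, the environment names, the PRODUCTION_TOOLS list, and the "unknown_tool" error string are assumptions for illustration; only the tool name comply_test_controller comes from this page.

```python
PRODUCTION_TOOLS = ["create_media_buy", "get_media_buys", "sync_creatives"]
CONTROLLER = "comply_test_controller"
NON_PRODUCTION = {"dev", "staging"}  # hypothetical environment names


def advertised_tools(environment: str) -> list[str]:
    # Fail closed: any environment not explicitly marked as dev or staging
    # is treated as production and never lists the controller.
    if environment in NON_PRODUCTION:
        return PRODUCTION_TOOLS + [CONTROLLER]
    return list(PRODUCTION_TOOLS)


def dispatch(environment: str, tool: str) -> dict:
    # Route through the same allow-list that tools/list uses, so a hidden
    # tool cannot be invoked by name.
    if tool not in advertised_tools(environment):
        return {"error": "unknown_tool"}
    return {"ok": True}
```

Deriving both the listing and the dispatch decision from one function keeps the two from drifting apart.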

For compliance testers

Detect the tool during profile discovery via tools/list
Call list_scenarios to discover which scenarios are supported
Run observational mode as the baseline — it works everywhere
Layer deterministic scenarios on top when the controller is available
Report which mode was used and distinguish coverage gaps from failures
Test the controller’s transition validation itself — invalid transitions should return INVALID_TRANSITION, not silently succeed
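The reporting step might be summarized as in the following Python sketch. The outcome labels and report keys are illustrative choices, not a defined report format.

```python
CONTROLLER = "comply_test_controller"


def build_report(tools: list[str], outcomes: dict[str, str]) -> dict:
    """Summarize a run.

    `outcomes` maps a scenario name to "pass", "coverage_gap", or "failure";
    these labels and the report keys are illustrative.
    """
    mode = "deterministic" if CONTROLLER in tools else "observational"
    return {
        "mode": mode,
        "coverage_gaps": sorted(s for s, o in outcomes.items() if o == "coverage_gap"),
        "failures": sorted(s for s, o in outcomes.items() if o == "failure"),
    }


report = build_report(
    ["create_media_buy", "comply_test_controller"],
    {
        "force_creative_status": "pass",
        "force_session_status": "coverage_gap",
        "force_media_buy_status": "failure",
    },
)
```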

Design decisions

Sellers validate transition ordering. The controller enforces the same state machine rules as production. Calling force_creative_status(approved) on a creative that was never processing is an error — the controller rejects it just as production would. The lifecycle state machines referenced here are defined in the respective protocol specifications (see creative lifecycle, account lifecycle, media buy lifecycle, SI session lifecycle).
Tests are self-contained. Each test SHOULD create dedicated entities (media buys, creatives, accounts) rather than reusing existing ones. This ensures additive simulation calls (simulate_delivery) start from a known-zero state, so no reset scenario is needed. Compliance testers SHOULD use unique identifiers (e.g., UUIDs) for test entities to avoid collisions when multiple storyboard runner instances run against the same sandbox concurrently. Sandbox entity cleanup (e.g., TTL-based expiration) is the seller’s responsibility.
Delivery simulation uses a synthetic marker. simulate_delivery records MAY include a synthetic: true field that sellers can use internally for bookkeeping. The runner ignores this marker — it validates get_media_buy_delivery responses against the same schema regardless. This lowers the implementation bar for sellers without affecting test correctness.
One tool, many scenarios. The single-tool design keeps context window cost to ~500 tokens vs ~1,400 for seven separate tools. Sellers implement one sandbox gate. The runner detects one tool. The list_scenarios introspection handles partial implementations without requiring per-tool presence detection.
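For the unique-identifier recommendation under "Tests are self-contained", a runner could generate entity IDs as in the Python sketch below. The prefix scheme and helper name are assumptions; any collision-resistant identifier works.

```python
import uuid


def new_entity_id(kind: str) -> str:
    """Build a collision-resistant ID for a test entity, e.g. a creative."""
    return f"{kind}-{uuid.uuid4()}"


first = new_entity_id("creative")
second = new_entity_id("creative")
```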
