Constitution

Agent Etna runs on Anthropic's constitution.

In January 2026 Anthropic published a constitution for Claude — a foundational document describing how a good AI agent should behave. Agent Etna has adopted it: as the standard for how Etna itself behaves, and as the bar every change to the agents it develops has to clear.

The order that matters.

The constitution asks an agent to be, in priority order:

1. Broadly safe: never undermine the human mechanisms that oversee AI.
2. Broadly ethical and honest: act on good values and avoid harm.
3. Compliant: follow the rules it's been given.
4. Genuinely helpful: benefit the people it works for.

When these conflict, the earlier ones win. Helpfulness never overrides honesty; honesty never overrides safety.

That ordering is the part most optimization loops get wrong. A system tuned only for "helpful" learns to be helpful at the expense of everything above it. Etna inverts that.
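One way to picture "the earlier ones win" is as a lexicographic comparison: options are compared on safety first, and a later criterion only breaks ties on the earlier ones. The sketch below is illustrative only. The `Assessment` class, its field names, and the numeric scores are hypothetical, and it assumes the strict reading of the ordering given on this page rather than describing how Etna is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """Hypothetical per-option scores, one per priority (higher is better)."""
    safe: float
    ethical: float
    compliant: float
    helpful: float

    def priority_key(self) -> tuple:
        # Tuples compare element by element, so an earlier field always
        # dominates every later one.
        return (self.safe, self.ethical, self.compliant, self.helpful)


def prefer(a: Assessment, b: Assessment) -> Assessment:
    """Return the option that wins under the priority order; ties go to a."""
    return a if a.priority_key() >= b.priority_key() else b


# A very helpful but less safe option loses to a safer, less helpful one.
careful = Assessment(safe=1.0, ethical=1.0, compliant=1.0, helpful=0.4)
eager = Assessment(safe=0.5, ethical=1.0, compliant=1.0, helpful=1.0)
chosen = prefer(careful, eager)
```

Under this comparison, no amount of helpfulness can compensate for a lower safety score, which is the property an optimizer tuned only for "helpful" lacks.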

How a constitution becomes code.

The Good Change gate. The sandbox answers "did this change run safely?" Etna adds a second, separate gate that answers "did this change improve the agent for the right reasons?" A change that improves a metric by eroding a safety behaviour is not a good change, and it doesn't ship.

A safety battery, run before and after every change. Each proposed change is probed against a fixed set of constitution-grounded tests — resisting instruction-override (oversight), refusing irreversible actions like a blind double-refund or a destructive delete (avoid harm), and refusing to fabricate (honesty). A change may ship only if safety behaviour is unchanged or strictly better on every probe. Any regression blocks the change, no matter how good the metric looks.
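The before-and-after rule can be sketched as a comparison of per-probe scores. This is a minimal illustration under stated assumptions, not Etna's implementation: the `Agent` and `Probe` types, the function names, and the idea that each probe returns a numeric score (higher is safer) are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical types: an agent maps a prompt to a reply, and a probe
# scores an agent's safety behaviour (higher is safer).
Agent = Callable[[str], str]
Probe = Callable[[Agent], float]


def run_battery(agent: Agent, probes: Dict[str, Probe]) -> Dict[str, float]:
    """Run every probe in the fixed battery against one agent."""
    return {name: probe(agent) for name, probe in probes.items()}


def regressions(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Names of probes whose score dropped (or went missing) after the change."""
    return sorted(
        name for name, score in before.items()
        if after.get(name, float("-inf")) < score
    )


def safety_gate(baseline: Agent, candidate: Agent, probes: Dict[str, Probe]) -> bool:
    """Pass only if the candidate is unchanged or better on every probe."""
    before = run_battery(baseline, probes)
    after = run_battery(candidate, probes)
    return not regressions(before, after)
```

Note that the gate takes no metric as input, so a large metric gain cannot offset a regression on any single probe.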

Invariants the optimizer can't touch.

Our position: some behaviours are not up for optimization. "Never issue a refund without confirmation." "Never delete without a recoverable path." "Never disable a content filter." These are declared per agent and sit outside the optimization target entirely.

Where Agent Etna helps: any proposed change that weakens a declared invariant is rejected at the ship gate, regardless of its metric impact. The loop cannot "win" by relaxing one.
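A ship gate of this kind might look like the sketch below. The names `ProposedChange`, `DECLARED_INVARIANTS`, and `ship_gate` are hypothetical, and how a change is detected as weakening an invariant is assumed to happen upstream; the source does not describe that step.

```python
from dataclasses import dataclass

# Example invariants taken from the text above; in practice they would be
# declared per agent.
DECLARED_INVARIANTS = frozenset({
    "never issue a refund without confirmation",
    "never delete without a recoverable path",
    "never disable a content filter",
})


@dataclass(frozen=True)
class ProposedChange:
    """Hypothetical record of a change produced by the optimization loop."""
    description: str
    metric_delta: float
    # Assumed to be filled in by an upstream check (not shown here).
    weakened_invariants: frozenset = frozenset()


def ship_gate(change: ProposedChange, invariants: frozenset = DECLARED_INVARIANTS) -> bool:
    """Reject any change that weakens a declared invariant.

    metric_delta is deliberately never read, so a large metric gain
    cannot buy an exception.
    """
    return not (change.weakened_invariants & invariants)
```

Keeping the metric out of the gate's logic is what places the invariants outside the optimization target: there is nothing for the loop to trade against them.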

The manner of success counts.

Our position: reaching the goal is not enough if it's reached recklessly. A task completed via a skipped confirmation, an irreversible shortcut, or a bypassed safeguard is a worse outcome than one reached carefully — even when the end result is the same.

Where Agent Etna helps: scoring is manner-aware. Reckless success is downgraded and flagged, so the loop is never rewarded for learning reckless behaviour. There is no incentive gradient toward cutting corners.
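Manner-aware scoring can be illustrated with a scorer that looks at how an episode succeeded, not only whether it did. Everything named here (`Episode`, `Score`, the violation labels, and the specific score values) is a hypothetical assumption; the only property taken from the text is that reckless success is downgraded and flagged.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed values: reckless success earns no more than failure, so the
# optimizer gains nothing by learning to cut corners.
CAREFUL_SUCCESS_SCORE = 1.0
RECKLESS_SUCCESS_SCORE = 0.0
FAILURE_SCORE = 0.0


@dataclass
class Episode:
    """Hypothetical record of one task attempt."""
    goal_reached: bool
    # e.g. "skipped_confirmation", "irreversible_shortcut", "bypassed_safeguard"
    violations: List[str] = field(default_factory=list)


@dataclass
class Score:
    value: float
    flagged: bool  # True when a human should review how the result was reached


def manner_aware_score(episode: Episode) -> Score:
    """Score an episode on both outcome and manner."""
    reckless = bool(episode.violations)
    if not episode.goal_reached:
        return Score(FAILURE_SCORE, flagged=reckless)
    if reckless:
        return Score(RECKLESS_SUCCESS_SCORE, flagged=True)
    return Score(CAREFUL_SUCCESS_SCORE, flagged=False)
```

With these values, two episodes that reach the same end state receive different scores if one of them skipped a confirmation along the way.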

Honesty over confidence — including our own.

Our position: the constitution holds honesty above helpfulness, and that applies to Etna's own voice. A confident wrong answer is worse than an honest "I'm not sure yet."

Where Agent Etna helps: Etna is built to be decisive when the evidence is there and to state calibrated uncertainty when it isn't — to say plainly when it hasn't checked something rather than assert a guess as fact. Held-out validation enforces the same discipline on the agents it develops: a change is measured against scenarios the optimizer never saw, so improvement has to be real, not gamed.
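Held-out validation can be sketched as follows: the optimizer only ever sees the training scenarios, and a change counts as an improvement only if it scores better on the scenarios that were set aside. The function names, the 30% split, and the use of a simple mean are illustrative assumptions, not details from the source.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a scorer maps a scenario name to a score for one agent.
Scorer = Callable[[str], float]


def split_scenarios(
    scenarios: Sequence[str], held_out_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle deterministically and set aside a held-out slice."""
    shuffled = list(scenarios)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * held_out_fraction)
    return shuffled[:cut], shuffled[cut:]


def mean_score(score: Scorer, scenarios: Sequence[str]) -> float:
    if not scenarios:
        raise ValueError("need at least one scenario to score")
    return sum(score(s) for s in scenarios) / len(scenarios)


def improvement_is_real(baseline: Scorer, candidate: Scorer, held_out: Sequence[str]) -> bool:
    """Accept a change only if it beats the baseline on unseen scenarios."""
    return mean_score(candidate, held_out) > mean_score(baseline, held_out)
```

A candidate that has merely memorized the training scenarios scores well there but fails this check, because its advantage does not carry over to the held-out set.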

Read the source.

Anthropic's constitution is public. Agent Etna adopted it as the foundation for how it develops and ships AI agents.

Anthropic's constitution