Novaberg
Novaberg Papers · II of III
Category: Thought leadership
Reading time: ≈ 22 minutes
Audience: Architects & practitioners
Implementation: LangGraph · TypedDict
License: Apache 2.0

The Clipboard Pattern:
a better way to compose AI agents.

Multi-agent systems shouldn't gossip. They should share a case file.

By Claus Schlehhuber · Independent research · April 2026 · master@novaberg.de

Abstract

Multi-agent LLM systems commonly couple specialised agents through natural-language messages. One agent produces a textual result. Another agent reads that result, re-encodes it in its own vocabulary, and produces its own textual output. I argue that this is an architectural anti-pattern, not an intended feature.

Each natural-language re-encoding step is a lossy compression: it erases types, structure, and provenance. It introduces semantic drift that accumulates with every hop. It makes individual agents untestable because their inputs and outputs are strings. And it multiplies the token budget for anything non-trivial.

I propose the Clipboard Pattern: a shared typed state object, enforced by a TypedDict schema, that flows through a sequence of specialist nodes within a cognitive unit (a LangGraph). Each specialist reads the fields it needs, writes its results into structured fields, and hands the clipboard to the next specialist. No message passing. No re-encoding.

This article shows how the pattern is implemented in Novaberg's graphs, how a three-level taxonomy — Roles, Departments, Graphs — keeps composition clean at scale, and why a single typed state object is the technical expression of the principle: transport data completely, let the consumer format.

§ 1
Opening metaphor

A law firm does not run on emails between the lawyer and the accountant.

A case file moves from desk to desk. The litigator reads the facts, drafts a legal position, slides the file to the paralegal. The paralegal pulls the precedents, adds them to the file, slides it back. Nobody writes emails to each other. Nobody summarises the file in their own words. The file is the truth. Each person contributes a section. At the end, the file is the output.

Now look at most multi-agent LLM systems in the wild. An agent generates a paragraph. Another agent reads that paragraph, writes its own paragraph back. A third agent writes about both paragraphs. By the time the workflow ends, nobody can tell you what happened, and the tokens you paid for were spent on three agents each rewriting the previous one's summary in their own voice.

Put the two pictures side by side. One is obviously how serious work happens. The other is a children's game dressed up as orchestration. The rest of this article is a defence of the first picture and a critique of the second — grounded in code, in a working system, and in the pragmatic admission that this pattern is not new. It is just not named.

Listing 1 · Two ways to compose agents · python
# Pattern A — agent-to-agent messaging
result = agent_b.invoke(agent_a.invoke(user_message))

# Pattern B — clipboard
state = agent_a(state)
state = agent_b(state)
Both of these look similar. Only one of them tells you what Agent B received.
§ 2
The default pattern and its problems

Four kinds of damage accumulate at every hop.

The default pattern in popular multi-agent frameworks — CrewAI, AutoGen, and most "agent-supervisor" templates in the LangChain ecosystem — is this: specialised agents communicate by sending each other natural-language strings. One agent's output becomes another's prompt. The framework takes care of scheduling and handoffs. The application author writes roles and instructions; the agents do the rest.

This is the pattern Marina Wyss documents carefully in her 2025 overview of the field, "AI Agents: Complete Course" — a 150-page synthesis by a senior practitioner at Amazon. She devotes a chapter to "Communication Pitfalls," and she is not wrong about what they are. What I disagree with is the proposed remedy. The remedy offered by the consensus is better prompts and clearer roles. I think the remedy has to be architectural. The pitfalls are not behaviours the paradigm accidentally produces; they are what the paradigm is.

Damage one — semantic drift

Every time an agent reads another agent's text and writes its own, something subtle changes. The receiving agent has its own vocabulary, its own sense of what matters, its own training-data-induced priors. It restates the claim in its own voice. That voice is not identical to the source. After two hops, the error is still small. After five, the original intent is a rumour.

In one system I audited, a legal-review agent was asked to verify a contract. What it received was not the contract. It was a compliance officer's summary of the contract. The summary had dropped the termination clause. The legal-review agent dutifully found nothing wrong. Nobody was at fault. The architecture was at fault.

Damage two — token cost

Every natural-language hop costs tokens. The receiving agent must read the sender's text and generate its own. If four agents are in the loop, you pay for four re-encodings of roughly the same information. A typed field that says confidence: 0.83 costs one token in the prompt and zero tokens to read. A paragraph that says "I'm fairly confident, maybe eighty percent or so" costs fifteen tokens to write and fifteen to read, and tells the next agent less.

Damage three — untestability

An agent whose input is a free-form string cannot be unit-tested in any meaningful way. You can assert that the output "mentions termination" — and that assertion passes or fails on the whim of the generation. There is no contract to enforce and no fixture to pin. Production LLM work is work; it deserves tests. Strings-in, strings-out is the opposite of tests.

A typed field, by contrast, is a thing you can assert about. state["agent_results"][0].status == "abgeschlossen" either holds or it does not. The node that should produce it can be driven by fixture input and verified by fixture output. The gods of CI smile.
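To make that concrete, here is a minimal sketch of such a fixture-driven test. The node, the ready_to_respond field, and the AgentResult shape below are illustrative stand-ins, not Novaberg's actual API:

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    status: str
    ergebnis: str


def responder_gate(state: dict) -> dict:
    # illustrative node: the turn is ready once the first agent finished
    state["ready_to_respond"] = state["agent_results"][0].status == "abgeschlossen"
    return state


# fixture in, assertion on a named field out — no LLM call, no string matching
state = {"agent_results": [AgentResult("abgeschlossen", "Butter notiert")]}
assert responder_gate(state)["ready_to_respond"] is True
```

The test is fast, deterministic, and pins a contract: it fails if a refactor changes the field name or the status value, and for no other reason.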

Damage four — audit trail

A trail of natural-language exchanges is, technically, an audit trail. Practically it is an archaeology site. Who decided to include the precedent? Why was the termination clause dropped? You can read the exchange and guess. A typed state, written to by named nodes with declared write-sets, lets you answer those questions in O(1): the field was set by this node at this graph step.

Table I · four patterns, four trade-offs

Pattern | Communication | Symptom
Agent-to-agent messages (CrewAI · AutoGen) | Text strings | Semantic drift, token cost, strings-in/strings-out I/O
Supervisor + tool-handoff (LangChain · ReAct) | Text + implicit tool args | Better, but the arg schema is implicit
Shared memory blocks (Letta / MemGPT) | LLM-edited memory | Flexible, non-deterministic
Clipboard (Novaberg · LangGraph) | Typed state + dispatch | Deterministic, testable, no drift

None of this is a claim that text-messaging is never the right choice. It is a claim that text-messaging as the default coupling between specialists is expensive, fragile, and opaque. The default has to change.

§ 3
The pattern, in code

A clipboard is a typed dictionary that one function writes and the next reads.

The Clipboard Pattern is not a framework. It is a discipline. In Python, the discipline is enforced by a TypedDict and by the runtime guarantees of a state graph like LangGraph. The rules are three.

First, there is exactly one state object per cognitive unit — one clipboard per case, if you like. Second, every node declares, by convention, which fields it reads and which it writes; a reviewer reading the code can produce the dependency graph of the pipeline without running it. Third, there are no messages. Nodes do not talk to each other. They talk to the state.

Listing 2 · The state, typed · python · TypedDict
from typing import TypedDict

class ConversationState(TypedDict):
    user_input:         str
    intent:             str
    current_emotion:    str
    needs_memory:       bool
    memory_context:     str
    agent_name:         str                   # one agent per turn
    agent_results:      list[AgentResult]     # audit trail
    response:           str
A minimal sketch. The real ConversationState in Novaberg carries over sixty fields — each with a single declared writer and one or more readers.

The crucial discipline is not in TypedDict itself — Python will happily run you off a cliff — but in the code-review culture around it. A node function has the shape State → State. It receives the clipboard, modifies its own fields, and returns. That is the entire contract. Consider the planner that routes work inside the character's graph:

Listing 3 · Planner and dispatch · python · node shape
def planner_node(state: ConversationState) -> ConversationState:
    # reads:  intent, management_target, agent_results
    # writes: agent_name  (empty when nothing left to do)
    state["agent_name"] = next_agent_to_run(state)
    return state


def agent_dispatch_node(state: ConversationState) -> ConversationState:
    # reads:  agent_name, state (as a whole — the clipboard)
    # writes: agent_results (+1), agent_name = ""
    dispatch = find_dispatch(state["agent_name"])
    state["agent_results"].append(dispatch(state))
    state["agent_name"] = ""
    return state


# LangGraph conditional edges:
#   planner -> agent_dispatch   if agent_name != ""
#   planner -> responder        if agent_name == ""
#   agent_dispatch -> planner   (loop: planner decides anew)
No agent_queue. No batch. The planner makes exactly one decision per pass — "which agent, if any" — and the dispatch runs it. Multi-agent turns happen by iteration, never by a pre-filled list; later agents can build on the results of earlier ones, and the planner exits the moment the answer is ripe.

Two things deserve to be said out loud. The difference from the text-messaging pattern is not stylistic; it is type safety. The difference from a queue-based scheduler is not stylistic either; it is observability. Each iteration of the loop is a distinct graph step with a full state snapshot — you can pause, inspect, replay. A queue would hide the same decisions inside a function's local scope.
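The iteration can be exercised without any framework. A minimal driver under the same contract (stub nodes, plain control flow standing in for the conditional edges of Listing 3) might look like this:

```python
def planner_node(state: dict) -> dict:
    # stub planner: run the notes agent once, then signal "nothing left"
    state["agent_name"] = "" if state["agent_results"] else "notizen"
    return state


def agent_dispatch_node(state: dict) -> dict:
    # stub dispatch: record one result on the clipboard, clear the slot
    state["agent_results"].append(
        {"agent": state["agent_name"], "status": "abgeschlossen"}
    )
    state["agent_name"] = ""
    return state


def run_turn(state: dict, max_steps: int = 8) -> dict:
    # the conditional edges of Listing 3 as plain control flow:
    # planner -> dispatch while agent_name is set, else exit to responder
    for _ in range(max_steps):
        state = planner_node(state)
        if state["agent_name"] == "":
            return state
        state = agent_dispatch_node(state)
    raise RuntimeError("planner did not converge")


result = run_turn({"agent_name": "", "agent_results": []})
assert [r["agent"] for r in result["agent_results"]] == ["notizen"]
```

The stubs are trivial on purpose: the shape of the loop, not the intelligence of the nodes, is what the pattern fixes. In the real system LangGraph owns this loop and snapshots the state at every step.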

The dispatch — a door between two worlds

The word "dispatch" in agent_dispatch_node hides a small but consequential pattern. A specialist agent lives inside its own world — its own vocabulary, its own subgraph, its own state schema. The planner cannot know any of that. The planner only knows the outer ConversationState. Something must translate between the outer clipboard and the agent's inner one. That something is the dispatch.

One of the principles I have kept coming back to, in design notes, is this:

“Dispatch and node are a unit, like you and I. One dispatcher for everything would be too inflexible and too imprecise.”

Every department ships its own dispatch. The notes agent has a dispatch.py that unpacks the parts of ConversationState it needs — the current intent, the user's emotion, the relevant memory fragments — into a department-local AgentState, runs the agent, and folds the AgentResult back into the outer clipboard. No central router touches it. Plugin-shaped by construction.
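A sketch of what such a dispatch can look like. The field names, the inner-state shape, and run_notizen_agent are illustrative assumptions, not the real module:

```python
def run_notizen_agent(inner: dict) -> dict:
    # stub for the department's own subgraph
    return {"status": "abgeschlossen", "ergebnis": f"handled: {inner['user_input']}"}


def notizen_dispatch(outer: dict) -> dict:
    # unpack only the fields the notes department needs (illustrative names)
    inner = {
        "intent": outer["intent"],
        "emotion": outer["current_emotion"],
        "memory": outer["memory_context"],
        "user_input": outer["user_input"],
    }
    # run the department on its own, smaller state
    result = run_notizen_agent(inner)
    # fold the outcome back into the shape the outer clipboard records
    return {"agent": "notizen", "status": result["status"], "ergebnis": result["ergebnis"]}


outer = {"intent": "task", "current_emotion": "neutral",
         "memory_context": "", "user_input": "Butter auf die Liste"}
assert notizen_dispatch(outer)["status"] == "abgeschlossen"
```

The dispatch is the only code that knows both schemas; neither the planner nor the agent ever sees the other's world.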

Listing 4 · The notes agent, on disk · directory layout
agents/notizen/
├── agent.py           # subgraph wiring (LangGraph nodes)
├── klassifikation.py  # classify    — recognise the action
├── suche.py           # resolve     — find the target entry
├── crud.py            # crud        — create/read/update/delete
├── resume.py          # resume      — continue after a clarification
├── bestaetigung.py    # confirm     — phrase the result for the responder
├── dispatch.py        # ConversationState ↔ AgentState
├── init.sql           # schema (bi-temporal, soft-delete indexed)
├── __init__.py        # package marker for auto-discovery
└── AGENT.md           # capabilities, triggers, tests
The files are named in German because Novaberg's domain language follows the user's language. An English-language deployment would name them classify / resolve / crud / resume / confirm. The files are the point, not the names.

Adding a new agent means making a new directory with its own dispatch.py and its own init.sql. No central router code is touched. Auto-discovery finds the dispatch the same way it finds the agent. The schema lives with the agent, not in a monolithic migrations file — because the department that owns the behaviour also owns the storage.
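One way the find_dispatch call from Listing 3 can be backed by auto-discovery is a registry that each dispatch.py populates on import. The decorator and registry names here are hypothetical; the real system discovers dispatch modules by scanning the agent directories:

```python
DISPATCHES: dict = {}


def register_dispatch(name: str):
    # decorator each department's dispatch.py applies to its entry point
    def inner(fn):
        DISPATCHES[name] = fn
        return fn
    return inner


def find_dispatch(name: str):
    # the planner-side lookup: name in, callable out, no central routing code
    return DISPATCHES[name]


@register_dispatch("notizen")
def notizen_dispatch(state: dict) -> dict:
    return {"agent": "notizen", "status": "abgeschlossen"}


assert find_dispatch("notizen")({})["agent"] == "notizen"
```

A new department registers itself the moment its package is imported; no router file accumulates a growing if/elif chain.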

The normalisation layer — where speech becomes structure

The dispatch resolves the boundary between the outer clipboard and the inner one. But there is a deeper boundary inside the agent itself: the boundary between what the user said and what the specialist needs to hear. Crossing it cleanly is what turns a domain agent from a clever prompt into a reliable piece of software.

A patient walks into a clinic. He tells the receptionist: "Something feels off, my chest gets tight when I climb stairs, and I've been dizzy all week." The receptionist does not diagnose. She routes — cardiology, not dermatology. That is the Router. In the cardiology intake, a nurse takes the patient's words and fills out an admission form: dyspnoea on exertion, vertigo, seven days, no prior cardiac history. That is the Classify node. It translates natural speech into the department's professional vocabulary.

The cardiologist reads the admission form. He does not ask the patient to tell the story again. He reads structured observations in his own language and decides what comes next. That is the CRUD. The patient's original words — the worry, the phrasing, the tone — are not lost. They sit in their own fields on the clipboard, read by the responder when it is time to answer the patient in the patient's language. But the specialist never sees them. The specialist works on validated, domain-specific data. Two layers inside the same state object, consumed by different nodes.

Domain language normalisation

Novaberg calls this domain language normalisation. Every department declares its own professional vocabulary — the actions it recognises, the entities it operates on, the shape it expects. The Classify node translates the user's natural sentence into that vocabulary, inline, as part of its existing LLM call. No extra round-trip. No separate NLU pipeline. The output is a single structured field, normalised, that the downstream roles can rely on.

Listing 5 · From speech to structure · one turn · two layers
User says:      "Throw the bananas off the list."

Classify:       action      = remove_content
                target      = "Shopping list"
                normalised  = "remove_content: remove bananas
                               from note 'Shopping list'"

CRUD reads:     action, target, normalised
Responder:      reads user_input — preserves tone in reply
The CRUD never parses the sentence. It receives a named action and a located target. The Responder reads the original words — different consumers, different fields, same clipboard.

The same principle from the other side. An architect tells a client, "We'll use oak beams, stained in a warm grey." The client nods. But the structural engineer who picks up the file does not read "warm grey." He reads: load-bearing capacity 14 kN/m, span 4.2 m, cross-section 120 × 240 mm. Architect and engineer share a file. They do not share a vocabulary. The file carries both layers — the client-facing description and the engineering specification — and each reader takes what belongs to them.

Five roles, because separation demands it

This is why a department has five specialised roles — not because the domain is complicated, but because each role needs to be relieved of the work that does not belong to it. A single agent asked to do all five in one prompt collapses under context load. Five roles hold five small contexts, each individually testable.

Role one — Classify

Translates. It bridges the gap between natural speech and a named action. It fills out the department's intake form in the department's professional language. Everything downstream operates on what Classify produced — not on what the user actually said.

Role two — Resolve

Finds. It locates the right entity — the right list, the right appointment, the right note — using fuzzy search with score-gap disambiguation and an embedding fallback. It never interprets intent. If the match is ambiguous, it flags a clarification rather than guessing.
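Score-gap disambiguation fits in a few lines. A sketch, where the 0.15 threshold and the return shape are assumptions for illustration:

```python
def resolve(candidate_scores: dict, gap: float = 0.15) -> dict:
    # accept the top match only if it clearly beats the runner-up;
    # otherwise flag a clarification instead of guessing
    ranked = sorted(candidate_scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= gap:
        return {"match": ranked[0][0], "clarify": None}
    return {"match": None, "clarify": [name for name, _ in ranked[:2]]}


# a clear winner resolves; two near-ties trigger a clarification turn
assert resolve({"Shopping list": 0.92, "Office list": 0.41})["match"] == "Shopping list"
assert resolve({"Household list": 0.80, "Office list": 0.78})["match"] is None
```

The clarification branch is what feeds Resume: the pending question becomes first-class state on the clipboard rather than an exception.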

Role three — CRUD

Executes. Exactly the operation that Classify named, on the entity that Resolve located. CRUD never reads the user's original words. Its input is entirely the structured fields written by its upstream peers. This is what makes the four-phase hardening — recognise, validate, execute, verify — possible at all.
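The four phases can be sketched for a single append operation. The action name, the store, and the status strings mirror the article's vocabulary, but the code is a toy, not Novaberg's CRUD:

```python
store: dict[str, list[str]] = {"Shopping list": []}


def crud(action: str, target: str, payload: str) -> dict:
    # four-phase hardening, sketched: recognise, validate, execute, verify
    if action != "append_content":                   # recognise the action
        return {"status": "fehler", "ergebnis": f"unknown action: {action}"}
    if target not in store:                          # validate the target
        return {"status": "rueckfrage", "ergebnis": "which entry?"}
    store[target].append(payload)                    # execute
    if payload not in store[target]:                 # verify the write landed
        return {"status": "fehler", "ergebnis": "write not visible"}
    return {"status": "abgeschlossen", "ergebnis": f"{payload} added to {target}"}


assert crud("append_content", "Shopping list", "Butter")["status"] == "abgeschlossen"
```

Note what is absent: no parsing, no prompt, no user words. Every input arrives pre-structured from Classify and Resolve.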

Role four — Resume

Re-enters. When a clarification was needed last turn — "which list, household or office?" — Resume picks up the pending state on the next user input and restarts the subgraph from the point where it paused. The pause is not an error path; it is a first-class state the clipboard can hold.

Role five — Confirm

Phrases the outcome in domain-appropriate language for the Responder to weave into the reply. Confirm is the only role that produces natural language inside the agent, and it produces one sentence, for one consumer. The Responder still owns the user-facing voice; Confirm just hands it a clean fact.

Why this scales

Domain language is what makes a new department a small change instead of a large one. Adding a department means declaring a new vocabulary — the actions, the entities, a handful of translation examples — and dropping a new directory next to the others. No central code is touched. No coordinator has to learn the new domain. The department arrives with its own intake form, its own specialists, its own dispatch. The clipboard carries whatever they write.

So the clipboard is not just a data-transport mechanism. It is the translation boundary between human speech and machine-readable action — and it keeps both representations alive, side by side, for the different readers that need them.

§ 4
A real example, end to end

"Put butter on the shopping list." What happens next.

Metaphors and schemas only go so far. Here is what actually runs in Novaberg when a user types one sentence. Novaberg's event model splits a conversational turn across two graphs: Path 1 — the HumanGraph — perceives the user's turn and persists it; Path 2 — the CharacterGraph — generates the reply. They are decoupled by a Redis event queue. The user gets an immediate 202 Accepted after Path 1 and the reply arrives later over a WebSocket. Latency is unchanged; the graphs are free to be specialised.

Figure 1 · Two graphs, one case file

Path 1 · HumanGraph · synchronous
  1. Perception (intent, emotion)
  2. Enricher (KZG · LZG · hash)
  3. EI-Calc (user stream)
  4. Salience (pending writes)
  5. Dispatcher (turn + KZG)
        ↓ Redis event queue
Path 2 · CharacterGraph · asynchronous
  Enricher (KZG · LZG · hash) → EI-Calc (character stream)
  → Router (update · notizen) → Planner (agent_name="notizen")
  → Dispatch · NotizenAgent (classify → resolve → crud → confirm; appends to agent_results)
  → Planner (agent_name="") → Responder (formats reply) → User reply
  (iterate until agent_name == "")
Fig. 1 — The clipboard (ConversationState) moves left to right through every node. No arrow carries a string between peers; every arrow is a state transition. The dashed rail between paths is a Redis event; the two paths share a session but not a graph. Quality-control nodes (Thinker, Tribunal, Corrector) and the character's self-perception are omitted for clarity.

Path 1 — HumanGraph

Perception extracts intent="task", topic="shopping", emotion="neutral". The Enricher loads the session, short-term memory, long-term memory, and the character hash. EI-Calc runs on the user's stream: the user's emotion trajectory, the current affect vector, a plausibility check on the communication mode. Salience decides that "butter / shopping list" is worth remembering and adds a pending write for the short-term store. The Dispatcher persists the turn and emits an event onto the Redis queue. No agent call here. Path 1 perceives and stores. Done.

Path 2 — CharacterGraph

The character's graph picks up the event. Its own Enricher loads what it needs. The Router sees management_action="update" and flags the notes department. The Planner sets agent_name="notizen". Agent Dispatch translates the outer clipboard into the notes agent's AgentState, runs the subgraph — classify recognises an append action, resolve finds the shopping list by name using fuzzy matching with an embedding fallback, CRUD appends "butter," confirm phrases the result — and folds the AgentResult back into agent_results.

The Planner now sees a fresh result and decides there is nothing more to do: agent_name="". The Responder reads agent_results[0].ergebnis and writes: "Done — butter is on the shopping list." At every point between Path 1 and the user seeing that sentence, the state was typed, readable, and pausable.

The point

If you print the state after every node, you get a complete causal trace of the turn — not a transcript of what a model said about what another model said about what the user said. The clipboard is the conversation. The user-facing sentence is a side-effect.
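Printing the clipboard after every node is itself a few lines of code. A sketch; the traced wrapper is an illustration, not a Novaberg utility:

```python
import json


def traced(node, name: str):
    # wrap a State -> State node so each graph step logs the full clipboard
    def wrapper(state: dict) -> dict:
        state = node(state)
        print(f"[{name}] {json.dumps(state, ensure_ascii=False)}")
        return state
    return wrapper


# usage (assumed wiring): graph.add_node("planner", traced(planner_node, "planner"))
def bump(state: dict) -> dict:
    state["n"] = state.get("n", 0) + 1
    return state


assert traced(bump, "bump")({"n": 0})["n"] == 1
```

Because the whole turn lives in one serialisable dict, the trace is the audit log; there is no second source of truth to reconcile.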

§ 5
Three levels: roles, departments, graphs

A taxonomy Novaberg converges toward — not an interface it enforces.

Scale is where design decisions either compound or decay. If every new capability requires touching a central file, a system grows less maintainable per capability added. The Clipboard Pattern scales — in Novaberg at least — because the departments organise themselves around three recurring levels, not because any framework demands it. I want to be careful here. The three levels are a pattern the code has settled into, not a contract the runtime verifies.

Level 1 — Roles

Roles are recurring functional patterns that most workflow agents share. There are five of them, and they have converged unforced. Classify decides what kind of action is being requested. Resolve finds the target entity — the right shopping list, the right appointment, the right note. CRUD performs the data operation. Resume handles the case where the agent is being re-entered after a clarification turn ("which list? the household one or the office one?"). Confirm phrases the outcome in domain-appropriate language for the responder to weave into the reply.

Each role lives in its own file. The shape is a convention, not a subclass of anything. Workflow agents share the convention; agents whose work is shaped differently deviate, on purpose. The characterisation agents (the ones that update Nova's self-model) do not have a Resolve step; they have a different skeleton. That is fine. The taxonomy is descriptive, not prescriptive.

Level 2 — Departments

A department is a composition of roles, specialised for a domain. The notes department (notizen), the timeline department, the character-identity department — all share the five-role skeleton, each instantiating it with its own vocabulary, its own storage, its own domain semantics. In Novaberg's current codebase there are eleven agent directories, each with its own dispatch.

Level 3 — Graphs

A graph is a cognitive unit. It orchestrates departments into meaningful workflows. Novaberg runs three compiled graphs: the HumanGraph (perceives and stores the user's turn), the CharacterGraph (decides and responds), and a smaller AgentGraph used by the background worker. Each graph has its own state schema — though ConversationState is shared between HumanGraph and CharacterGraph, carrying the same session forward.

Why the taxonomy is descriptive

I want to resist the temptation to turn this into a formal framework. The value is not in a Role base class or a Department registry. The value is that the repetition becomes visible. When you know a new agent will likely want classify/resolve/crud/resume/confirm, you start with a template and override where the domain demands it. When you know a graph routes work via Planner → Dispatch → Planner → Responder, you recognise the shape immediately in a new codebase. The taxonomy is a reading aid first and a factoring hint second. It is not a type system.

A reader who takes nothing else from this article should take this: the Clipboard Pattern is the primitive. The three levels are what the code looks like after you apply the primitive for long enough.

§ 6
What you gain

Determinism, testability, token discipline.

Adopting the Clipboard Pattern changes what you can say about your system's behaviour. Not in abstractions — in concrete claims that either hold or do not.

Determinism, where you want it

A node function is State → State. If it is not stochastic on purpose — most routing, planning, dispatch nodes are not — then it is deterministic. Feed it the same clipboard twice, get the same clipboard twice. You cannot say that about an agent-to-agent conversation; model temperature alone makes the same prompt produce a different paragraph on the second run.

Unit tests that actually pin behaviour

A node's test looks like: build a fixture state, invoke the node, assert on named fields of the output. Planner gets agent_results[0].status == "abgeschlossen" and should exit — does it? Dispatch receives agent_name="notizen" and should call the notes dispatch — does it? These are real, fast, robust tests. An agent-to-agent "assertion" that the output "mentions termination" is not.

Token discipline, for free

Fields are fields. A confidence: 0.83 is a number, not a paragraph. Natural language appears exactly twice in the pipeline: once at the input (the user's turn) and once at the output (the responder's reply). Every stage in between operates on structured data. Token usage follows directly: one perception call, one response call, plus whatever the agent's own internal work requires. Contrast with a four-agent messaging system where every hop is a full round-trip.

Audit trails, by construction

LangGraph persists state snapshots between nodes. A production incident becomes a replay: load the snapshot for graph step 7, run the node in a debugger, watch the decision happen. The trail is not a transcript; it is the clipboard at each desk.
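A minimal stand-in shows the mechanism; LangGraph's own checkpointers do this for real, and the class and method names here are ours, for illustration only:

```python
import copy


class SnapshotTrail:
    # per-step state snapshots: the clipboard photographed at every desk
    def __init__(self):
        self.steps = []

    def record(self, node_name: str, state: dict) -> None:
        # deep-copy so later mutation cannot rewrite history
        self.steps.append((node_name, copy.deepcopy(state)))

    def replay(self, step: int) -> dict:
        # the clipboard exactly as it sat on the desk at that graph step
        return copy.deepcopy(self.steps[step][1])


trail = SnapshotTrail()
state = {"agent_name": "notizen", "agent_results": []}
trail.record("planner", state)
state["agent_name"] = ""          # the turn moves on...
assert trail.replay(0)["agent_name"] == "notizen"   # ...the snapshot does not
```

The deep copy is the whole trick: snapshots must be immutable with respect to the live state, or the replay lies.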

§ 7
What you give up (and why that's fine)

Emergent agent conversation is not what you think it is.

The most common objection I hear to the Clipboard Pattern is that it sacrifices emergent behaviour — the thing where two agents surprise you by producing a better answer through their exchange than either would alone. There is a grain of truth in this. For the kind of workflow I have described — a user turn that wants a grounded, correct, persistent response — emergent behaviour is a liability, not an asset. You do not want the compliance officer and the legal reviewer to improvise. You want them to read the file.

Where free agent conversation genuinely helps — brainstorming, debate, negotiation, roleplay — the Clipboard Pattern is the wrong tool. Build those with a messaging substrate and accept the token cost. But most systems advertised as "multi-agent" are pipelines. Calling a pipeline a conversation does not make it one. It just makes it expensive.

§ 8
An invitation

Read the code. Disagree in the issues.

The Clipboard Pattern is not mine. It lives in any LangGraph codebase that takes state seriously, in any Rails or Django controller that keeps logic out of HTTP bodies, and — probably — in every law firm you have ever visited.

What I have done is give it a name, codify the discipline, and show what happens when you commit to it all the way down. Novaberg implements the pattern end-to-end. The source lives at codeberg.org/ClausVomBerg/Novaberg under Apache 2.0. If this argument has any force, you should be able to point at the code and find a line where it breaks. If you can, I would very much like to hear about it in the issues.

Next in this series: why an AI assistant needs its own emotions, and what goes wrong when it only mirrors yours.

Related work

The Clipboard Pattern sits at the intersection of several traditions. LangGraph itself is the runtime that makes it practical in Python. Elixir/OTP's GenServer messaging inspired the structured-payload discipline (via process isolation rather than shared state, but the principle is kin). React's unidirectional data flow is the closest analogue outside the LLM world.

The Amazon practitioner overview I referenced — Marina Wyss, "AI Agents: Complete Course" (Medium / Data Science Collective, December 2025) — is the most careful defence of the messaging paradigm I know, and worth reading even if you ultimately disagree. Her Complexity/Precision matrix for task selection is orthogonal to this argument and compatible with either paradigm. Her Memory taxonomy — dynamic and static — is a useful introduction but thinner than the five-layer memory system in Novaberg; that's material for a later paper.

Graphiti / Zep (arXiv 2501.13956) makes an adjacent argument: structured memory beats text-based conversational history. langgraph-supervisor-py and the patterns in JoshuaC215/agent-service-toolkit and cgoncalves94/multi_agent_system move in the same direction as this paper without naming the pattern as such. If the name sticks, I hope it helps those efforts cohere.

Anticipated objections

Objection: Text messages are more flexible.
Response: Yes — and more flexible, in production, is less reliable. A typed state can always carry a free notes field when flexibility is truly needed.

Objection: Amazon practitioners use the messaging pattern.
Response: They do, and they describe the "Communication Pitfalls" as a chapter in their own right. The problem is recognised; the remedy stays inside the paradigm. This paper proposes a paradigm shift, not a patch.

Objection: The state dict will grow enormous.
Response: Discipline required. Novaberg's ConversationState has over sixty fields, code-reviewed, versioned, readable. It has not exploded in over sixty development sessions.

Objection: This is just shared memory.
Response: No. Typed. Versioned by LangGraph's immutable reducers. Orchestrated through a graph. No concurrent thread access — nodes run in a deterministic sequence.

Objection: What about emergent behaviour?
Response: Covered in § 7. Pick the right tool. Most "multi-agent" systems are pipelines wearing the word conversation.

Objection: What if a node errors?
Response: Every AgentResult carries a status field — abgeschlossen, fehler, rueckfrage. The responder sees the error and speaks to it. Nodes that raise are caught, logged, and routed to a degraded path.

Objection: What about streaming?
Response: LangGraph + FastAPI's SSE support lets every node emit events. The state is not compromised by streaming the reply; streaming is a delivery detail.