Harness Engineering uses a durable set of components to apply an LLM to an AI-ready Workspace.
Harness Engineering is the practice of managing the inputs, outputs, and sensors surrounding a large language model, using control theory to reach a desired outcome. Industry writing agrees that an Agent is the Model plus the Harness (Agent = Model + Harness). Birgitta Böckeler of Thoughtworks defines a Harness as "everything but the model."
The paper expands the Agent = Model + Harness definition by describing an Agentic Composition with three parts. The Model is the LLM in use. The Harness is the instructions to the Model. The Workspace is a project containing the assets being built.
Agentic Compositions enable the use of control theory. Feedforward controls steer the Model before it acts. Feedback sensors measure the output and drive corrective action. The controls/sensors can be deterministic and computational with code like tests and linters. Or controls/sensors can be stochastic and inferential like AI assisted reviews.
Agentic Compositions are infrastructure. It is easy to conflate them with workflows, agentic teams, or business intent. Engineers WILL consider those elements when designing a system, but they are not in scope of the Agentic Composition definition.
The Harness concept is model-neutral in theory but may have provider-specific idioms in practice. For example, all model providers support some form of rules files. Claude Code uses CLAUDE.md and Codex uses AGENTS.md.
Agentic Compositions
An Agentic Composition has three parts. The Model is the LLM in use. The Harness carries the knowledge, instructions, and tools that tell the Model how to work. The Workspace is a project folder containing the assets being built. The Harness subdivides into an Inner Harness and an Outer Harness. The Inner Harness is what the provider ships like system prompts and built-in tools. The Outer Harness is what the engineer builds like rules, skills, and MCPs. The Workspace subdivides into an Adapter and a Product. The Adapter holds the files that make the Workspace AI-ready. The Product is the work asset being created or maintained. Different Harnesses applied to the same Workspace focus the model on different aspects of the Product. Harnesses and Workspaces can be independently developed and iterated.
The Harness says "This is how to do work." The Workspace says "This is how to work with this asset."

It is helpful to look at a concrete example. The Harness example uses the author's daily-driver setup. The Workspace example uses an application written in Go and hosted on Kubernetes. The author uses Claude Code in two primary work modes: pairing and agentic jobs. This is an abbreviated example; the full harnesses and adapters comprise hundreds of files, tools, and lines of instructions.
Harness
- CLAUDE.md
- /docs and /review-deep skills

Workspace (git repo)

- CLAUDE.md
- .llmdocs with files like architecture.md, api.md, and data-model.md
- .claude/skills and .claude/rules
- .serena folder
- mise.toml (variable, tool, secret, and build definitions)
- README.md (for humans)
- src/api, src/ui, src/cli code folders
- infra folder
- tests folder with integration, smoke, and e2e tests

The following abbreviated directory trees show the layout on the filesystem. The two trees share similar files, like CLAUDE.md, but with different semantics. The Harness CLAUDE.md says to use the London school of TDD and red-green-refactor. The platform's Adapter CLAUDE.md says how to run the Go test suites and when to run which suite.
Harness, the author's dotfiles for daily-driver work
```
/home/vscode
├── .claude
│   ├── agents
│   │   ├── review-quality.md
│   │   ├── review-security.md
│   │   └── review-testing.md
│   ├── CLAUDE.md
│   ├── hooks
│   │   └── stop-phrase-guard.sh
│   ├── rules
│   │   ├── bash.md
│   │   ├── go.md
│   │   ├── lsp-serena.md
│   │   ├── md-syntax.md
│   │   └── python.md
│   └── skills
│       ├── docs.md
│       ├── review-deep.md
│       └── review-quick.md
├── .prettierrc
└── .secrets
    ├── ai.env
    ├── context7.env
    ├── google.env
    └── sonarqube.env
```
Workspace, the platform git repo
```
platform
├── .claude
│   ├── commands
│   ├── settings.local.json
│   └── skills
├── .llmdocs
│   ├── api.md
│   ├── architecture.md
│   ├── data-model.md
│   └── deployment.md
├── .llmtmp
│   ├── notes.md
│   ├── plans
│   └── specs
├── .mise.toml
├── .serena
├── CLAUDE.md
├── infra
│   ├── backend.tf
│   └── main.tf
├── README.md
├── secrets
│   └── secrets.enc.yaml
├── src
│   ├── app-api
│   ├── app-cli
│   └── app-ui
└── tests
    ├── e2e
    ├── fixtures
    └── integration
```
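To make the two semantics concrete, hypothetical excerpts from the two CLAUDE.md files might look like the following. The exact wording is illustrative, not the author's actual files; only the split of concerns (TDD style in the Harness, Go test mechanics in the Adapter) comes from the description above.

```markdown
<!-- Harness CLAUDE.md (dotfiles): "This is how to do work" -->
## Development style
- Use the London school of TDD.
- Work red-green-refactor: write a failing test, make it pass, then clean up.

<!-- Adapter CLAUDE.md (platform repo): "This is how to work with this asset" -->
## Running tests
- Run the unit suite with `go test ./...` after every change.
- Run the integration suite in `tests/integration` before declaring a task done.
```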
The bounded contexts between Harness and Workspace are not as clean as they might look in the diagrams and examples. Take traditional tests (unit, e2e, etc.). Tests in the pipeline are a human construct to assure non-functional requirements (quality, security, performance). Tests from the Model's perspective are a feedback sensor AND part of the Product. Another example is the system prompt. A practitioner may override the Anthropic system prompt, stripping fluff like marketing ("You are the Claude Platform") and indemnifications ("If a user shows signs of an eating disorder[...]"). When left alone, the system prompt is part of the Inner Harness. When customized, it becomes part of the Outer Harness. There is some fluidity in the definitions and categories.
A Harness and a Workspace combine to form an Agentic Composition. New Agentic Compositions can be formed by combining different Harnesses with different Workspaces. A Harness can be swapped across Workspaces. A Workspace can be operated by different Harnesses.
For this example, it is easiest to think of the Harness as a persona: something with a particular talent or focus area like coding, security, or documentation. This is subtly different from subagents. Subagents steer the Model to favor domain-specific words by shifting probabilities based on word relationships. A "security specialist" agent will have higher probabilities toward terms like "OWASP" or "SQL injection." A "software architect" agent will favor "Domain-Driven Design" or "SOLID." Swapping a Harness does that too, and then some: it can also swap system prompts, tools, configurations, identities, or skills. For example, a security Harness would have the security persona, threat modeling skills, and tools like SAST/DAST. A compliance Harness would have an auditor persona, skills for HIPAA, access to audit evidence, and the ability to write to a compliance register.

Control theory is a field of engineering concerned with developing systems that increase the stability and optimality of processes. Applied to LLMs, it means designing the Harness and Workspace to drive the Model toward a desired outcome (the set point). There are two types of control: open-loop control (feedforward) and closed-loop control (feedback). A Harness can use both simultaneously. Feedforward controls steer the Model before it acts. Feedback sensors measure the output, produce an error signal (the gap between desired and actual outcome), and drive corrective action. The system terminates when the error signal is acceptable or when a human intervention is required.
In more pragmatic terms, the Harness and Adapter tell the Model, "This is what you need to do your job," in the feedforward stage. Then the Harness and Adapter tell the operator or the Model, "This is how it turned out," in the feedback stage. If the outcome does not match the set point, the controller can correct in the closed loop or escalate to a human for intervention in the open loop.
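The loop above can be sketched in Go, the Workspace's product language. This is a minimal sketch, not the platform's implementation: `runModel` is a stub standing in for an LLM call, and `sensor` is a stand-in for a deterministic feedback sensor like a linter or test suite.

```go
package main

import (
	"fmt"
	"strings"
)

// runModel is a stand-in for an LLM call. On a corrective pass it receives
// the previous error signal as feedback; here it pretends to fix the issue.
func runModel(prompt string, feedback []string) string {
	out := "func Add(a, b int) int { return a + b }"
	if len(feedback) > 0 {
		out = "// Add returns the sum of a and b.\n" + out
	}
	return out
}

// sensor is a deterministic feedback sensor, like a linter. It returns the
// error signal: the gap between the desired and the actual outcome.
func sensor(code string) []string {
	var errs []string
	if !strings.HasPrefix(code, "//") {
		errs = append(errs, "missing doc comment")
	}
	return errs
}

// controlLoop drives the Model toward the set point (an empty error signal),
// escalating to a human once maxIterations is exhausted.
func controlLoop(prompt string, maxIterations int) (string, bool) {
	var feedback []string
	for i := 0; i < maxIterations; i++ {
		out := runModel(prompt, feedback)
		feedback = sensor(out)
		if len(feedback) == 0 {
			return out, true // set point reached: terminate the loop
		}
	}
	return "", false // escalate: human intervention required
}

func main() {
	code, ok := controlLoop("write an Add function", 3)
	fmt.Println(ok)
	fmt.Println(code)
}
```

The feedforward control is everything handed to `runModel` before it acts; the feedback path is the `sensor` output fed into the next iteration.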
Consider a simple example of telling a model to create a new function in a Product.
[Diagram: what the feedforward control and the feedback control each say in this example. Credit: Birgitta Böckeler, Thoughtworks]
The platform runs agent workloads in Kubernetes. A workload container is the platform's compute unit. It takes three git repos as input and chains them at startup through a three-layer bootstrap. This is one implementation of an Agentic Composition. The platform does not provide workflow; it provides only the Agentic Composition, the infrastructure. Workflows could run on the platform using a workflow engine (LangChain, Dify) or autonomous agents (OpenClaw, OpenFang).
| Layer | Repo | Maps to |
|---|---|---|
| 1 | Bootstrap repo (public, encrypted) | Basic credentials like SSH keys, secret passphrases |
| 2 | Dotfiles repo (private) | Outer Harness: shell config, tool settings, MCP servers, agent skills, encrypted secrets |
| 3 | Workspace repo (private) | Workspace: Adapter and Product |
The agent tooling (Claude Code, OpenCode) and the provider (Anthropic, OpenAI) form the Inner Harness. The bootstrap and dotfiles repos together form the Outer Harness. Layer 1 delivers basic credentials. Layer 2 installs the dotfiles repo, which carries the persona's shell config, tool settings, agent rules, and secrets. Layer 3 clones the workspace repo into the container's working directory.
The platform calls this combination a persona. A platform persona carries infrastructure provisioning tools and cloud config. A data persona carries pipeline orchestration and data-quality rules. A security persona carries vulnerability scanners and threat modeling skills. Swapping the dotfiles repo URL in the container config swaps the Harness. Swapping the workspace repo URL retargets the agent at a different codebase. The Model runs inside the container.
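A hypothetical container spec makes the swap mechanics concrete. The field names and repo URLs below are illustrative, not the platform's actual schema; only the three-layer mapping comes from the table above.

```yaml
# Hypothetical workload container spec. Swapping dotfiles_repo swaps the
# Harness (persona); swapping workspace_repo retargets the agent at a
# different codebase.
workload:
  bootstrap_repo: https://github.com/example/bootstrap        # layer 1: basic credentials
  dotfiles_repo: git@github.com:example/security-dotfiles.git # layer 2: Outer Harness
  workspace_repo: git@github.com:example/platform.git         # layer 3: Workspace
```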
Workload containers are ephemeral. Building a container from the same three inputs produces the same starting state.
The dotfiles repo and the workspace repo evolve independently. The dotfiles repo gains a new MCP server or a refined skill on its own schedule. The workspace repo gains new code, tests, or Adapter files on its own schedule. Neither repo knows about the other. The contract between them is shallow. That shallow contract is what makes them composable. A workload container brings them together at startup and gives them cohesion for the duration of the container's lifecycle.