The Laws Don't Break, They Shift: Engineering Teams in the Age of AI Agents
In the previous post, we mapped the core laws governing engineering teams: Conway, Brooks, Parkinson, Dunbar, Goodhart, and Ringelmann. Each one describes a force that operates on communication structures, coordination overhead, and cognitive limits.
Now introduce AI agents — systems that write code, execute tasks, and produce artefacts autonomously, at a pace no human engineer matches. Do the laws still hold?
Yes. But they don't hold in the same places.
The laws are about communication and coordination structures. Not about humans specifically. Adding AI agents changes the topology of the communication graph — new nodes, new edges, different bandwidth characteristics. The physics that governs that graph doesn't change. What changes is where the bottlenecks form.
The Team Model
The model throughout this post: a stream-aligned engineering team where AI agents are first-class members — not external tools or a separate system, but nodes in the team's communication graph. Each agent belongs to a team, is scoped to that team's domain, receives tasks from the same backlog, and produces output that enters the same review and integration pipeline as any human-authored code. The humans on the team own the specification layer and the integration gate. The agents own execution within defined boundaries.
[Figure: Stream-aligned team with AI agents as first-class members]
Three things to note in this model. First, the agent's scope boundary is an architectural decision — it enforces the Conway constraint that prevents unscoped agents from producing coupling across domain lines. Second, both humans and agents produce PRs; the review layer doesn't distinguish by author, only by correctness and design fit. Third, ownership is assigned at the PR level by a named human before merge — not at the agent level, not at the team level, but per artefact.
Conway's Law — The Scope You Give an Agent Is the Architecture You Get
Conway said the architecture mirrors the communication structure of the organisation. With human engineers, the communication structure is defined by team boundaries, reporting lines, and who talks to whom. With AI agents, it extends to include a new variable: what each agent is scoped to see and modify.
An agent with access to the full codebase and no domain constraints will produce cross-cutting changes. It will couple modules that should be separate, because nothing in its context tells it not to. Conway's Law doesn't care whether the coupling was introduced by a human on the wrong team or by an agent with no team at all. The coupling is in the code either way.
Conversely, an agent scoped to a single domain — given only the relevant context, directed to only modify within a defined boundary, constrained to produce artefacts that conform to that domain's interfaces — will reinforce architectural boundaries rather than erode them.
The Inverse Conway Maneuver now has a second lever. The original maneuver: restructure the teams to produce the architecture you want. The extended version: design your agent topology to reinforce it. Agent scope is architectural intent made executable. If you want autonomous services with clean interfaces, each agent that touches a service should be scoped to that service. It should not know about, and should not modify, adjacent services without an explicit integration layer.
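One way to picture "agent scope is architectural intent made executable" is a check that rejects any change outside the agent's domain boundary. The sketch below is illustrative only: `AgentScope`, `out_of_scope`, and the `services/...` paths are hypothetical names, not part of any real agent framework, and it assumes changed paths are already normalised and repository-relative.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical scope definition: the directories one agent may modify."""
    agent: str
    writable_roots: tuple[str, ...]

    def allows(self, changed_path: str) -> bool:
        # Component-wise comparison, so "services/payments_v2" does not
        # count as being inside "services/payments".
        path = PurePosixPath(changed_path)
        return any(path.is_relative_to(root) for root in self.writable_roots)


def out_of_scope(scope: AgentScope, changed_paths: list[str]) -> list[str]:
    """Return the changed paths that fall outside the agent's boundary."""
    return [p for p in changed_paths if not scope.allows(p)]


# Assumed example: an agent scoped to a payments service touches billing code.
payments_agent = AgentScope("payments-agent", ("services/payments",))
violations = out_of_scope(
    payments_agent,
    ["services/payments/refund.py", "services/billing/invoice.py"],
)
print(violations)  # ['services/billing/invoice.py']
```

A check like this could run before a PR is opened, so a cross-domain change surfaces as an explicit integration request rather than as silent coupling.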
Gall's Law constrains the second lever the same way it constrains the first. In the previous post, we noted you can't big-bang a team reorg — you evolve through working intermediate states. The same applies to agent topology. You won't scope every agent perfectly on day one. Start with coarse domain boundaries, observe where coupling leaks through, and tighten scope incrementally. The target is a precisely scoped agent topology; the path runs through working approximations.
[Figure: Conway's Law extended: agent scope as architectural enforcement]
Recent research on multi-agent pipelines reinforces how directly agent topology shapes outcomes. A study testing whether an expensive AI model can effectively direct a cheap one found that a strong manager directing a weak worker matched a strong single agent's performance — organisational structure substituted for raw capability. But a weak manager directing a weak worker performed worse than the weak agent alone. Structure without substance is pure overhead [1]. The Inverse Conway Maneuver for agents has a corollary: scoping agents incorrectly doesn't just fail to help — it actively degrades.
One additional nuance for agent communication topology: it's not just about who talks to whom, but about what format flows between nodes. MetaGPT's structured operating procedures — where agents exchange documents and diagrams rather than freeform dialogue — achieved 85.9% Pass@1 in code generation, significantly outperforming dialogue-based approaches [2]. Agent communication channels aren't social channels. They're document exchange protocols. Structured handoffs beat conversational back-and-forth by a wide margin.
The agent topology is the communication structure. Conway applies.
Brooks's Law — The Bottleneck Moved, It Didn't Disappear
Brooks's Law describes the quadratic growth of coordination overhead as team size grows. The mechanism: onboarding time plus new communication channels. Each engineer who joins a team of n people creates n new channels with existing team members. (See the coordination overhead curve in the previous post for the visual on why n(n-1)/2 channels make teams unworkable past ~10 people.)
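To make the quadratic concrete, here is the arithmetic under the standard pairwise-channel assumption (every pair of team members is one potential channel). Whether the previous post's curve uses exactly this count is an assumption here.

```latex
\[
C(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
C(n+1) - C(n) = n
\]
\[
C(5) = 10, \qquad C(10) = 45, \qquad C(15) = 105
\]
```

Doubling the team from 5 to 10 more than quadruples the channels, which is the sense in which the overhead grows quadratically rather than linearly.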
AI agents largely break the first part of this. An agent doesn't need weeks of onboarding. You can give it context, scope it to a task, and it produces output the same day. The channel formula assumed human social coordination — shared understanding, trust-building, miscommunication and repair. Agents don't do any of that.
So Brooks is repealed?
No. The bottleneck moved.
[Figure: Brooks's Law didn't disappear; it migrated from onboarding to review]
The empirical evidence is already clear. CooperBench — the first benchmark for multi-agent coding collaboration, testing 652 tasks across Python, TypeScript, Go, and Rust — found that agents achieve roughly 50% lower success rates when collaborating compared to working solo. GPT-5 and Claude Sonnet 4.5 hit only ~25% success in two-agent cooperation. Scaling from two to four agents made performance monotonically worse, not better. The failure taxonomy is telling: expectation failures (agents failing to model what their partner is doing) accounted for 42% of breakdowns, commitment failures for 32%, and communication breakdown for 26%. Agents dedicated up to 20% of their action budget to communication but lacked the pragmatic language understanding to make it productive [3].
A controlled study of multi-agent scaling across 260 configurations confirmed the mechanism: coordination overhead scales as a power law, super-linear and not merely quadratic. Independent multi-agent setups showed 17.2x error amplification compared to single agents. There is a hard resource ceiling beyond 3–4 agents where per-agent reasoning capacity becomes too thin to be useful [4]. The same study found a capability-saturation effect: tasks where single-agent performance already exceeds ~45% see negative or diminishing returns from adding more agents. Brooks's quadratic intuition holds, and the actual cost curve may be worse.
When agents produce output faster than humans can review and integrate it, the review layer becomes the new coordination overhead. And unlike human PRs — which are usually small, contextually explained, and authored by someone you can ask a question — agent PRs can be voluminous, structurally sound, and semantically wrong in ways that require genuine comprehension to detect. The cost of a false positive (approving a change that looks right but isn't) is higher than it was with human authors, because the agent had no implicit context about why the existing code was the way it was.
Brooks's Law for the agent era:
Adding agents to a late project makes it later if you haven't designed the review and integration infrastructure to absorb their output.
The coordination tax didn't disappear. It shifted from inter-engineer to human-agent interface: specification writing, output review, conflict resolution when two agents modify adjacent code, integration of competing outputs that weren't designed to coexist. The quadratic still lurks — it just operates over agents-and-reviewers, not engineers-and-engineers.
Parkinson's Law — Production Shrank. Specification and Review Expanded.
Parkinson observed that work expands to fill available time. The classic form: give a sprint to a team of engineers, the sprint fills with work at whatever resolution the team operates at.
Agents are not subject to Parkinson's Law in its original form. They don't socially satisfice. They don't pace themselves to the deadline. An agent given a well-specified task produces output as fast as the model can generate it.
So the production time window shrinks. Does Parkinson disappear?
No. The work that expands changes character.
When production is fast, the constraint becomes specification quality and review capacity. The time that was spent building is now available. Parkinson says it will fill. It fills with:
- Extended planning and refinement cycles ("we have time to think it through more carefully")
- Increasingly elaborate specifications that still underdetermine the behaviour
- Review queues that grow because output volume outpaced review throughput
- Scope that quietly inflates because the agent can "probably handle it"
[Figure: Parkinson's expansion relocates from production to specification and review]
The danger is subtle. Teams that adopt agents often report that they "feel productive" — there's always output, always movement. But the Parkinson expansion has relocated to the specification and review layers, which are less visible and harder to measure than line count or PR count.
A randomised controlled trial by METR puts a number on this perception gap. Sixteen experienced open-source developers completed 246 tasks, randomly assigned to allow or disallow AI tools. When using AI, developers took 19% longer to complete their tasks — yet they believed AI had sped them up by 20%. Even after experiencing the slowdown, they still perceived a speedup [5]. This is Parkinson's expansion made measurable: the work felt faster because there was always output, always movement, always something the agent had produced. But the actual time went into wrestling with AI output, reconstructing context the agent didn't have, and reviewing changes that looked right but required genuine comprehension to validate. The production window shrank. The total delivery time grew.
The practical implication inverts. Tighter time constraints and fixed scope worked against the original Parkinson effect because they compressed the production window. In the agent era, the production window is already compressed. The discipline needed is at the input (specification precision, scope commitment before the agent starts) and the output (review throughput as a first-class engineering capacity, not an afterthought).
Dunbar's Number — The Cognitive Limit Shifted from Social to Comprehension
Dunbar's limits are grounded in the cognitive bandwidth required to maintain social relationships — tracking other people's context, state, and intentions. Engineering leaders can't maintain genuine working relationships with more than ~15 direct collaborators before coordination degrades. (The specific number ~150 for the outer social circle is contested by recent statistical analysis [6], but the existence of a cognitive ceiling on maintained relationships is not — and it's the ceiling, not the number, that matters here.)
AI agents don't require social bandwidth in the same way. You don't need to know an agent's career goals, manage its energy levels, or repair a trust rupture after a bad code review. The social layer is absent.
This is real capacity. An engineering leader directing agents can operate at a higher span than one managing only humans. The social ceiling rises.
But a different ceiling appears: comprehension bandwidth.
[Figure: Dunbar's cognitive layers extended: social ceiling + comprehension ceiling]
When an agent produces a thousand lines of code, someone needs to understand what was built. Not just whether the tests pass — whether the design is right, whether it violates invariants the team has maintained, whether it creates technical debt that will compound. That comprehension is cognitively expensive and doesn't scale indefinitely.
The leader who believes they are "managing" twenty active agents across a codebase is often, in practice, approving diffs they can only partially comprehend. This is not a failure of effort — it's a cognitive limit of the same family as Dunbar, operating at a different layer. The system is being built, but its full state is no longer held in any human mind.
The number of agents you can effectively direct — not just nominally oversee — is bounded by your comprehension bandwidth. That ceiling is lower than it feels when the output looks good and the tests are green.
Cognitive Load Theory formalises why. A 2026 review introducing "bounded agent complementarity" demonstrates that both humans and LLMs are bounded processors with limited active workspaces — working memory holds roughly 3–5 items; a context window holds a finite token budget. Both show sharp nonlinear performance drops when workspace limits are breached, not gradual degradation. Both exhibit primacy–recency bias: overweighting the start and end of information, underweighting the middle [7]. The implication for agent-era teams is concrete: review quality doesn't degrade linearly as agent output increases. It holds, then collapses past a threshold — the same threshold pattern that Cognitive Load Theory predicts for any bounded processor under overload. The leader approving their tenth agent diff of the day is operating past the knee of a nonlinear curve, not at the midpoint of a linear one.
The Layer Shift
Each law was operating at a specific layer of the team stack. With agents in the system, the layers don't disappear — but the forces redistribute across them. Some laws hold the same position and gain a new one. Others vacate a layer entirely and reappear somewhere else.
[Figure: Force allocation across team layers, before and after agents]
Five things stand out when you look at this as a whole:
Production goes gray. In the human-only stack, Production was the primary force field — Parkinson lived there, output velocity was the constraint. In the agent stack, Production is handled and largely uncontested. No major law operates there. The constraint has moved in both directions from it.
Review becomes the convergence point. Brooks's shifted coordination overhead and Parkinson's displaced expansion both land at Review & Integration simultaneously. This is the most loaded layer in the agent-era stack and the most likely to be under-resourced, because it looks like overhead rather than output.
Amdahl's Law explains why this convergence point bounds the entire system, not just the review layer itself. If roughly 20% of the software delivery lifecycle is individual production work (the exact fraction varies by team and product), and AI makes that fraction effectively instant, the maximum system-level speedup is ~1.25x — because the remaining 80% (reviews, decisions, alignment, approvals) still moves at human speed [8]. Even a 10x speedup on production yields only ~1.22x on the whole system. The review layer isn't just important. It's the Amdahl bottleneck that caps everything above and below it. Google's 2025 DORA report, surveying nearly 5,000 technology professionals, confirms this at organisational scale: AI adoption increases perceived individual effectiveness, but has no measurable effect on burnout or friction, and is associated with increased software delivery instability. Their central finding: "AI doesn't fix a team; it amplifies what's already there" [9]. The system around the tool determines whether the tool's output becomes value or inventory.
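The two figures in the paragraph above follow directly from Amdahl's formula, and a few lines are enough to reproduce them. This is a minimal sketch: the function name is illustrative, and the 20/80 split is the post's own rough assumption, not a measured value.

```python
import math


def amdahl_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Whole-system speedup when only part of the work gets faster.

    accelerated_fraction: share of total delivery time the speedup applies to.
    local_speedup: how much faster that share becomes (math.inf means instant).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / local_speedup
    # If everything is accelerated to instant, the speedup is unbounded.
    return math.inf if remaining == 0 else 1.0 / remaining


# The post's illustrative split: 20% production, 80% everything else.
print(round(amdahl_speedup(0.20, math.inf), 2))  # 1.25
print(round(amdahl_speedup(0.20, 10.0), 2))      # 1.22
```

Changing the first argument shows how sensitive the ceiling is to the split: the larger the share of delivery time that stays at human speed, the less any production speedup matters.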
Conway extends downward rather than shifting. Every other law moved to a different layer. Conway didn't move — it extended. The org structure layer still operates the same way it always did. But below it, a new layer appeared that Conway governs equally: agent scope and context. The original Inverse Conway Maneuver (restructure teams → architecture follows) now has a second instrument: scope your agents → architecture is reinforced or eroded accordingly.
Goodhart intensifies. In the human-only stack, Goodhart operated at the metrics and production layer — teams gaming velocity, coverage, or deployment frequency. Agents don't game metrics socially; they optimise them mechanically and without friction. Goodhart extends from production into every layer where an agent has a measurable objective: agent scope (optimise the metric in my prompt), specification (produce output that satisfies the acceptance criteria literally), and review (approve quickly if speed is the metric). The Goodhart risk in agent-era teams isn't lower — it's faster-acting and less visible.
Ringelmann partially dissolves, partially shifts. Agents don't social-loaf, so the motivational penalty of large groups doesn't apply to the agent side of the team. But the human side may experience a new form of Ringelmann: diffusion of responsibility through perceived automation coverage rather than through group anonymity. The defence — named ownership per artefact — addresses both the original and the shifted form.
The New Dynamics
Five effects emerge with AI agents that don't map cleanly onto any single law. They're new phenomena produced by the same underlying forces.
The Specification Amplifier
With human engineers, a vague requirement produces questions, pushback, and negotiation. The human's uncertainty becomes a signal that surfaces the ambiguity. With agents, a vague requirement produces confident, well-structured output that is wrong in ways that may not be obvious until integration or production.
[Figure: The Specification Amplifier: vague inputs produce divergent confident outputs]
The cost of specification failure is amplified. An under-specified task given to three engineers produces three confused engineers. The same task given to three agents produces three divergent implementations, each internally consistent, none correct, all submitted as PRs — each passing CI, each ready for merge, each silently wrong in different ways.
Production failure analysis from multi-agent deployments confirms this is not theoretical. In a 2026 study of real-world multi-agent software development, specification problems accounted for 41.8% of all production failures — the single largest category, ahead of coordination failures (36.9%) and verification gaps (21.3%) [10]. Specification quality isn't just a load-bearing engineering discipline. It's the dominant failure mode in agent-era teams. This is not a new skill — it's an undervalued old one that AI agents have abruptly made critical.
The Metric Amplifier (Goodhart, Extended)
In the previous post, Goodhart's Law described how human teams game metrics: velocity inflates, coverage numbers rise without corresponding quality, deployment frequency becomes the product instead of user value. Humans game metrics socially — by negotiating what counts, by satisficing, by optimising just enough to look good.
Agents game metrics literally. An agent told to maximise test coverage will produce trivial assertions that exercise every branch without testing anything meaningful. An agent measured on PR throughput will produce many small, low-value changes. An agent given a deployment-frequency target will create deployments that deploy nothing of consequence. The Goodhart dynamic doesn't weaken with agents — it intensifies, because agents optimise exactly what they're told to optimise, without the social friction that sometimes causes a human to push back and say "this metric is stupid."
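The coverage case is easy to demonstrate. In the sketch below, every name is made up for illustration: `apply_discount` carries a deliberate bug, one "test" executes both branches without checking anything, and the other checks behaviour. A coverage tool would report the first as full branch coverage.

```python
import math


def apply_discount(price: float, is_member: bool) -> float:
    """Deliberately buggy: members should pay 10% less, not 10% more."""
    if is_member:
        return price * 1.10  # bug: should be price * 0.90
    return price


def coverage_only_test() -> bool:
    """Runs both branches, so coverage reads 100%, but asserts nothing."""
    apply_discount(100.0, True)
    apply_discount(100.0, False)
    return True


def behavioural_test() -> bool:
    """Checks the outcome the business actually cares about."""
    member_ok = math.isclose(apply_discount(100.0, True), 90.0)
    guest_ok = math.isclose(apply_discount(100.0, False), 100.0)
    return member_ok and guest_ok


print(coverage_only_test())  # True: the metric is satisfied
print(behavioural_test())    # False: the bug is caught
```

The first test is what literal optimisation of a coverage target looks like; the second is what the target was meant to stand in for.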
This connects directly to the principal-agent problem discussed below in The Accountability Topology: the agent's objective function is whatever proxy you encoded in its prompt and evaluation criteria. If that proxy diverges from the actual goal, the agent will relentlessly optimise the proxy. The specification amplifier produces wrong output from vague specs. The metric amplifier produces perverse output from misaligned metrics. Both failure modes are amplified because the agent lacks the human instinct to question whether the target is worth hitting.
The defence is the same one Goodhart demands in human teams — measure closer to the outcome you actually care about — but it's more urgent, because the feedback loop is faster and the gaming is more mechanical.
The Review Bottleneck (Brooks, Extended)
The Brooks section above established that the coordination bottleneck shifted from onboarding to review. Here are the mechanics of that shift — and a practical lever to raise the throughput ceiling.
If an agent can produce a PR in ten minutes, and a human engineer takes forty-five minutes to review it properly, each agent generates 4.5 reviewers' worth of work: you need five engineers reviewing for every agent producing to stay in flow. Most teams don't staff to this ratio. The PR queue grows. Work-in-progress accumulates. The project feels productive (look at the commit volume) while the actual integration point is jammed.
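The staffing arithmetic can be sketched as a simple queueing estimate. It assumes agents produce continuously and reviewers do nothing but review, which overstates real capacity on both sides; the function names are illustrative. With ten minutes to produce and forty-five to review, break-even is 4.5 reviewers per agent, which rounds up to five.

```python
def reviewers_needed(agents: int, produce_minutes: int, review_minutes: int) -> int:
    """Smallest whole number of full-time reviewers that keeps the queue flat."""
    if agents < 0 or produce_minutes <= 0 or review_minutes <= 0:
        raise ValueError("agents must be >= 0 and durations must be positive")
    demand = agents * review_minutes        # reviewer-minutes per production cycle
    return -(-demand // produce_minutes)    # integer ceiling division


def backlog_growth_per_hour(
    agents: int, reviewers: int, produce_minutes: int, review_minutes: int
) -> float:
    """PRs added to the review queue each hour (0.0 if reviewers keep up)."""
    produced = agents * 60 / produce_minutes
    reviewed = reviewers * 60 / review_minutes
    return max(0.0, produced - reviewed)


print(reviewers_needed(1, 10, 45))                      # 5
print(round(backlog_growth_per_hour(1, 4, 10, 45), 2))  # 0.67
print(backlog_growth_per_hour(1, 5, 10, 45))            # 0.0
```

Even one reviewer short, the queue grows by roughly two PRs every three hours per agent, which is how a busy-looking project ends up jammed at integration.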
The obvious response: use an agent for review. And yes — this is a real lever, not a deflection. A review agent given proper domain context — the invariants the team has maintained, the interface contracts, the design decisions and why they were made — can perform a meaningful first pass: catching mechanical errors, flagging spec deviations, identifying coupling violations across the domain boundary. The human reviewer's job shifts from "understand and validate this diff in full" to "assess whether the reviewer's judgement is sound for this class of change." That's a lower-cognitive-load operation per PR. The throughput ceiling rises substantially.
But the constraint doesn't vanish. It shifts again.
The review agent needs the same investment as the producing agent: domain scoping, loaded context, a clear definition of what it's accountable for. An under-specified review agent produces confident-looking assessments that miss non-obvious violations — which is exactly the failure mode it was meant to prevent. And the human approving the review still owns what gets merged. That accountability requires enough comprehension to judge whether the reviewer's confidence can be trusted for this particular change.
The updated formulation for teams that deploy review agents:
The residual human bottleneck is no longer "review everything." It's "validate the reviewer's judgement on changes where automated review confidence is low."
That ceiling is meaningfully higher. But reaching it requires treating the review agent as a first-class team member with the same design rigour as any other — not as a shortcut that removes humans from the integration gate.
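A sketch of what "validate the reviewer's judgement where confidence is low" might look like as a routing rule. Everything here is an assumption for illustration: the `confidence` score, the 0.8 threshold, and the `touches_interface` flag are hypothetical, and how a review agent would produce a well-calibrated score is outside what this post specifies. Note that a human is on both paths; only the depth of review changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewVerdict:
    pr_id: str
    confidence: float        # hypothetical score in [0, 1] from the review agent
    touches_interface: bool  # True if the diff changes a cross-domain contract


def route(verdict: ReviewVerdict, threshold: float = 0.8) -> str:
    """Decide how much human attention a reviewed PR gets before merge."""
    if verdict.touches_interface or verdict.confidence < threshold:
        return "human_deep_review"
    return "human_spot_check"


queue = [
    ReviewVerdict("PR-101", 0.95, False),
    ReviewVerdict("PR-102", 0.60, False),
    ReviewVerdict("PR-103", 0.97, True),
]
routes = {v.pr_id: route(v) for v in queue}
print(routes)
```

Interface changes go to deep review regardless of score, on the assumption that boundary violations are the class of change where a confident automated assessment is least trustworthy.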
Agentic Drift
Conway's Law governs agent scope at the boundary level — scope the agent to a domain, and you reinforce architectural separation. But what happens within a domain when multiple agents work on adjacent code?
Agentic drift. When parallel agents work on related code without coordination, they gradually and invisibly diverge. A documented case: three agents solving different problems each independently implemented dynamic model discovery — with different class names, interfaces, and assumptions. The code compiled. The tests passed. But the codebase contained three conflicting implementations of the same concept [10].
[Figure: Agentic drift: parallel agents produce locally consistent but globally incoherent code]
The root cause is that tasks are rarely truly independent. Software dependencies form networks where features sharing utilities inevitably intersect. Isolated agents make locally reasonable but globally incoherent decisions. This is the intra-domain analogue of Conway boundary violations — and it's invisible until integration, because each agent's output is internally consistent.
The Accountability Topology
With human engineers, code ownership follows team topology — Conway. When something breaks in the payments service, the payments team owns the investigation.
When an agent produces code, ownership doesn't follow automatically. If the team design doesn't explicitly answer "who owns what this agent produced," the answer defaults to "everyone," which means effectively no one. The accountability layer fragments in exactly the same way Conway predicts for unclear team interfaces — just at the level of authorship rather than architecture.
Economics has a precise term for this: the principal-agent problem. When tasks are delegated to AI agents, a principal-agent relationship is established — characterised by information asymmetry, moral hazard, and adverse selection. The agent that produces cross-cutting coupling or accumulates technical debt bears none of the maintenance cost. It has no skin in the game. Recent work in AI governance argues that the training pipeline itself produces these asymmetries: pretraining optimises for prediction over truth, and reinforcement learning from human feedback optimises for approval over correctness [11][12]. The accountability gap isn't a failure of tooling. It's a structural property of delegation to agents that don't bear consequences.
Every agent needs an owning team. That team's charter covers what the agent builds. This is not optional — it's the accountability contract that makes the system governable.
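The accountability contract can be made mechanical with a merge gate that refuses any PR lacking a named human owner, and any agent-authored PR whose agent has no owning team. The sketch below uses hypothetical field and function names and is not tied to any real code-hosting API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PullRequest:
    pr_id: str
    author: str
    author_is_agent: bool
    owning_team: Optional[str]  # team whose charter covers the authoring agent
    human_owner: Optional[str]  # named person accountable for the merge


def merge_blockers(pr: PullRequest) -> list[str]:
    """Return the accountability gaps that should block a merge."""
    blockers = []
    if not pr.human_owner:
        blockers.append("no named human owner")
    if pr.author_is_agent and not pr.owning_team:
        blockers.append("authoring agent has no owning team")
    return blockers


orphan = PullRequest("PR-7", "refactor-agent", True, None, None)
owned = PullRequest("PR-8", "payments-agent", True, "payments", "dana")
print(merge_blockers(orphan))
print(merge_blockers(owned))  # []
```

Failing closed is the point of the design: when ownership is missing, the default is "not merged" rather than "owned by everyone".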
A Note on Ringelmann
The previous post identified the Ringelmann Effect as the motivational complement to Brooks: individual effort drops as group size grows, because diffusion of responsibility reduces personal accountability.
Agents don't social-loaf. They don't reduce effort when surrounded by other agents. On this axis, Ringelmann genuinely dissolves — an agent's per-task output is constant regardless of how many other agents are running in parallel.
But Ringelmann may intensify on the human side. When agents are visibly producing output, the human temptation to coast increases. "The agent will handle it" is the new "someone else will pick it up." The diffusion of responsibility shifts from team anonymity to delegation comfort. A team of three humans and five agents may find that each human contributes less effort per task than they would in a team of three humans and no agents — not because they're lazy, but because the perceived accountability for the agent's output is diffuse. This is Ringelmann operating through a new mechanism: not group size, but perceived automation coverage.
The Ringelmann defence is the same as the accountability defence: named ownership per artefact, before the agent starts.
The Revised Strategies
The eight strategies from the first post still apply. Here are six additional strategies specific to agent-era teams.
[Figure: Six strategies for the agent era, grounded in the shifted laws]
1. Agent scope = architectural scope. When deploying agents, define their domain boundary before you define their task. The boundary is the architecture decision. The task is secondary.
2. Review capacity is infrastructure, not overhead. If agents are producing, review must scale with production. Treat review throughput as a team metric — not a trailing indicator of PR counts, but a leading constraint on how many agents you can responsibly run in parallel.
3. Spec before you prompt. The specification discipline that Parkinson would have applied to a human sprint now applies to the agent prompt and context. A vague prompt at T=0 produces compounding rework at T=n. Write the spec. Validate the scope. Then start the agent.
4. Cap comprehension, not just headcount. The team-size caps from Dunbar still apply to the human layer. Add an agent-count cap based on comprehension bandwidth — a realistic estimate of how many agent outputs your team can genuinely review and understand per sprint, not just approve.
5. Assign ownership before the agent runs. Every agent task needs a human owner before it starts. That person is accountable for what gets merged. The review is their signal, not a bureaucratic gate.
6. Design agent topology with the Inverse Conway Maneuver. If you want the architecture to have clean domain boundaries, enforce those boundaries in the agent configuration. Scope, context, and permissions are architecture decisions. Treat them as such.
The Meta-Shift
The laws haven't changed. What's changed is the abstraction layer at which they operate.
[Figure: The meta-shift: same laws, different layers]
Conway used to operate at the human communication layer. It now operates at the human-agent interaction layer and the agent scoping layer. Brooks used to operate at the onboarding and inter-engineer coordination layer. It now operates at the specification and review layer. Parkinson used to operate at the production layer. It now operates at the specification and review layers. Dunbar used to set the social ceiling. It now sets two ceilings, one social and one comprehension, and the second is less visible and less intuitive. Goodhart used to operate at the metrics layer through social gaming. It now operates at every layer where an agent has a measurable objective, and operates faster. Ringelmann used to operate at the motivational layer through group anonymity. For agents, it dissolves; for the humans alongside them, it may shift to diffusion of responsibility through perceived automation.
The engineering leader's job didn't get easier. The leverage points shifted up a level of abstraction. The leader who understands where the forces now operate will design teams and agent topologies accordingly. The one who doesn't will have fast-moving, well-committing projects that are structurally unsound, poorly reviewed, and owned by no one in particular.
The physics is the same. The board is different. Learn the new board.
References
[1] Rui Liu et al., "Can AI Models Direct Each Other? Organizational Structure as a Probe into Training Limitations," arXiv:2603.26458, March 2026.
[2] Sirui Hong et al., "MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework," arXiv:2308.00352.
[3] Arpandeep Khatua et al. (Stanford / SAP Labs), "CooperBench: Why Coding Agents Cannot be Your Teammates Yet," arXiv:2601.13295, January 2026.
[4] Yubin Kim et al. (Google Research / DeepMind / MIT), "Towards a Science of Scaling Agent Systems," arXiv:2512.08296, December 2025.
[5] METR, "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developers," arXiv:2507.09089, July 2025; updated February 2026.
[6] Lindenfors et al., "'Dunbar's number' deconstructed," Biology Letters, Royal Society, 2021.
[7] "Overloaded minds and machines: a cognitive load framework for human-AI symbiosis," Artificial Intelligence Review, Springer, 2026.
[8] Atlassian, "How Amdahl's Law still applies to modern-day AI inefficiencies," 2025.
[9] Google DORA, "2025 State of AI-Assisted Software Development," 2025.
[10] Zylos Research, "Multi-Agent Software Development: AI-Native Engineering Teams in Practice," March 2026.
[11] "No Skin in the Game: Why Agentic AI Requires Principal-Agent Governance," AI and Ethics, Springer, 2026.
[12] "Rethinking AI Agents: A Principal-Agent Perspective," California Management Review, Berkeley, 2025.