Slop and the Hidden Cost of AI Feature Generation

Written with Claude

Every AI-assisted codebase I’ve seen ships more and cleans up less. Companies are token-maxing, founders are posting GitHub activity charts on social media, and teams are measuring velocity in merged PRs. Velocity is up, and so is the slop.

Slop, as used here, is the accumulating mass of AI-generated code that no human or agent fully owns: duplicated utilities, parallel implementations of the same logic, undocumented modules. It compounds three costs. Reading slows: every reader, human or agent, takes longer to trace logic through duplicated AI-generated modules, so onboarding stretches from days into weeks. Feature delivery slows, because each change requires navigating code nobody fully understands. Infrastructure costs rise with the codebase, because more code means more compute and more surface area for incidents.

Slop: duplicated, unowned code accumulating in the codebase.

GitClear’s 2025 analysis of 211 million lines of code measured the pattern behind these costs: cloned lines rose from 8.3% to 12.3%, and refactoring declined from 25% of changed lines in 2021 to under 10% in 2024. The report’s headline frames the cloning figure as “4x more code cloning,” which describes the growth rate of the cloned-code category, not a fourfold increase in total code size.

Build costs are falling while maintenance costs accumulate invisibly, and GitClear’s data show the mechanism accelerating. So how do we fix this? It starts with understanding why code changes in the first place.

Reasons to change code

Michael Feathers identifies four primary reasons to change code in Working Effectively with Legacy Code. A healthy codebase exercises all four:

Adding a feature
Fixing a bug
Improving the design
Improving the performance

These map directly to commit prefixes — see Conventional commits and reasons for code change for a precise treatment of how feat, fix, and chore anchor to these four categories.

The third reason — improving the design — now has a wider readership. Under AI-assisted development, a codebase is read by humans and by every agent session that touches it. The same discipline that makes prose readable makes code legible to a language model. AGENTS.md files, project glossaries, and skill definitions are not documentation overhead; they are the design work that pays the largest dividend under agentic development.

How agentic coding skews toward new features

AI tools speed up routine work. The problem is the disproportionate skew toward new features at the expense of the other three reasons to change code. That skew has a measurable shape, and it compounds.

1. Feature PRs climbed, design work did not

AI changed the rate of one category. Faros AI telemetry across 10,000 developers and 1,255 teams shows that teams with high AI adoption merge 98% more PRs, but those PRs are 154% larger and take 91% longer to review. Sonar’s State of Code Developer Survey report 2026 (survey conducted October 2025) finds that 42% of committed code is AI-generated or assisted.

Feature output climbs while design and refactoring stay flat.

Faros measured correlation, not causation. High-output teams may have adopted AI earlier, so the relationship could run either way: AI assistance might have produced the imbalance, or the imbalance was already there and AI amplified it. Either way, feature work expanded while design work did not keep pace. Feature work, the numerator, grows; design work, the denominator, stays flat; and the ratio worsens every sprint. Producing code without balancing the four reasons to change code causes slop in the sense defined above.
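The ratio can be watched directly from commit history. The sketch below is one way to do that, not a description of how Faros or anyone else measures it. It assumes commit subjects follow Conventional Commits-style prefixes. Only feat, fix, and chore are named in this article; the refactor and perf prefixes, the choice to count chore as design work, and the helper names reason_mix and feature_to_design_ratio are all illustrative assumptions.

```python
import re
from collections import Counter

# Assumed mapping from commit prefixes to the four reasons to change code.
# Only feat, fix, and chore appear in the article; "refactor" and "perf" are
# illustrative additions, and filing "chore" under design is a guess.
REASON_BY_PREFIX = {
    "feat": "feature",
    "fix": "bug",
    "refactor": "design",
    "chore": "design",
    "perf": "performance",
}

# Matches subjects such as "feat: ...", "fix(auth): ...", "feat!: ...".
PREFIX_RE = re.compile(r"^(\w+)(\([^)]*\))?!?:")


def reason_mix(subjects):
    """Count commit subjects per reason; unrecognized prefixes go to 'other'."""
    counts = Counter()
    for subject in subjects:
        match = PREFIX_RE.match(subject.strip())
        prefix = match.group(1).lower() if match else ""
        counts[REASON_BY_PREFIX.get(prefix, "other")] += 1
    return counts


def feature_to_design_ratio(counts):
    """Feature commits per design commit; None when there is no design work."""
    if counts["design"] == 0:
        return None
    return counts["feature"] / counts["design"]


# Hypothetical commit subjects, e.g. collected from `git log --format=%s`.
sample = [
    "feat: add login page",
    "feat(payments): build payment flow",
    "feat: add invoice export",
    "fix: handle expired session token",
    "refactor(auth): extract token validation",
    "Merge branch 'main'",
]
mix = reason_mix(sample)
print(dict(mix), feature_to_design_ratio(mix))
```

Tracked per sprint, a rising ratio is an early signal that design work is falling behind. Commit counts are a crude proxy, since one refactoring commit can outweigh ten small features, so the number is better read as a trend than as a target.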

2. Nobody prompts for refactoring

AI optimizes for features because that is what people prompt for. “Add a login page.” “Build the payment flow.” Nobody prompts “refactor auth so the next three features cost less.” Non-technical stakeholders now ship features directly through Cursor and Copilot. They prompt for features because features are all they know to ask for.

The other three reasons to change code require understanding the system, not generating text. They require judgment about what the code should become, not what it should do right now. Each AI-generated PR adds code a human must later read, maintain, test, and refactor.

A self-improving development system

Developers are drifting toward maintenance work, while product managers, designers, and other non-coding stakeholders ship features directly. The conditions produce that outcome. Telling people to “be more careful” will not reverse it. When the tool optimizes for output and the incentives reward output, you get more output. Slop follows.

A self-improving development system compounds the codebase’s maintainability session over session, not just its feature count. Both tooling and incentives matter. The point of the tooling layer is not to override stakeholder demand. It is to keep bad patterns from compounding once generated code lands: every duplicated utility, every undocumented module, every drift from project conventions caught and resolved before the next session inherits it. Without that, every feature increment makes the next one harder.

Three elements make this work:

Persistent context so the agent knows what abstractions exist before creating new ones
Guardrails so generated code follows the project’s conventions, not just the prompt’s intent
Graduation loops so patterns discovered in sessions migrate into the system and apply to every future session

Guardrails hold generated code to project conventions.

Consider a concrete sequence. A team ships a payment feature with AI. The agent generates a new date-formatting function. A quality skill runs automatically, detects that formatDate() already exists in utils/, and flags the duplication. The developer accepts the suggestion. In the next session, the agent’s persistent context includes the project’s utility map; it reuses formatDate() without being told. In the third session, “check existing utilities before generating new ones” is encoded as part of a project skill that runs on every change. The system improved from the first mistake. No developer had to remember a rule.
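The duplication check in that sequence can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the quality skill itself: it works on Python source through the standard ast module, whereas the formatDate() example suggests a JavaScript or TypeScript project, and it matches on function names only. The helper names (function_names, build_utility_map, find_duplicates), the file path, and the sample sources are hypothetical.

```python
import ast


def function_names(source):
    """Return the names of all functions defined in a Python source string."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def build_utility_map(utility_sources):
    """Map each function name to the utility module that defines it."""
    utility_map = {}
    for module_path, source in utility_sources.items():
        for name in function_names(source):
            # Keep the first module seen for a given name.
            utility_map.setdefault(name, module_path)
    return utility_map


def find_duplicates(generated_source, utility_map):
    """List (name, existing_module) pairs that the generated code redefines."""
    return sorted(
        (name, utility_map[name])
        for name in function_names(generated_source)
        if name in utility_map
    )


# Hypothetical project state: one existing helper in utils/.
existing = {
    "utils/dates.py": "def format_date(value):\n    return value.isoformat()\n",
}
# Hypothetical agent output that reinvents the helper alongside new logic.
generated = (
    "def format_date(value):\n"
    "    return value.strftime('%Y-%m-%d')\n"
    "\n"
    "def charge_card(amount):\n"
    "    return amount\n"
)

duplicates = find_duplicates(generated, build_utility_map(existing))
for name, module in duplicates:
    print(f"{name}() already exists in {module}; reuse it instead")
```

Name matching only catches the easy case. A helper that duplicates the logic under a different name slips through, which is one reason the utility map also belongs in the agent’s persistent context rather than only in a post-hoc check.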

Karpathy’s autoresearch project runs a similar loop: an agent modifies training code, runs a 5-minute experiment, checks the result, keeps or discards, repeats. The stated goal is “to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement.” You no longer touch the code directly; you shape the instructions that guide the agent. The improvement loop runs the same from session to session — only the depth of your involvement changes.

Each session's discoveries graduate into rules for the next.

The DORA 2025 research converges on a single finding: AI is an amplifier that magnifies existing strengths and weaknesses alike. The AI Capabilities Model report identifies seven capabilities that magnify AI’s impact, and all seven are organizational: platforms, data ecosystems, engineering disciplines. Strong practices and weak ones both compound, only in opposite directions. The question is the one Feathers poses for legacy code: which practices make each change cheaper than the last? AI raises the stakes on that answer, and it also hands us better tools to act on it.

How I apply these

The persistent context lives in an Obsidian vault rather than in chat history, which evaporates when a session ends. Each project carries a stable context.md: its purpose, conventions, and the abstractions that already exist. A small Rust CLI, vault-query, pulls that file into a session before the agent writes any code. The agent reads what already exists before adding to it.

Some changes are too big for one session and run across several sessions and branches. A track file is the working memory for that span: one file per effort, holding its direction, the decisions already settled, and the work still pending. The first session sets the direction, explores what the effort needs, writes that into the track, and clears the context window. The next session reads the track back and resumes from that recorded state.
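A track file can be pictured as a small record that one session writes and the next reads back. The sketch below assumes a JSON file with three fields named after the description above. The actual format, field names, and file naming used in the vault are not specified here, so the Track class and its schema are hypothetical.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Track:
    """Working memory for one multi-session effort (hypothetical schema)."""
    direction: str
    decisions: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def save(self, path):
        Path(path).write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path):
        return cls(**json.loads(Path(path).read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as tmp:
    track_path = Path(tmp) / "date-handling.track.json"

    # Session 1: set the direction, record what was settled, then stop.
    first = Track(
        direction="Consolidate date handling behind one utility",
        decisions=["Keep format_date in utils/"],
        pending=["Replace ad hoc formatting in billing", "Add tests"],
    )
    first.save(track_path)

    # Session 2: start with an empty context and resume from the file.
    resumed = Track.load(track_path)
    next_item = resumed.pending.pop(0)
    resumed.decisions.append(f"Done: {next_item}")
    resumed.save(track_path)

    final = Track.load(track_path)

print(final.pending)
```

The point of the design is that nothing a later session needs lives only in a context window: if it is not in the file, the next session does not know it.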

That next session runs work on the track: it splits the pending items across fresh subagents, one slice each. The orchestrator’s context stays lean, because a bloated context degrades the model. Read steps explore in parallel. Write steps commit one at a time on a branch, with the same feat/fix/chore prefixes that anchor the four reasons to change code.
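The split between parallel reads and serialized writes can be sketched with a thread pool. Everything here is a stand-in: explore and apply_change are hypothetical placeholders for subagent calls and commits, and a real orchestrator would dispatch to agent sessions rather than to plain functions.

```python
from concurrent.futures import ThreadPoolExecutor


def explore(item):
    """Stand-in for a read-only subagent: returns a short summary of one slice."""
    return f"summary for {item}"


def apply_change(item, summary, log):
    """Stand-in for a write step: records one commit-like entry."""
    log.append(f"chore: {item} ({summary})")


def run_track(pending):
    # Read steps run in parallel; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        summaries = list(pool.map(explore, pending))

    # Write steps run one at a time so commits land in a predictable order.
    log = []
    for item, summary in zip(pending, summaries):
        apply_change(item, summary, log)
    return log


commits = run_track(["billing dates", "invoice dates"])
print(commits)
```

The orchestrator keeps only the short summaries, not the full exploration, which is how its own context stays lean while the subagents do the reading.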

Graduation already happens when a track is saved. The skill reviews the session and flags what should outlast it: a memory update, a new skill, or even a linter rule. The reverse direction is looser: new work rarely pulls those past decisions back in, so each effort still leans mostly on its own track. The next improvement is to close that gap, so every effort starts from every earlier effort’s decisions.

The full setup is on the skills page: every skill, copy-paste ready, with a changelog as it evolves. Browse the skills →

Glossary

Slop: The accumulating mass of AI-generated code that no human or agent fully owns: duplicated utilities, parallel implementations of the same logic, undocumented modules that compound maintenance cost over time.
Tech debt: Accumulated structural compromises in a codebase that raise the cost of future change. Slop is one mechanism by which tech debt grows under AI-assisted development.
Agentic coding: Development work delegated to AI agents that read the codebase and edit files autonomously across multiple steps within a session, rather than AI used only for inline suggestions.
Self-improving development system: A development setup where each session's discoveries compound into the next session's defaults, so the same bad pattern does not have to be caught twice.
Persistent context: Information the agent retains about the project across sessions, including which abstractions and utilities already exist.
Guardrails: Automated checks that constrain generated code to project conventions and standards. Often implemented as linters, pre-commit hooks, CI rules, or agent hooks; can also live in skills.
Graduation loops: The process by which a fix discovered in one session becomes a permanent rule that applies to every future session.
Skill: A reusable instruction set an agent loads on demand to perform a defined task (review changed code, check existing utilities, encode a project convention). It turns a one-time prompt into a durable project asset that runs whenever the task recurs.


Something to read next: Finding Joy in AI-Assisted Coding