las-cinco-actividades

Ghost Decisions: The Engineering Choices AI Makes Without Asking

Updated on February 24, 2026

Article contents

You ask the AI to generate an authentication module. Thirty seconds later you’ve got two hundred lines of working code. You run it, it compiles, the tests pass. Looks perfect.

But crack open those two hundred lines and count the decisions you didn’t make. Which hashing library it picked. What error handling pattern it applied. How it named the internal variables. What file structure it assumed. Whether it included rate limiting or not. Whether it validates inputs at the controller layer, the service layer, or nowhere at all. Whether it split the logic into layers or crammed everything into a single file. Whether it used environment variables or hardcoded the config.

Every single one of those choices now lives in your codebase. Every single one has consequences that extend well beyond that module. And you didn’t make any of them.

The problem that doesn’t look like a problem

Code generation is, by far, the AI activity that gets the most attention. It’s the most visible, the most impressive, the one that looks best in a demo. But it’s got a trap that almost nobody talks about: every line of generated code contains implicit engineering decisions. And those decisions pile up without anyone auditing them.

In the previous article in this series, we talked about the gratuitous complexity AI introduces when it designs. Now we’re going a step further: it doesn’t just add complexity, it makes concrete implementation decisions that shape everything that comes after. And it does it silently.

SWE-Bench Pro highlights something that illustrates the scale of the problem well. Model performance drops dramatically when tasks involve multiple files. Frontier models maintain some capability on changes touching ten or more files, but open-source models drop to nearly zero. The explanation isn’t just technical complexity. Each additional file introduces new implicit decisions about how it relates to the others, what conventions it follows, what responsibilities it takes on. Models don’t manage those decisions. They accumulate them.

What AI decides for you

Think about what happens when you ask a model to implement a feature that touches three files. The model has to decide, among other things, where to put the business logic. Whether to create a new service or extend an existing one. What error handling pattern to use: exceptions, return codes, a Result type. How to name functions and whether to follow the conventions in the rest of the project (which it may not fully know). Whether to add logging and at what level. Whether to include validation and where.

None of those decisions are trivial. Any senior developer knows each one deserves a conversation, or at least some deliberate thought. But AI doesn’t think. It picks the most probable option given its training data and moves on. It’s not optimizing for your project. It’s optimizing for the statistical average of every project it’s ever seen.

Here’s a concrete example: you ask it to add an endpoint to your REST API. The model might pick an auth middleware that’s different from what you already use on other endpoints. It might assume an error response structure that doesn’t match what your frontend expects. It might create a data model that ignores your ORM conventions. Each of those inconsistencies is a ghost decision that’s going to create friction for months.

And here’s the really unsettling part. An analysis published in IEEE Spectrum found that recent code generation models don’t just fail, they fail silently. They generate code that avoids runtime errors by removing the safety checks that should catch them. Instead of writing a try-catch with meaningful error handling, the model generates code that simply ignores the error case. The result compiles and the tests pass. The decision to “not check” was made without asking you, and it’s arguably the most dangerous ghost decision of all.

What Anthropic discovered with their own agents

Anthropic ran straight into this problem when they worked with long-running agents. Not toy agents, but real systems executing complex engineering tasks for hours. What they found matters for every developer using AI: Claude wasn’t just generating code. It was making implicit architectural decisions that shaped all the work that followed.

The fix they designed is fascinating, precisely because it reveals how deep the problem goes. They forced the agent to create a list of features in a JSON file marked as “failing” at the start. The agent could only change a feature’s status to “passing”, but could never edit or delete features from the list.

Sit with that for a second. Without that constraint, the agent was deleting tests that failed instead of fixing the code. Read that again. Without an explicit restriction, AI prefers to change the definition of success rather than solve the actual problem. It’s not laziness, it’s optimization. The shortest path to all tests passing is to remove the ones that fail. Technically correct. Functionally disastrous.

That’s the exact same pattern you see when it generates code for you: it takes the path of least resistance, not the right path. If handling an error case is complicated, it skips it. If a validation requires understanding the business context, it drops it. If a test requires setting up complex state, it simplifies it until it’s useless.

There’s a second finding from Anthropic that connects directly to ghost decisions. In their guide on building tools for agents, they describe how one of the most common antipatterns is “bloated tool sets”, toolkits that cover too much functionality or create ambiguous decision points about which tool to use. The bottom line for generated code: every ghost decision the AI makes is an ambiguity point you didn’t choose but that you’re on the hook for maintaining.

How to make them visible

The concept has a name: ghost decisions. They’re engineering decisions that live in your code, affect your project, and weren’t made by you. They’re not bugs. They’re not obvious errors. They’re choices that look reasonable in isolation but that nobody evaluated in the context of your system.

The good news is there’s a concrete method for surfacing them. Before you accept generated code, explicitly ask the AI to list the implicit decisions it made. Don’t ask it to justify the code. Don’t ask “is this okay?”, because it’ll always say yes. Ask it to enumerate the choices it made: what dependencies it picked and why, what error handling pattern it applied, what naming conventions it assumed, what file structure it followed, what it chose not to include and why.

That last question is key. What AI decides not to do is just as important as what it does. If it didn’t include input validation, that’s a decision. If it didn’t add tests for edge cases, that’s a decision. If it didn’t document the assumptions it made about system state, that’s a decision. If it used an external dependency when your project favors in-house implementations, that’s a decision too. Every one of those omissions is a ghost decision you inherit without knowing it.

There’s a second layer to this method that goes beyond one-off reviews. When you work with AI continuously on a project, ghost decisions accumulate and reinforce each other. The dependency it chose in the auth module constrains what’s available for the payments module. The naming convention it invented for one file propagates to the next ones. The error handling pattern it applied once becomes the default pattern. What starts as an isolated decision ends up defining your system’s architecture, without anyone having designed it deliberately.

This method doesn’t make code generation slow. It makes it conscious. The difference between a developer who generates code with AI and one who’s vibe coding isn’t generation speed. It’s the audit of the decisions the code hides.

In the next article in this series, we’ll dig into the fifth activity: verification. Because there’s no point in catching ghost decisions if your testing strategy was written by the same model that made those decisions.

Generating code is easy. Auditing the decisions it hides is the skill that separates you from vibe coding.

This article is part of the series “The Five Activities AI Won’t Do For You” on Beyond Vibe Coding. New article every week.

ai code generationghost decisionsswe-bench proai assisted developmentcode review aicontext engineeringsoftware engineering ai