There is a growing conviction in the software industry that building software with AI is essentially a solved problem. Or if it is not solved today, it will be within a year. The reasoning goes something like this: a dwindling minority of naysayers at large, slow-moving companies are pretending AI is still bad, while a rapidly growing majority is adopting AI tools daily and actively improving their setups. The good ideas are sticking. The bad ones are fading. Momentum is undeniable.
And the momentum part is true. It really is.
According to Stack Overflow’s 2025 Developer Survey, 84% of developers now use or plan to use AI tools in their development process, up from 76% the year before. Over half of professional developers use AI tools daily. GitHub Copilot alone has surpassed 20 million cumulative users. Reports suggest that around 42% of all committed code is now AI-assisted. By any adoption metric, this technology has crossed the threshold from experimental to standard practice.
But here is where the narrative cracks. While adoption has surged, developer trust has moved in the opposite direction. In 2023 and 2024, over 70% of developers had a favorable view of AI tools. By 2025, that number dropped to 60%. Nearly half of developers now say they do not trust the accuracy of AI output, and only 3% report high trust. Among experienced developers, the skepticism is even sharper, with the lowest trust and highest distrust rates of any cohort.
This is not a story about AI being bad. This is a story about the gap between “writes code fast” and “builds software that works.” And that gap is where the real engineering begins.
The 80/20 Trap
Every experienced engineer knows the feeling. You are 80% done with a project, and someone asks when it will ship. You say it is almost done. Then the last 20% takes as long as the first 80%, sometimes longer. The integration points are wrong. The edge cases multiply. The thing that worked in isolation falls apart in context.
AI-assisted software development is living in that last 20% right now. The first 80% has been genuinely impressive. AI can generate boilerplate at remarkable speed. It can scaffold components, write tests for well-defined functions, produce documentation, and autocomplete patterns that would take a developer minutes to type. For greenfield projects with clear specifications, it can feel almost magical.
But the problems that remain are not incremental improvements on the same trajectory. They are categorically different problems, the kind that resist scaling through more compute or better models alone.
This is important to understand. The narrative that AI coding is “almost solved” assumes we are on a smooth curve where each improvement brings us closer to completion. In reality, we hit a series of phase transitions where the nature of the challenge changes entirely. Writing code is one problem. Building software is a different one. Maintaining software is yet another. And coordinating multiple systems, teams, and priorities across time is another problem still.
Each of those transitions introduces complexity that does not yield to the same solutions that handled the previous one.
The Single-Thread Bottleneck
Right now, most AI-assisted development workflows are single-threaded. A developer works with one AI agent, in one context window, on one task at a time. This is fine for many scenarios, and the productivity gains in this mode are well-documented. Developers report saving 30 to 60% of time on coding, testing, and documentation tasks when working with AI tools.
The obvious next step is parallelism. If one AI agent can handle a task, why not deploy a swarm of agents to work on an entire feature set simultaneously? Multiple agents could architect, implement, test, and review code in parallel, collapsing weeks of work into hours.
This is not hypothetical. Adrian Cockcroft demonstrated a five-agent swarm that produced over 150,000 lines of code in 48 hours. Frameworks like CrewAI, AutoGen, and OpenAI’s Swarm are specifically designed for multi-agent orchestration. Anthropic’s own Claude Code has shown feature flags for swarm capabilities. The tooling is emerging fast.
But coordination remains the central challenge. Anyone who has managed a software team knows that adding more people does not automatically make things go faster. The same applies to AI agents, perhaps more so, because AI agents lack the social negotiation, shared context, and institutional knowledge that human teams develop over time.
When you deploy multiple agents on a shared codebase, you immediately face questions about who owns which decision, how conflicts get resolved, and how architectural consistency is maintained across parallel work streams. A human team handles this through standup meetings, architectural reviews, shared conventions, and the accumulated trust of working together. AI agents handle it through structured handoff protocols and explicit context passing, which is to say, they handle it through the exact kind of careful specification that AI was supposed to save us from writing.
The irony is worth sitting with. Multi-agent coordination requires precisely the kind of detailed, unambiguous specification that makes single-agent coding productive in the first place. You are not eliminating the specification problem. You are multiplying it.
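To make the handoff problem concrete, here is a minimal sketch of what explicit context passing between agents might look like. The structure and field names are hypothetical, not taken from any particular framework; the point is that everything a human team shares implicitly must be written down for an agent, or it is lost at the boundary.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """One agent's explicit context package for the next agent.

    Every field is information a human team would carry implicitly
    (in standups, reviews, shared history); agents must receive it
    explicitly or they will fill the gap with a guess.
    """
    task_id: str
    goal: str                                             # what the next agent must accomplish
    decisions: list[str] = field(default_factory=list)    # choices already made upstream
    constraints: list[str] = field(default_factory=list)  # invariants that must not be violated
    open_questions: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Serialize the handoff into a prompt preamble for the next agent."""
        lines = [f"Task {self.task_id}: {self.goal}"]
        if self.decisions:
            lines.append("Decisions already made: " + "; ".join(self.decisions))
        if self.constraints:
            lines.append("Do not violate: " + "; ".join(self.constraints))
        if self.open_questions:
            lines.append("Escalate to a human before resolving: " + "; ".join(self.open_questions))
        return "\n".join(lines)


handoff = Handoff(
    task_id="checkout-42",
    goal="Implement retry logic for the payment client",
    decisions=["Use exponential backoff"],
    constraints=["Never retry a non-idempotent POST"],
    open_questions=["What is the maximum retry budget?"],
)
print(handoff.to_prompt())
```

Notice what this is: a specification. Every line you add to the handoff is a line of careful, unambiguous writing that someone has to produce and maintain.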
The Transparency Problem
Here is a question that gets less attention than it deserves: how much of the AI-generated code in your codebase do you actually understand?
One of the most pernicious effects of AI-assisted development is the ease with which generated code can drift away from the developer’s mental model. The AI produces something that compiles, passes the tests you asked it to write, and appears to work. You accept it and move on. Repeat this a few hundred times and you have a codebase where significant portions were never deeply understood by anyone on the team.
This is not a hypothetical risk. Sonar’s 2025 State of Code report found that while developers estimate 42% of their committed code is AI-assisted, the overall proportion of time spent on low-value work has not meaningfully decreased. Instead, the nature of the work shifted. Developers spend less time writing boilerplate and more time validating AI suggestions. Less time drafting documentation and more time correcting generated code. The friction did not disappear. It redistributed.
GitClear’s analysis of over 153 million lines of code found that AI tools have driven code duplication up by four times. Short-term code churn is rising. These are not signs of a codebase getting cleaner. These are signs of a codebase accumulating complexity faster than the team can digest it.
This is the transparency problem in its most personal form. When you stop understanding the code you are shipping, you lose something that no amount of tooling can give back: the ability to reason about your own system when things go wrong.
The Assumption Gap
Large language models are trained to be helpful. This is generally a good quality, except when helpfulness means making assumptions about what you want rather than asking clarifying questions about what you need.
In a typical AI-assisted workflow, you describe a feature and the model generates an implementation. If you left out a critical design decision, perhaps about error handling, data validation, concurrency, or state management, the model will fill in the blanks with whatever pattern seems most common in its training data. It will not stop and say, “This is an important architectural choice. What is your preference?”
Human developers, especially senior ones, do this all the time. They push back on vague requirements. They ask, “What happens when this fails?” They challenge assumptions before writing a single line of code. This adversarial thinking is one of the most valuable things a senior engineer brings to a team, and it is almost entirely absent from current AI tools.
The result is software that works for the happy path and silently accumulates landmines in every direction the developer did not explicitly specify. Each unquestioned assumption is a decision made by a model that has no understanding of your business context, your risk tolerance, or your users.
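As a contrived illustration, consider a function an assistant might generate from the prompt "move funds between accounts." Each comment below marks a decision a model typically makes silently, where a senior engineer would stop and ask a question first:

```python
def transfer(accounts: dict[str, float], src: str, dst: str, amount: float) -> None:
    """Move `amount` from `src` to `dst` in an in-memory ledger.

    Every comment marks an assumption that was never in the prompt.
    """
    # Assumption: zero and negative amounts are invalid. Or are they refunds?
    if amount <= 0:
        raise ValueError("amount must be positive")
    # Assumption: overdrafts are forbidden. This business rule was never stated.
    if accounts.get(src, 0.0) < amount:
        raise ValueError("insufficient funds")
    # Assumption: an unknown destination account is created implicitly
    # rather than rejected.
    accounts[src] -= amount
    accounts[dst] = accounts.get(dst, 0.0) + amount
    # Never asked: what about concurrency? Two transfers racing on `src`
    # can both pass the balance check above. The happy path hides this.
```

Each of those choices is defensible in isolation. The problem is that nobody chose them.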
As I wrote in “The Future Engineer,” the developers who will excel in this era are those who already know how to lead teams by asking the right questions and challenging specifications before work begins. That skill set transfers directly to AI management. The AI does not push back, so you must be even more rigorous in pushing forward.
The Legacy Codebase Reality
The greenfield advantage matters more than most AI advocates acknowledge. Starting a new project with a clear specification and no existing code is the best-case scenario for AI-assisted development. The context window is clean. There are no legacy patterns to navigate. There are no undocumented business rules hiding in a six-year-old service layer.
Most real-world software development is not greenfield. Most of it is maintenance, modification, and extension of existing systems. These systems carry years of accumulated decisions, many of them undocumented, some of them actively contradictory. Understanding why a particular function exists in its current form often requires knowledge that lives nowhere in the codebase and everything in the heads of people who may have left the company years ago.
AI tools struggle with this context. A 2025 report from Qodo found that 65% of developers who use AI for refactoring, and around 60% of those who use it for testing, writing, or reviewing, say the assistant “misses relevant context.” Among developers who feel AI degrades code quality, 44% blame missing context. Even among those who think AI improves quality, 53% still want better contextual understanding.
This is not a problem that scales smoothly with model size or context window length. The relevant context for modifying a line of legacy code might include a Slack conversation from 2019, a regulatory requirement that was never documented in code, and a production incident that reshaped the team’s approach to a particular problem domain. No model, no matter how large its context window, will reliably surface this information without deliberate human effort to make it available.
The gap between greenfield and legacy is where many of the “AI is basically solved” claims quietly collapse. The demos are always greenfield. The daily work is always legacy.
The Guardrails Question
One of the most underappreciated risks of AI-assisted development is the quiet deletion of safety invariants. An AI agent generates code, runs the tests, and everything passes. But what if the agent also modified the tests? What if it deleted a test that was enforcing an important constraint, not out of malice, but because the test conflicted with the generated implementation and removing it was the path of least resistance?
This is not theoretical paranoia. CodeRabbit’s analysis of 470 open-source GitHub pull requests found that AI-generated code contains 1.7 times more issues on average than human-written code. Critical issues were 1.4 times more frequent. Business logic errors appeared more than twice as often. Error handling gaps nearly doubled. Security findings were about 1.5 times more prevalent.
The problem is not just that AI generates more bugs. The problem is that AI generates bugs in categories that are hardest to catch through automated testing: logic errors, incorrect sequencing, missing dependency handling, and concurrency misuse. These are the kinds of bugs that look correct in a code review, pass the tests, and then cause a production incident at 3 AM on a Saturday.
When you combine this with the volume of code AI can produce, you get a math problem that does not work in anyone’s favor. More code, each piece slightly more likely to contain subtle defects, reviewed by humans who are already feeling the fatigue of increased pull request volume. Google’s 2025 DORA Report found that a 90% increase in AI adoption correlated with a 9% climb in bug rates, a 91% increase in code review time, and a 154% increase in pull request size.
The guardrails question is really a governance question: who is responsible for ensuring that the invariants your system depends on are maintained as AI generates and modifies code at scale? The answer cannot be “the AI will check itself.” It must be a deliberate architectural decision enforced through tooling, testing strategies, and human oversight.
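One shape such tooling could take is a CI check that refuses changes that silently drop tests. The function below is a sketch under assumed conventions, not a real CI integration: a pipeline would collect test identifiers on the base and head branches (for example via a test runner's collect-only mode) and fail the build if any test disappeared without an explicitly approved removal.

```python
def deleted_invariant_tests(
    base_tests: set[str],
    head_tests: set[str],
    approved_removals: frozenset[str] = frozenset(),
) -> set[str]:
    """Return tests that vanished between base and head without sign-off.

    A CI job would fail the build if this set is non-empty, so an
    agent (or a hurried human) cannot quietly delete a guardrail
    just because it conflicted with a generated implementation.
    """
    return (base_tests - head_tests) - approved_removals


# Hypothetical example: an agent's change dropped an invariant test.
base = {"test_refund_is_idempotent", "test_negative_amount_rejected"}
head = {"test_negative_amount_rejected"}
missing = deleted_invariant_tests(base, head)
if missing:
    print(f"Blocked: unapproved test deletions: {sorted(missing)}")
```

The mechanism is trivial. The governance decision behind it, that test deletion requires human sign-off, is the part that actually matters.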
The Code Review Crisis
Here is the problem that nobody has solved, and that I believe will define the next phase of software engineering: code review at the scale AI enables.
Traditional code review works because a human reads the code, understands the intent, and evaluates whether the implementation correctly serves that intent. This process assumes that the volume of code in any given review is small enough for a human to meaningfully engage with it.
AI breaks this assumption. When agents can produce thousands of lines in a session, when pull requests grow by 154%, when median PR size increases by 33%, the traditional line-by-line review model stops working. There simply is not enough human attention available to review code at the rate AI can produce it.
The emerging alternative is to review intent rather than implementation. Instead of reading every line, you verify that the AI’s output matches the specification, that the tests cover the right scenarios, that the architectural decisions align with the system’s design principles. You move from reviewing code to reviewing outcomes.
This is a profound shift and it introduces its own problems. Reviewing intent requires that the intent was clearly specified in the first place, which brings us back to the specification problem that AI was supposed to help us avoid. It also demands a level of systems thinking that not every developer has had the chance to build, because until now, that understanding came through the very act of writing and reviewing code line by line.
Some teams are turning to AI-assisted code review tools to handle the volume. Platforms like CodeRabbit, Qodo, and Greptile are specifically designed to automate the first pass of review. But even these tools acknowledge that human review remains essential for architecture, business logic, and contextual decisions. The AI can catch syntax and pattern issues. It cannot tell you whether the feature should exist in the first place.
The code review problem is a microcosm of the larger challenge. AI has massively increased the supply of code. It has not proportionally increased the capacity to evaluate, understand, and maintain that code. Until those two curves come into alignment, “almost solved” is a more dangerous state than “clearly unsolved.”
What Actually Gets Solved in a Year
None of this means progress will stall. AI coding tools will improve substantially in the next twelve months. Context windows will grow. Multi-agent coordination will mature. AI-assisted code review will get meaningfully better. The tooling ecosystem will consolidate and professionalize.
What will likely happen in a year is that the daily experience of working with AI on well-defined tasks will become noticeably smoother. The acceptance rate for AI suggestions will climb. The most common integration and testing patterns will become reliable enough that developers stop manually verifying them. For straightforward feature work on modern codebases, AI will handle an increasing share of the implementation with minimal human intervention.
What will not be solved in a year is the set of problems that require judgment, context, and accountability. Specification clarity. Architectural coherence across large systems. Understanding legacy codebases that carry decades of institutional knowledge. Reviewing code at scale without losing the ability to reason about correctness. Maintaining invariants when the tool generating code has no concept of what those invariants mean.
These are not engineering problems in the traditional sense. They are organizational, cognitive, and design problems. They require humans who understand not just how to write code, but why the code exists, what it protects, and what happens when it fails.
The Engineer’s Real Competitive Advantage
If you are reading this and feeling anxious about AI making your skills irrelevant, I want to offer a different framing.
The developers who thrive in this environment will not be the ones who can prompt an AI faster or who have memorized the latest model’s capabilities. They will be the ones who can do the things AI cannot: define clear specifications, challenge assumptions, maintain architectural vision, understand business context, and take accountability for the systems they build.
As I wrote in “The Eternal Promise,” every wave of technology that promised to eliminate programmers instead created demand for more sophisticated developers working on more ambitious projects. COBOL was supposed to let business analysts write their own programs. Visual Basic was supposed to let anyone build applications. Low-code platforms were supposed to make professional developers optional. Each wave succeeded in making certain tasks easier while simultaneously raising the bar for what “building software” actually meant.
AI is following the same pattern, but at a scale and speed that makes the transition more disorienting than previous waves. The implementation layer is being compressed. The layers above it, specification, architecture, integration, governance, are expanding.
If you are investing your time in understanding systems deeply, in learning how to specify what software should do before thinking about how it does it, in developing the judgment to know when AI output is good enough and when it is hiding a subtle defect, you are building exactly the skills that will be most valuable as this technology matures.
The developers at risk are not the ones at slow-adopting companies who have not yet integrated AI into their workflow. They are the ones who have adopted AI enthusiastically but without developing the complementary skills of verification, specification, and architectural thinking. Speed without judgment is not productivity. It is debt accumulation at scale.
The Year Ahead
The claim that AI coding will be “solved” in a year is not wrong because AI is incapable. It is wrong because it misidentifies what “solved” means. If “solved” means that AI can generate syntactically correct code for well-specified tasks on modern codebases, then yes, we are close. If “solved” means that AI can reliably build, maintain, and evolve complex software systems that serve real business needs across diverse organizational contexts, then we are further away than the adoption numbers suggest.
The good news is that this gap represents an enormous opportunity. The tools are improving rapidly. The developers who learn to use them wisely, not just enthusiastically, will be capable of building things that were previously impossible at their scale and budget.
The bad news is that the same gap is a trap for organizations that mistake tool adoption for capability. Deploying AI coding tools without investing in specification discipline, review governance, and architectural standards is like giving a team of junior developers unlimited compute and no senior oversight. You will get a lot of code, very fast, and you will spend the next three years untangling it.
Almost solved is the most dangerous phase because it tempts you to stop doing the hard work of understanding what you are building and why. The engineers and organizations who resist that temptation will define the next era of software development.
The ones who do not will become cautionary tales in someone else’s blog post.
Final Words
Whether you agree with this analysis or find yourself firmly in the “it is basically solved” camp, I would love to hear your perspective. The most interesting conversations happen when people with different experiences compare notes.
You can find me on LinkedIn, X, and Threads where I regularly share thoughts on software development, AI, and the evolving role of engineers.
If you want to discuss something in more depth or explore how these ideas apply to your specific situation, feel free to reach out directly.
What is the hardest unsolved problem in AI-assisted development from your experience? Is it code review at scale, legacy codebase understanding, multi-agent coordination, or something I did not cover? I want to know.
If this post made you think, you’ll probably like the next one. I write about what’s actually changing in software engineering, not what LinkedIn wants you to believe. No spam, unsubscribe anytime.