The most dangerous lie in software engineering is a two-word status update: “It’s done.”
In most teams, done means the pull request was merged. The CI pipeline turned green. The ticket moved to the right column on the board. Someone wrote “Done” in Slack, maybe with a checkmark emoji for extra confidence. And then everyone moved on to the next thing.
But merging code into a main branch is not shipping software. It is the beginning of a process that too many teams treat as the end. And this confusion, this quiet redefinition of what “done” means, is responsible for more production incidents, more midnight pages, more customer trust erosion, and more burned-out engineers than almost any other cultural failure in our industry.
Done is not merge. Done is tested, deployed, and stable in production.
That distinction sounds simple. It is not. Getting it right requires rethinking how teams work, what they measure, what they reward, and what they are willing to slow down for. Let me explain why.
The Merge Illusion
There is a specific moment in the software development lifecycle that feels like completion but is not. It is the moment a pull request gets approved, the last comment is resolved, the branch merges cleanly, and the CI checks pass. There is even a satisfying visual cue: the merge button turns green, the branch disappears, and the commit lands in main. Psychologically, this feels like closure. The problem is solved. The feature is built. The work is finished.
Except nothing has actually happened yet. Not for the user. Not for the business. Not for the infrastructure that will bear the weight of this change under real traffic, real data, and real edge cases that no test environment can perfectly simulate.
What happened is that a set of changes entered a shared codebase. That is it. Those changes have not been deployed. They have not been observed under production conditions. They have not survived their first hour of real-world usage. And until they do, calling them “done” is, at best, premature optimism and, at worst, organizational self-deception.
I have seen this pattern play out across two decades of building systems in fintech, payments, and high-traffic platforms. A team merges a feature on Friday afternoon. Everyone marks their tickets as complete. The sprint velocity looks great. Then Monday morning arrives with a flood of error alerts because the change interacted with a production data pattern that did not exist in staging. The feature was “done” for three days before anyone realized it was broken.
The merge illusion is not a tooling problem. It is a cultural one. Teams treat the merge as the finish line because that is what their processes implicitly reward. Sprint reviews celebrate merged stories. Velocity metrics count points when tickets move to “Done.” Performance reviews highlight features shipped, where “shipped” quietly means “merged.” The entire incentive structure points at the wrong moment in the pipeline.
Why This Matters More Than You Think
If you are building a side project or a prototype, the distinction between merged and deployed barely matters. But the moment real users depend on your software, the moment money moves through your system, the moment uptime is not optional, the gap between “merged” and “production-stable” becomes the most important distance in your entire engineering organization.
The 2024 Accelerate State of DevOps Report from Google’s DORA team tells a revealing story about this gap. Their research, drawn from thousands of technology professionals worldwide, found that elite-performing teams deploy 182 times more frequently than low performers while maintaining eight times lower change failure rates. That is not a typo. The teams that ship the most also break production the least. And the core reason is that elite teams do not consider work done at merge. They consider work done when it is verified in production.
The DORA research introduced a fifth metric in 2024: deployment rework rate, measuring the percentage of deployments that are unplanned work to fix bugs. This metric exists precisely because the industry needed a way to quantify the cost of calling things “done” too early. When teams merge without verifying production stability, they create rework. Rework consumes capacity that should go toward new value. Rework demoralizes engineers who thought they were finished. Rework erodes customer trust one incident at a time.
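As a metric, deployment rework rate is simple to compute. The sketch below is illustrative, not DORA's official methodology; the `Deployment` record and its `is_rework` flag are hypothetical stand-ins for whatever your deploy tooling actually records.

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    id: str
    is_rework: bool  # True if this deploy exists only to fix a prior deploy's bug

def rework_rate(deploys: list[Deployment]) -> float:
    """Percentage of deployments that are unplanned fixes for earlier deploys."""
    if not deploys:
        return 0.0
    fixes = sum(1 for d in deploys if d.is_rework)
    return 100.0 * fixes / len(deploys)

deploys = [
    Deployment("d1", False),
    Deployment("d2", True),   # hotfix for a bug that d1 introduced
    Deployment("d3", False),
    Deployment("d4", False),
]
print(rework_rate(deploys))  # 25.0
```

The hard part is not the arithmetic; it is honestly tagging which deploys were rework instead of burying them as "minor follow-ups."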
Consider the CrowdStrike incident in July 2024. A routine update, presumably tested and approved through internal review processes, was deployed and caused failures across 8.5 million Windows devices globally. The estimated financial damage exceeded three billion dollars. Banking, healthcare, and transportation sectors were paralyzed for up to 72 hours. The update had been “done” by every internal definition before it reached production. Production disagreed.
That is an extreme example, but the pattern scales down to every team that has ever merged a database migration that works perfectly in staging and locks a production table for forty minutes. Or a caching change that behaves correctly under test load but causes a thundering herd under real traffic patterns. Or an API change that passes all contract tests but breaks a mobile client that was not included in the test matrix.
The gap between merge and production stability is where most real damage happens. And it happens quietly, repeatedly, across thousands of teams every single day.
The Anatomy of Actually Done
If “done” is not merge, then what is it? Let me be specific. A piece of work is done when all of the following are true, and not before.
The code has been reviewed and merged. This is step one, not the final step. Code review catches logical errors, design problems, and style inconsistencies. It is necessary. It is nowhere near sufficient.
Automated tests pass in the CI pipeline. Unit tests, integration tests, and whatever automated checks your pipeline includes should all pass. But teams that stop here are confusing “the code compiles and tests pass” with “the software works.” These are different claims about different things.
The change has been deployed to a production-equivalent environment. Staging environments that do not mirror production data volumes, traffic patterns, third-party service behaviors, and infrastructure configurations are not production-equivalent. They are approximations. Useful approximations, but approximations nonetheless.
The deployment to production has been completed. The code is running on production infrastructure, serving real traffic. Not waiting in a release queue. Not behind a feature flag that nobody has toggled. Not in a canary that has not been promoted. Actually deployed. Actually running.
Post-deployment verification has been performed. Someone has confirmed that the change behaves as expected in production. This might mean checking dashboards, reviewing error rates, verifying key user flows, or confirming that monitoring alerts have not fired. The method matters less than the fact that verification happened.
The change is stable in production over a meaningful observation window. This is the part almost everyone skips. A change that works for the first five minutes might fail under overnight batch processing, peak traffic hours, or end-of-month reconciliation jobs. Stability is not a point-in-time check. It is a window of observed behavior. That window might be an hour for a cosmetic change or a full business cycle for a payment processing modification. The duration should be proportional to the risk.
Only after all six conditions are met should a ticket be marked as done. Only then should it count toward velocity. Only then should an engineer mentally move on to the next task. Everything before that final confirmation is work in progress, regardless of what column it sits in on your project board.
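The six conditions read naturally as a gate: every one must hold before "done" is allowed. Here is a minimal sketch of that gate; the field names are hypothetical, and in practice each flag would be fed by your VCS, CI, deploy tooling, and monitoring rather than set by hand.

```python
from dataclasses import dataclass

@dataclass
class ChangeStatus:
    # Hypothetical flags mirroring the six conditions above.
    merged: bool = False
    ci_passed: bool = False
    staging_verified: bool = False        # production-equivalent environment
    deployed_to_production: bool = False  # actually running, serving traffic
    post_deploy_verified: bool = False    # dashboards, error rates, key flows
    observation_window_elapsed: bool = False

def is_done(c: ChangeStatus) -> bool:
    """A change is done only when every condition is true, and not before."""
    return all([
        c.merged,
        c.ci_passed,
        c.staging_verified,
        c.deployed_to_production,
        c.post_deploy_verified,
        c.observation_window_elapsed,
    ])

# The merge illusion, in one line: merged plus green CI is still not done.
assert not is_done(ChangeStatus(merged=True, ci_passed=True))
```

A board automation that refuses to move a ticket to "Done" until a check like this passes is one concrete way to make the definition enforceable rather than aspirational.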
The Cultural Debt Behind the Wrong Definition
If the correct definition of done is so straightforward, why do most teams get it wrong? Because getting it right is culturally expensive.
Expanding the definition of done from “merged” to “stable in production” has consequences that ripple through the entire organization. Sprint velocity goes down, at least on paper, because work takes longer to complete by the new definition. Stakeholders who are used to seeing 30 story points per sprint now see 20. Product managers who promised a feature by a certain date realize the feature is not done when the last PR merges but when production confirms it works. Release planning gets harder because you cannot predict exactly when production stability will be confirmed.
These are real costs. And in organizations that optimize for the appearance of speed over actual delivery, the honest definition of done loses every time. It is much easier to point at a merged PR and say “we shipped it” than to admit that shipping requires another day of observation and verification.
There is also an ego component. Engineers naturally want to feel the satisfaction of completion. The moment a PR merges, there is a dopamine hit of closure. Asking someone to hold off on that satisfaction, to keep the ticket open, to stay mentally engaged with a change through deployment and observation, requires a discipline that runs counter to how our brains process accomplishment. We are wired to close loops, and a merged PR feels like a closed loop even when it is not.
Then there is the tooling problem. Most project management tools make it trivially easy to move a ticket to “Done” but surprisingly hard to create a workflow that enforces post-deployment verification. Jira, Linear, and similar tools are designed around the concept of developer handoff, not production confirmation. Adding a “Verify in Production” column is easy. Enforcing that it is not skipped requires cultural commitment that no tool can automate.
The result is what I call definition-of-done debt. Teams carry an ever-growing gap between what they claim is done and what is actually stable in production. This debt compounds exactly like technical debt. Small at first, invisible to management, and eventually catastrophic when the accumulated lies collide with reality during a critical outage.
What the DORA Data Actually Tells Us
The 2024 DORA research offers some counterintuitive findings that directly relate to the definition of done problem. For the second consecutive year, the data shows that AI tooling correlates with worsened software delivery performance at the team level. Teams using AI code generation tools are writing more code faster, but that speed is not translating into better delivery outcomes. In many cases, the opposite is happening.
This finding makes perfect sense through the lens of the definition of done. AI tools accelerate the journey to merge. They help write code faster, generate tests faster, resolve review comments faster. But they do nothing to accelerate the journey from merge to production stability. If anything, they make that journey more treacherous because more code is being merged more quickly with less time for human consideration of production implications.
The DORA data also found that the high-performance cluster shrank from 31% of respondents in 2023 to just 22% in 2024, while the low-performance cluster grew from 17% to 25%. Software delivery performance is getting worse for most teams, not better. This is happening during a period of unprecedented tooling improvement, faster CI/CD pipelines, and widespread adoption of deployment automation.
The obvious question is why. And the answer, I believe, is that the industry is optimizing the wrong phase of the delivery pipeline. We have invested enormously in making the path from idea to merge faster and smoother. We have invested comparatively little in making the path from merge to production stability reliable and disciplined. The tools are better than ever. The definition of done has never been worse.
Consider the change failure rate metric specifically. DORA found that for elite teams, the rate sits between 0% and 15%. For low-performing teams, it can exceed 64%. The difference is not that elite teams write better code. It is that elite teams have processes that catch problems before and immediately after production deployment. They do not declare victory at merge.
The Staging Trap
I need to address a common objection here. Many teams will say: “We do not consider work done at merge. We consider it done when it passes staging.” And while that is better, it is still not enough.
Staging environments are necessary. They are not sufficient. Every experienced engineer has a collection of stories about things that worked perfectly in staging and failed immediately in production. The reasons are well understood but persistently underestimated.
Staging databases are smaller. They contain test data with predictable distributions, not the chaotic accumulation of years of real user behavior. That query that runs in 50 milliseconds against a staging database with 100,000 rows might run for 45 seconds against a production database with 80 million rows and fragmented indexes.
Staging traffic patterns are artificial. Real traffic has spikes, seasonal patterns, geographic distributions, and concurrency profiles that synthetic load testing only approximates. A connection pool that handles staging load comfortably might exhaust under production concurrency. A rate limiter that seems well-tuned against simulated requests might throttle legitimate users during a flash sale.
Third-party services behave differently in staging. Sandbox APIs from payment providers, email services, and identity platforms do not replicate the latency, error rates, and timeout behaviors of their production counterparts. An integration that appears bulletproof in staging might fail under the specific network conditions and rate limits of production third-party endpoints.
Staging is a rehearsal. Production is the performance. And anyone who has ever performed anything knows that rehearsals and performances are fundamentally different experiences. The audience changes everything.
How to Actually Fix This
Fixing the definition of done is not a tooling change or a process tweak. It is a cultural shift that requires commitment at every level of the engineering organization. Here is what it looks like in practice.
Redefine “Done” explicitly and make it visible. Write down what done means for your team. Print it on the wall if you have to. Make it part of your onboarding documentation. The definition should include production deployment and a stability observation window. Every new team member should understand on day one that merged is not done.
Change what you measure. Stop measuring velocity in terms of merged stories. Start measuring it in terms of changes verified stable in production. This single change in measurement will transform how your team thinks about completeness. When the scorecard rewards production stability, people will optimize for production stability.
Make deployment a developer responsibility, not an ops handoff. The person who writes the code should be involved in deploying it and verifying it works. This is not about eliminating platform teams or DevOps roles. It is about ensuring that the engineer who understands the change most deeply is the one confirming its production behavior. When deployment is someone else’s problem, production verification becomes no one’s responsibility.
Build post-deployment verification into your pipeline. Automated smoke tests that run after deployment. Dashboard checks that compare pre-deployment and post-deployment metrics. Alerting rules that are sensitive to the specific changes being deployed. These do not replace human judgment, but they create a minimum baseline of production verification that happens every single time, even on Friday afternoons when everyone wants to go home.
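A minimal version of that baseline is a pipeline step that compares the error rate after a deploy against the rate before it and fails loudly on regression. This is a sketch under stated assumptions: `fetch_error_rate` is a hypothetical hook into whatever metrics backend you use, and the tolerance is illustrative.

```python
def fetch_error_rate(window: str) -> float:
    """Placeholder: return the fraction of failed requests in a time window."""
    raise NotImplementedError("wire this to your metrics backend")

def smoke_check(baseline: float, current: float,
                tolerance: float = 0.005) -> None:
    """Raise if the post-deploy error rate exceeds baseline + tolerance."""
    if current > baseline + tolerance:
        raise RuntimeError(
            f"error rate regressed: {current:.3%} vs baseline {baseline:.3%}"
        )

# In a pipeline step, roughly:
#   baseline = fetch_error_rate("15m before deploy")
#   current  = fetch_error_rate("15m after deploy")
#   smoke_check(baseline, current)
smoke_check(baseline=0.002, current=0.004)  # within tolerance, passes
```

The point is not the specific threshold; it is that the comparison runs automatically on every deployment, including the Friday afternoon ones.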
Institute a stability observation window. Define how long a change needs to run in production before it is considered done. For low-risk cosmetic changes, this might be an hour. For payment processing changes, it might be a full business day or a complete transaction cycle. The window should be proportional to the blast radius of a potential failure. Document these windows and enforce them.
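Documenting those windows can be as simple as a lookup table that lives next to the code. The tiers and durations below are purely illustrative; the real values belong in your team's own definition-of-done document.

```python
from datetime import timedelta

# Hypothetical risk tiers mapped to how long a change must run cleanly
# in production before it counts as done. Longer blast radius, longer window.
OBSERVATION_WINDOWS = {
    "cosmetic": timedelta(hours=1),
    "standard": timedelta(hours=4),
    "data":     timedelta(days=1),   # migrations, schema changes
    "payments": timedelta(days=1),   # or a full transaction cycle
}

def observation_window(risk_tier: str) -> timedelta:
    """Look up the stability window for a change's risk tier."""
    return OBSERVATION_WINDOWS[risk_tier]

assert observation_window("cosmetic") < observation_window("payments")
```

Making the table explicit turns "wait and see" from a vague habit into an enforceable rule that survives personnel changes.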
Add a “Verify in Production” step to your workflow. This should be a mandatory column on your project board that sits between “Deployed” and “Done.” Tickets cannot move to “Done” until someone has actively verified production behavior and confirmed stability within the defined observation window. Automate the verification where possible, but do not let automation substitute for intentional human confirmation on high-risk changes.
Make rollback a first-class citizen. If your deployment process does not support fast, reliable rollback, your definition of done is built on a foundation of hope. Every deployment should have a clear rollback plan. Every engineer should know how to execute it. And the time to test rollback procedures is before you need them at 2 AM, not during the incident.
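One way to make rollback first-class is to bake it into the release routine itself, so a failed verification triggers it automatically rather than a 2 AM scramble. This sketch uses hypothetical `deploy`, `verify`, and `rollback` hooks standing in for your own deployment tooling.

```python
from typing import Callable

def release(version: str,
            deploy: Callable[[str], None],
            verify: Callable[[str], bool],
            rollback: Callable[[str], None]) -> bool:
    """Deploy, verify in production, roll back on failure. True if stable."""
    deploy(version)
    if verify(version):
        return True
    rollback(version)  # fast, rehearsed, boring -- by design
    return False

# Usage with toy hooks: a failed smoke check triggers an automatic rollback.
events: list[str] = []
ok = release(
    "v2",
    deploy=lambda v: events.append(f"deploy {v}"),
    verify=lambda v: False,  # simulate a failed post-deploy verification
    rollback=lambda v: events.append(f"rollback {v}"),
)
assert not ok and events == ["deploy v2", "rollback v2"]
```

When the rollback path is exercised on every failed verification, it stays tested by construction instead of being discovered broken mid-incident.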
Protect the time required for verification. This is the hardest part. Product managers will push for the next feature. Sprint planning will try to pack every available hour. The pressure to move on to the next ticket immediately after merge is constant and relentless. Leadership must create space for post-deployment verification and treat it as an essential part of delivery, not as overhead that can be skipped when things get busy.
The AI Acceleration Problem
This issue is becoming more urgent, not less, because AI-assisted development is dramatically increasing the speed at which code reaches the merge point. When developers can generate implementations faster, write tests faster, and resolve code review feedback faster, the merge rate accelerates. But the laws of production do not accelerate with it.
Production databases do not process queries faster because AI wrote the code. Network latency does not shrink because Copilot generated the API call. Third-party rate limits do not increase because the integration was built in half the time. The time between merge and production stability is governed by physical and operational constraints that AI cannot compress.
This creates a dangerous bottleneck. Code enters the pipeline faster than it can be verified in production. The backlog shifts from “waiting to be built” to “waiting to be confirmed stable.” And if teams respond to this bottleneck by relaxing their verification standards, because the code was AI-generated and the tests all pass, they trade visible speed for invisible risk.
The 2024 DORA finding that AI tools correlate with worse delivery performance is not a condemnation of AI. It is a warning that accelerating one phase of the pipeline without proportionally investing in the phases that follow creates imbalance. And imbalanced pipelines produce failures.
If you are using AI to write code faster, you should be investing equally in deploying and verifying faster. Otherwise, you are just filling a funnel at the top while the bottom stays the same size. The overflow is production incidents.
The Economics of the Real Definition
I understand the reluctance to adopt a stricter definition of done. It looks slower on paper. It produces lower velocity numbers. It makes sprint planning less predictable. These are legitimate business concerns.
But consider the economics from the other direction. Every production incident caused by a change that was “done” but not verified consumes engineering time for investigation and remediation. It often requires urgent communication with affected customers. It may trigger SLA penalties. It erodes user trust in ways that are difficult to quantify but very real. It creates context-switching costs as engineers are pulled from current work to fix past work they thought was finished. And it generates rework, the newly measurable metric that DORA added precisely because it is so expensive.
In my experience building payment systems where a production bug can mean real money moving to the wrong place, the cost of a single unverified deployment easily exceeds the cumulative cost of a hundred post-deployment verification cycles. The math is not close. Spending thirty minutes verifying a deployment is not overhead. It is insurance with an extraordinary return on investment.
The teams I have seen operate most effectively are the ones that understood this math early. They appear slower in sprint velocity metrics. They are dramatically faster in time-to-value metrics because their features actually work when they reach users, and they spend almost no time on incident response and rework. Their “slow” is faster than most teams’ “fast” because their fast includes the hidden cost of cleaning up after premature declarations of done.
A Different Relationship With Completion
At its core, this is about developing a different relationship with the idea of completion. In a world that celebrates shipping speed above all else, it takes discipline to say: “This is not done yet. It is merged. It is deployed. But I have not confirmed it is stable.”
That discipline is what separates professional software engineering from hobbyist coding. In a hobby project, merged really is done because there are no users depending on stability, no SLAs to honor, no revenue at risk. But the moment software enters production at scale, the definition of done must expand to include the reality that production is a different environment with different constraints, different failure modes, and different consequences.
The best engineers I have worked with over two decades share this trait. They do not relax after the merge. They do not mentally close the loop when the CI turns green. They carry a low-level awareness that the work is not finished until production confirms it. They check dashboards after deployment without being asked. They set personal reminders to verify stability the next morning. They treat the observation window as part of the job, not as extra credit.
This is not paranoia. It is professionalism. A surgeon does not declare a procedure complete the moment the incision is closed. There is a recovery period, a monitoring phase, a follow-up. Software in production deserves the same rigor. The code is not the product. The running, stable, verified system is the product. Everything else is work in progress.
The Definition That Earns Trust
When a team adopts the real definition of done, something remarkable happens. Product managers start trusting engineering timelines because “done” stops being followed by production incidents. Customers stop encountering bugs that should have been caught before they affected anyone. Oncall rotations become calmer because fewer changes reach production without verification. Engineers spend less time fighting fires and more time building things that matter.
Trust is the real output of a correct definition of done. Not velocity. Not story points. Not the number of PRs merged per sprint. Trust. Trust from your users that the software works. Trust from your business that engineering delivers what it promises. Trust from your team that when someone says “it is done,” they mean it.
That trust is worth more than any velocity metric. And it starts with the decision to stop lying to yourself about what “done” means.
Done is not merge. Done is not a green CI pipeline. Done is not a ticket in the right column. Done is tested, deployed, and stable in production. That is the only definition that matters. Everything else is a comfortable fiction that production will eventually expose.
Final Thoughts
If this resonated with you, or even if you disagree, I would love to hear your take. The best conversations happen when people with real experience push back, share their own observations, and add nuance to the discussion.
You can follow me on LinkedIn for regular thoughts on software engineering, AI, and the realities of building technology over the long term. I also post on X and Threads for shorter, sharper takes.
If you want to discuss this topic further, collaborate on something, or just say hello, feel free to reach out. I respond to everything.
What do you think? Does your team consider work done at merge, at deployment, or only after confirmed production stability? What challenges have you faced in moving the definition further right? Drop me a message. I am genuinely curious what you are seeing from where you sit.
If this post made you think, you'll probably like the next one. I write about what's actually changing in software engineering, not what LinkedIn wants you to believe. No spam, unsubscribe anytime.