Integrate: Completing the ADD Cycle for AI-Driven Development

Posted on February 11, 2026 by ivan.turkovic

Code that passes evaluation is ready for integration. This is the final phase of the ADD cycle, where generated code becomes part of your system. But integration is more than merging a pull request. It is where AI-generated code meets the full reality of your codebase, your testing infrastructure, your deployment pipeline, and your team’s practices.

Integration also closes a feedback loop. What you learn during integration informs your next specification. The cycle continues, each iteration building on the lessons of the previous one.

What Happens After Code Passes Evaluation

Evaluation answers the question: “Is this code correct and appropriate?” Integration answers a different question: “Does this code work within the system?”

These questions are related but not identical. Code can be correct in isolation and still fail in context. It can pass evaluation and still break when it encounters real data, real load, or real edge cases that neither you nor the AI anticipated.

The Integrate phase has five components: testing, documentation, CI/CD validation, system-level review, and technical debt tracking. Each addresses a different aspect of bringing new code into an existing system.

Testing AI-Generated Code

AI-generated code requires the same testing as human-written code, but the testing focus should account for how AI tends to fail.

Test the edge cases explicitly. AI models are trained on examples, and examples tend to show common cases. The happy path is usually well-handled. Edge cases are where AI-generated code most often fails. Empty inputs, maximum sizes, unusual character encodings, concurrent access, resource exhaustion. If your specification listed edge cases, write tests for each. If your specification did not list edge cases, that was a specification gap, and you should note it for future iterations.
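As an illustration, here is a minimal sketch of explicit edge-case tests in Python with pytest. The function under test, `normalize_username`, its module path, and the expected outputs are hypothetical stand-ins for whatever your own specification defined; the point is that each case in the table maps back to a line in that specification.

```python
import pytest

from myapp.users import normalize_username  # hypothetical AI-generated function


@pytest.mark.parametrize("raw, expected", [
    ("", ""),                     # empty input
    ("  Alice  ", "alice"),       # surrounding whitespace
    ("A" * 255, "a" * 255),       # maximum allowed length
    ("Zo\u00eb", "zo\u00eb"),     # non-ASCII characters
])
def test_normalize_username_edge_cases(raw, expected):
    # Each case comes straight from the specification's edge-case list;
    # a case you cannot trace back to the specification usually means a spec gap.
    assert normalize_username(raw) == expected
```

Keeping the edge cases in one parametrized table makes it easy to compare the tests against the specification and spot what is missing.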

Test error handling thoroughly. AI often generates optimistic code that handles errors superficially. Test what happens when dependencies fail. Test what happens when external services time out. Test what happens when data is malformed. Verify that errors are caught, logged, and handled gracefully rather than propagating as unhandled exceptions.
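A sketch of what that can look like, assuming a hypothetical `fetch_profile` function that calls an external service through `requests` and a hypothetical `ProfileUnavailable` domain error; none of these names come from the article, they only illustrate the shape of the tests.

```python
from unittest.mock import patch

import pytest
import requests

from myapp.profiles import fetch_profile, ProfileUnavailable  # hypothetical


def test_fetch_profile_times_out_gracefully():
    # Simulate the external service timing out instead of responding.
    with patch("myapp.profiles.requests.get", side_effect=requests.Timeout):
        # The generated code should translate the timeout into a domain error,
        # not let it escape as an unhandled exception.
        with pytest.raises(ProfileUnavailable):
            fetch_profile("user-123")


def test_fetch_profile_rejects_malformed_payload():
    class FakeResponse:
        status_code = 200

        def json(self):
            raise ValueError("not JSON")  # malformed body from the service

    with patch("myapp.profiles.requests.get", return_value=FakeResponse()):
        with pytest.raises(ProfileUnavailable):
            fetch_profile("user-123")
```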

Test integration points. AI generates code in isolation. It does not see how that code interacts with the rest of your system. Write integration tests that exercise the boundaries: API contracts, database interactions, message queue behavior, file system operations. These tests catch mismatches between what the AI assumed and what your system actually provides.
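A minimal sketch of one such boundary test in Python, using an in-memory SQLite database so it stays self-contained. Both `apply_migrations` and `insert_order` are stand-ins: in your project they would be your real migration runner and the AI-generated data-access code, and the point of the test is to pin the generated code against the real schema rather than the schema the AI assumed.

```python
import sqlite3


def apply_migrations(conn):
    # Stand-in for your real migration runner.
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )


def insert_order(conn, amount_cents):
    # Stand-in for the AI-generated data-access function under test.
    conn.execute("INSERT INTO orders (amount_cents) VALUES (?)", (amount_cents,))
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def test_insert_order_matches_real_schema():
    conn = sqlite3.connect(":memory:")
    apply_migrations(conn)
    # If the generated code assumed a column name or type that the migration
    # does not provide, this test fails at the boundary, not in production.
    assert insert_order(conn, 1999) == 1
```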

Test with realistic data. AI-generated code is often tested with simple, clean data during development. Production data is messier. Run your tests with data that reflects production reality: varied formats, legacy records, unexpected nulls, strings in multiple languages. Data that worked in the AI’s training examples may not match your actual data.
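One way to make this concrete is a small fixture of production-shaped rows. The parser name, its module path, and its result object are hypothetical; the rows are the point: missing fields, legacy date formats, non-Latin names.

```python
import pytest

from myapp.importer import parse_customer_row  # hypothetical AI-generated parser

# Rows shaped like production data, not like the clean examples used during generation.
MESSY_ROWS = [
    {"name": "佐藤 花子", "signup": "2019-03-04", "phone": None},
    {"name": "O'Brien, Seán", "signup": "04/03/2019", "phone": ""},
    {"name": "", "signup": None, "phone": "+385 91 000 0000"},
]


@pytest.mark.parametrize("row", MESSY_ROWS)
def test_parser_survives_production_shaped_rows(row):
    # The parser is allowed to reject a row, but it must do so explicitly,
    # never by raising an unexpected exception.
    result = parse_customer_row(row)
    assert result.ok or result.error is not None
```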

Consider property-based testing. For algorithmic code, property-based testing can reveal failures that example-based tests miss. Instead of testing specific inputs and outputs, you define properties that should always hold, and the testing framework generates many random inputs to check those properties. This is particularly valuable for AI-generated code because it explores the input space more thoroughly than hand-written examples.
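In Python, the Hypothesis library is one way to do this. Here is a minimal sketch, assuming a hypothetical `apply_discount(price_cents, percent)` function; the properties asserted are illustrative of the technique, not taken from any particular codebase.

```python
from hypothesis import given, strategies as st

from myapp.pricing import apply_discount  # hypothetical AI-generated function


@given(
    price_cents=st.integers(min_value=0, max_value=10_000_000),
    percent=st.integers(min_value=0, max_value=100),
)
def test_discount_properties(price_cents, percent):
    discounted = apply_discount(price_cents, percent)
    # Properties that must hold for every input, not just hand-picked examples:
    assert 0 <= discounted <= price_cents                  # never negative, never a markup
    assert apply_discount(price_cents, 0) == price_cents   # zero discount is a no-op
    assert apply_discount(price_cents, 100) == 0           # full discount is free
```

The framework generates hundreds of random inputs and shrinks any failure to a minimal counterexample, which is exactly the kind of exploration that hand-written examples tend to miss.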

Documentation Updates

AI-generated code often comes with AI-generated documentation: comments, docstrings, and sometimes README updates. During the Generate phase, you may have iterated on the code multiple times. The documentation may describe an earlier version rather than the final implementation.

Verify comment accuracy. Read every comment in the generated code and verify it describes what the code actually does. Pay particular attention to comments describing edge case handling, error conditions, and performance characteristics. These are the comments most likely to be outdated after iterations.

Update architectural documentation. If the generated code introduces new components, patterns, or dependencies, update your architectural documentation to reflect this. The AI does not know about your architecture diagrams, your decision records, or your system documentation. You must update these manually.

Document the ADD context. Consider adding a brief note about how this code was generated. Not in the code itself, but in your commit message or PR description. “Generated using ADD with specification X, evaluated against checklist Y.” This context helps future developers understand the code’s origin and provides a trail back to the specification if questions arise.

Generate or update API documentation. If the code exposes APIs, ensure the API documentation is accurate and complete. AI-generated code may follow your API conventions, but the documentation generators that read those conventions need the actual code in place before they can generate accurate docs.

CI/CD Validation

Your continuous integration pipeline provides automated validation that complements manual evaluation. Let the machines check what machines check well.

Run the full test suite. Not just the tests for the new code, but the entire suite. AI-generated code may have unexpected interactions with existing functionality. A full test run catches regressions that focused testing might miss.

Enforce code quality gates. Linting, formatting, static analysis, type checking. These automated tools catch issues that humans overlook. If your pipeline includes security scanning, code coverage thresholds, or complexity metrics, let them run against the AI-generated code just as they would against human-written code.

Validate deployment requirements. If your code requires new environment variables, new dependencies, new infrastructure, or new permissions, the deployment pipeline should validate that these are in place. AI-generated code sometimes assumes dependencies that do not exist in your environment.
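A minimal sketch of one such check, written as a standalone script you could run as a pipeline step before the deploy itself. The variable names are hypothetical; the real list comes from your specification and review of what the generated code actually reads.

```python
import os
import sys

# Hypothetical settings the generated code started reading.
REQUIRED_ENV_VARS = ["PAYMENTS_API_URL", "PAYMENTS_API_KEY", "FEATURE_FLAG_SOURCE"]


def check_required_env() -> int:
    missing = [name for name in REQUIRED_ENV_VARS if not os.environ.get(name)]
    if missing:
        print(
            f"Deployment check failed, missing env vars: {', '.join(missing)}",
            file=sys.stderr,
        )
        return 1
    return 0


if __name__ == "__main__":
    # Non-zero exit fails the pipeline before the deploy step runs.
    sys.exit(check_required_env())
```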

Review pipeline failures carefully. When CI fails on AI-generated code, investigate thoroughly. The failure might reveal an issue with the generated code, but it might also reveal a gap in your specification. If the AI produced code that violates a convention enforced by your pipeline, you should have specified that convention. Add it to your context library for future generations.

System-Level Review

Code that passes all tests and all automated checks can still be wrong at the system level. The Integrate phase includes a step back to consider the broader impact.

Consider performance at scale. The code works with test data. Will it work with production data volumes? If you are adding a new database query, how will it perform when the table has millions of rows? If you are adding a new API endpoint, how will it perform under load? System-level thinking catches performance issues that component-level testing misses.

Consider failure modes. How does the system behave when this new code fails? Does failure cascade? Does it fail gracefully with clear error messages? Does it trigger appropriate alerts? System resilience depends on understanding how components fail, not just how they succeed.

Consider operational impact. Does this code change how the system is monitored, deployed, or maintained? Does it introduce new log messages that operators need to understand? Does it change resource consumption patterns? Operations teams need to know when the system changes in ways that affect them.

Consider security surface. Does this code expose new attack surfaces? Does it handle user input in new ways? Does it access sensitive data? Security review at the system level considers how the new code fits into the overall security posture, not just whether the code itself is secure.

Consider dependencies. Does this code introduce new dependencies, either explicit (new libraries) or implicit (new services it calls)? Each dependency is a potential point of failure and a potential security risk. System-level review considers whether the dependencies are justified and appropriately managed.

Technical Debt Tracking

Sometimes you integrate code knowing it is not perfect. Perhaps the specification was incomplete and you discovered the gaps during integration. Perhaps the code works but is not optimally structured. Perhaps time pressure requires shipping now and improving later.

This is acceptable, but only if you track it explicitly.

Document known limitations. If the code has known edge cases it does not handle, document them. If the code has performance characteristics that may need improvement, document them. Create tickets, add TODO comments with ticket references, or update your technical debt tracking system.
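For the in-code half of that tracking, a TODO with a ticket reference can look like the sketch below. The function, the ticket ID, and the limitation described are all hypothetical; what matters is that the comment names the ticket and states the limitation precisely enough that a future reader does not have to rediscover it.

```python
def merge_duplicate_accounts(primary, duplicate):
    # TODO(PROJ-1234): known limitation from the integration review —
    # merging does not yet reconcile conflicting billing addresses; the
    # duplicate's address is silently dropped. Tracked as intentional debt.
    primary.emails.extend(duplicate.emails)
    primary.is_active = primary.is_active or duplicate.is_active
    return primary
```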

Distinguish intentional from accidental. Intentional technical debt is a decision: “We know this is not perfect, but we are shipping it because the tradeoff is acceptable.” Accidental technical debt is a surprise: “We did not realize this was a problem.” Tracking helps you understand which debt you are taking on deliberately and which you are accumulating without awareness.

Set review triggers. For significant technical debt, define when you will revisit it. “Review this when we exceed 10,000 users.” “Revisit this before the next major release.” Without explicit triggers, technical debt tends to be forgotten until it causes a crisis.

Learn from debt patterns. If you consistently integrate code with the same category of technical debt, you have a systemic issue. Perhaps your specifications need to be more comprehensive. Perhaps your evaluation checklist needs new items. Perhaps certain types of generation are not working well for your team. Track the patterns, not just the individual items.

The Integration Phase as Learning Opportunity

Every integration teaches you something. The question is whether you capture the lesson.

What did testing reveal? If your tests caught issues, those issues represent gaps in evaluation. Should your evaluation checklist include items that would have caught these issues earlier? Should your specification templates prompt for information that was missing?

What did CI/CD catch? If your pipeline caught issues, those issues represent gaps in your generation context. Should your system prompt include conventions that the AI violated? Should your constraint patterns be more explicit?

What did system-level review uncover? If you discovered issues at the system level, those issues represent gaps in your specification process. Should specifications include more context about system integration? Should you add integration considerations to your specification templates?

Feed these lessons back into the cycle. Update your specification templates. Update your evaluation checklists. Update your context libraries. Each integration makes the next specification better.

When Integration Reveals Evaluation Gaps

Sometimes integration fails in ways that evaluation should have caught. The code breaks in production, or causes problems during deployment, or fails in ways that the evaluation process missed.

This is painful but valuable. It reveals exactly where your evaluation process needs improvement.

Conduct a brief retrospective. What went wrong? What checklist item would have caught it? What specification detail would have prevented it? What context would have helped the AI avoid it?

Resist the temptation to blame the AI. The AI generated what your specification and context guided it to generate. The failure is a system failure: specification, generation, evaluation, and integration working together imperfectly. The fix is to improve the system, not to distrust the tool.

Completing the Cycle and Starting the Next

With integration complete, you have moved from specification to working code in your system. The ADD cycle is done.

But it is rarely a single cycle. Complex features require multiple cycles. Each cycle builds on the previous: the code from cycle one becomes context for cycle two. The lessons from cycle one improve the specification for cycle two.

Over time, you develop intuition for how to scope cycles. Too large, and generation becomes unreliable. Too small, and overhead dominates. The right size produces code that can be specified clearly, generated reliably, evaluated thoroughly, and integrated smoothly. This judgment comes from practice.

The cumulative result is systems built through structured collaboration. Not systems where AI wrote the code and humans watched, but systems where humans specified, guided, evaluated, and integrated while AI generated. The human remains the engineer. The AI is a powerful tool wielded with discipline.


Let’s Continue the Conversation

What has integration taught you about your specifications or evaluations? Where have you discovered gaps that earlier phases should have caught?

How do you track technical debt in AI-generated code? Do you treat it differently from debt in human-written code?

Share your experience in the comments. Integration is where theory meets reality, and real-world lessons improve everyone’s practice.
