Published October 5, 2019
by Doug Klugh
Avoid CI Theatre
Avoid CI Theatre by applying the fundamental practices of Continuous Integration (CI), along with supporting engineering principles, to achieve a mature DevOps Model for continuous software delivery. Precursors of CI Theatre include large user stories, difficult code merges, lengthy code reviews, poor test coverage, long-lived branches, siloed code ownership, broken builds for more than several minutes, and practicing Continuous Isolation by running CI on feature branches. CI is more about culture and mindset than any collection of tools.
Whether your work is ready for primetime or not, it should be integrated into your production code base numerous times each day — at the very least, once daily. In many cases, the feature you’re working on will not be complete and will need to be disabled or “turned off” using Feature Toggles to be able to integrate it with production code without affecting other functionality.
Trunk-Based Development can be an easy practice if your code adheres to the Open/Closed Principle (OCP). This enables you to extend your code without modifying it — so you’re actually changing very little code in the production code base.
Whether you’re following the OCP or not, it is important that the work be decomposed into very small slices of functionality. This Agile principle promotes continuous delivery by deploying and releasing functionality often.
Every time code is committed to the trunk it should kick off an automatic compile, static code analysis, integration, and build. This ensures that none of the changes made on the trunk break the build and remains deployable at all times. If the build fails, it should immediately trigger the Andon Cord and initiate an immediate response by every person required to solve the problem. And if the problem cannot be resolved in a brief, predetermined amount of time, the changeset should be rolled back to keep the build in a healthy and deployable state.
Once the build completes successfully, it should immediately kick off all automated regression test suites — to verify the quality of the build, including the new changes. These tests must serve as quality gates to ensure confidence in the changes we just committed. If any of these gates fail, they should immediately trigger the Andon Cord (as with the build). If the problem cannot be resolved quickly, the changeset should be rolled back to prevent the error from progressing downstream.
The build must always be kept healthy through clean compiles, healthy code analysis, and no integration or build failures. Both the build process and the build itself must be maintained in such a way that is builds quickly and executes fast — certainly within performance requirements.
Any and all failures related to integrating, verifying, and deploying work from the trunk need to be highly visible to every person in the value stream to promote swarming and enable accountability by everyone on the team.
When failures occur, they must trigger an immediate response by every person required to solve the problem. If those people are in the middle of working on something else, they must drop what they’re doing and swarm to resolve the problem.
It is critical that problem be resolved as quickly as possible to prohibit the same failure from arising in future changes and to prevent new failures from being introduced. This also prevents the error from progressing downstream where the cost and effort to resolve the issue would be much greater — not to mention the addition of Technical Debt.
We can avoid major problems by resolving smaller problems earlier in the development lifecycle. And swarming provides numerous learning opportunities and prevents the loss of critical information as time passes and people’s memories continue to fade.
This swarming response is a critical component to Continuous Integration and requires a culture that not only makes it safe but encourages team members to pull the Andon Cord when problems arise — even small ones.