Delivering working software against all odds by using tools, metrics, and guerilla XP tactics
The project that Exigen Services implemented for a large multinational telecom company in Europe has been one of the most challenging, unconventional, and rewarding XP exercises for us so far. All development and testing was done 100% offshore in St. Petersburg, Russia, for the remote customer. We built an Intranet resource and project management application. Now, 30 months later, the system has 1500 users across four countries. The offshore XP team has grown from three to 17 and has delivered 18 versions of the system into production. One of the major achievements was bringing the defect rate down while keeping good velocity and implementing changes rapidly.
It was obvious right from the start that things would not be easy. Our remote customer was a very busy senior executive several time zones away who had no time for daily conference calls and preferred communication via e-mail. The requirements, which were informal most of the time and looked nothing like proper user stories, were also sent by e-mail. There were many uncertainties, but three things were clear:
The customer actually had no intention of doing acceptance testing. We (the consultants) were even asked to deploy the product to the production environment ourselves! We were also expected to deliver changes the same week that they were requested.
We suggested using XP and "filling in the blanks" for the remote customer by having the project manager write user stories. The customer agreed. However, the lack of any kind of safety net was disturbing, especially because some of the system's users, a large team of Java developers, were right next door in the same building. They had absolutely no sympathy for us, the .NET guys, and were skeptical about our ability to build a solid product.
That was how we started our two-year quest for quality.
I must admit, some of the challenges in this quest were unexpected. After all, XP has a great set of practices for those who want to improve quality. Most of us had been using XP for a while, so we didn't have any issues with pair programming, refactoring, or collective code ownership.
However, during the first couple of months, as the team was in a hurry to get the first release into production, we started cutting corners: we were sloppy on continuous integration and configuration management, were not doing enough testing, and we used quick solutions (such as copy & paste) instead of simple ones. By the start of Iteration 4, we did have a working application with some users, but the code was more duplicated than reused, the unit test suite was mediocre, and we didn't like our high-level design. It all started to show: we had to fix up to five customer issues every week, all of which were usually our fault. The annoying thing about the defects was that most of them were silly and avoidable.
Some of us believed that we simply needed more time for development and testing. That was true, but the customer was already unhappy with our velocity, so we had little wiggle room there. Instead, we tried to identify and eliminate things that were slowing us down.
One area concerned the product configuration, integration, and deployment. So we started by using NAnt to automate all regular developer tasks. We deployed a continuous integration server (CruiseControl.Net). It ran a scheduled build every 15 minutes initially, switching to once every hour later on. That allowed us to get rid of configuration errors and improve the velocity.
The next thing we needed was a coding standard, but nobody wanted to write one. Moreover, we thought that nobody would want to read one either, even if we had it. We worked around this problem with the help of Microsoft FxCop. The idea was to use it as an executable coding standard, and it worked very well.
However, FxCop did not cover all our needs. For example, to get the object design right, we wanted to be sure that all our methods were less than 80 LOC long and did not contain excessively complex conditional logic. We used DevMetrics to track the Maximum Cyclomatic Complexity and maximum lines of code per method (see Figure 1). Then, there was the issue of code duplication. We used Simian to find duplicated code, and even measured the "signal-to-noise" ratio of our code.
Figure 1: Sample Quality Metrics
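The kind of check described above can be sketched in a few lines. This is only an illustration in Python, not the DevMetrics tooling the team ran against its C# code: the 80-line limit comes from the text, while `MAX_COMPLEXITY`, `approx_complexity`, and `check_source` are hypothetical names, and the complexity figure is a rough approximation (one plus the number of decision points).

```python
import ast

MAX_LINES = 80        # the article's limit: methods must be shorter than this
MAX_COMPLEXITY = 10   # assumed threshold; the article does not state one

_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_complexity(func):
    """Rough cyclomatic complexity: 1 plus the number of decision points."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each extra and/or adds a path
    return score


def check_source(source):
    """Return (function name, line count, complexity) for each violation."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines = node.end_lineno - node.lineno + 1
            complexity = approx_complexity(node)
            if lines >= MAX_LINES or complexity > MAX_COMPLEXITY:
                violations.append((node.name, lines, complexity))
    return violations
```

Run as part of the build, a non-empty result from `check_source` could fail the build in the same way an FxCop rule violation would.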
From the start, we had two automated test suites. Developers wrote unit tests using the Test First approach, and our test team was tasked with creating the automated acceptance test suite with TestComplete. Over time, we achieved coverage of more than 80% with each suite by following a simple rule: no code change should decrease coverage.
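The "no code change should decrease coverage" rule amounts to a ratchet. A minimal sketch of it, assuming coverage is available as a percentage from whatever coverage tool is in use (the function names and the optional tolerance are illustrative, not taken from the project):

```python
def coverage_gate(baseline, current, tolerance=0.0):
    """Return True when a change may be accepted.

    baseline  -- coverage (percent) recorded before the change
    current   -- coverage (percent) measured with the change applied
    tolerance -- assumed allowance for measurement noise; 0 enforces the rule strictly
    """
    return current + tolerance >= baseline


def updated_baseline(baseline, current):
    """The baseline only ever moves up, so coverage gained is never given back."""
    return max(baseline, current)
```

For example, with a baseline of 80.0, a change measured at 79.5 would be rejected, while one measured at 81.2 would be accepted and would raise the baseline to 81.2.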
The major issue we had with both unit and acceptance testing was execution time. Initially, we ran the entire unit test suite before each source code submit. Today, however, we have some 3,500 unit tests and it takes almost an hour to run them, so very few developers execute the entire suite. Instead, the tests are run during each build, which happens often enough to alert us to potential problems. We categorized tests to allow for separate execution of, for example, Business Layer tests and Data Access Layer tests.
It seemed that poor unit test performance was an indicator of problems with the application itself, so we spent some time working on unit test optimization. Apart from the slow application code, we had several issues with our test design. For example, we were able to significantly decrease execution time using mock objects instead of the real database.
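To illustrate the mock-object technique, here is a small sketch using Python's standard `unittest.mock`. The project itself was written in C#, and `TimesheetService`, `find_entries`, and the sample data are hypothetical names invented for this example. The point is that the business logic is exercised without opening a database connection.

```python
from unittest.mock import Mock


class TimesheetService:
    """Hypothetical business-layer class; not taken from the article."""

    def __init__(self, repository):
        self._repository = repository

    def total_hours(self, employee_id):
        entries = self._repository.find_entries(employee_id)
        return sum(entry["hours"] for entry in entries)


def test_total_hours_without_database():
    # The mock stands in for the data access layer, so no real query runs.
    repository = Mock()
    repository.find_entries.return_value = [{"hours": 4.0}, {"hours": 3.5}]

    service = TimesheetService(repository)

    assert service.total_hours(42) == 7.5
    repository.find_entries.assert_called_once_with(42)
```

Because the data access object is passed in through the constructor, the same service can be wired to the real database in production and to a mock in the unit test suite.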
Sometimes choosing the simplest thing that works is very hard. I would bet that any Agile project participant can recall a heated debate over the relative simplicity of several proposed solutions. To tackle the issue, we agreed that in our particular project a "simple" solution would mean "one requiring fewer lines of code." The approach had its limitations but it worked, and that was one of the reasons we needed to measure the amount of code duplication in our project.
Reuse was very easy with C#, easy with XSLT, and hard with SQL. What's more, the three-tiered architecture forced us to duplicate some logic in different tiers and in different languages. For example, we had to check that the start date of a period was not greater than its end date in at least three places: on a Web page, a C# business object, and in the database. The solution we successfully implemented included an XML-based Domain-Specific Model (DSM) of our system which was used to generate the code in C#, SQL, and other languages. That allowed us to decrease the amount of code we had to write and support by over 30%. We were also able to dramatically reduce inconsistencies between application tiers. That obviously had a very positive effect on both quality and velocity.
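The idea of generating tier-specific code from a single model can be sketched as follows. The XML vocabulary (`entity`, `rule`, `left`, `op`, `right`) and the shape of the generated C# and SQL are assumptions made for illustration; the article does not describe the actual DSM format or generator. The sketch states the date rule once and emits it for two tiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical model: the start/end date rule is stated exactly once.
MODEL = """
<entity name="Period">
  <rule name="StartNotAfterEnd" left="StartDate" op="le" right="EndDate"/>
</entity>
"""

OPERATORS = {"le": "<=", "lt": "<", "ge": ">=", "gt": ">"}


def generate_csharp(entity):
    """Emit one C# boolean member per rule (business tier)."""
    lines = []
    for rule in entity.findall("rule"):
        op = OPERATORS[rule.get("op")]
        lines.append(
            f"public bool {rule.get('name')}() => "
            f"{rule.get('left')} {op} {rule.get('right')};"
        )
    return lines


def generate_sql(entity):
    """Emit one CHECK constraint per rule (database tier)."""
    table = entity.get("name")
    lines = []
    for rule in entity.findall("rule"):
        op = OPERATORS[rule.get("op")]
        lines.append(
            f"ALTER TABLE {table} ADD CONSTRAINT CK_{table}_{rule.get('name')} "
            f"CHECK ({rule.get('left')} {op} {rule.get('right')});"
        )
    return lines


entity = ET.fromstring(MODEL.strip())
csharp_code = generate_csharp(entity)
sql_code = generate_sql(entity)
```

Changing the rule in the model and regenerating keeps both tiers consistent, which is the property that reduces the inconsistencies mentioned above.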
One year into the project, we had decreased the defect rate by 50% and were maintaining a decent speed. One of the largest challenges at that point was that the user base had increased fivefold. And, of course, quality remained critical.
We used conventional XP release planning and iteration planning. The issue was that the customer wanted to know the exact release date in advance, but preferred changing the release date to cutting the scope. We also had to accommodate a lot of change requests, sometimes as much as one third of the estimated scope for the iteration. Errors in determining the exact schedule usually led to tension with the customer, and sometimes prevented us from working at a sustainable pace.
In retrospect, our solution looks obvious, although we took a long time to figure it out. It was our burn-up chart that actually helped. We could never use big visible charts for communicating our progress to the customer due to the remote nature of the project, so we used a simple Excel spreadsheet (see Figure 2).
Figure 2: Sample Status Report
The chart was OK, except it didn't show the amount of changes in the way that would allow us to see how much they added to the iteration length. After some thinking, we came up with a different graph (see Figure 3), which had two major advantages:
Then, we were able to calculate the average number of added perfect hours (to implement change requests) per iteration day and account for that in our schedule calculations, which mostly solved the problem.
Figure 3: Sample Burn-up Chart
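The schedule adjustment described above can be illustrated with a small calculation. The article does not give the exact formula, so the one below is an assumption: scope added per day by change requests is subtracted from the hours the team completes per day, and the remaining work is divided by that net rate. All names and sample numbers are hypothetical.

```python
import math


def average_added_per_day(added_hours_by_day):
    """Average perfect hours of change requests added per iteration day."""
    return sum(added_hours_by_day) / len(added_hours_by_day)


def projected_days(remaining_hours, completed_per_day, added_per_day):
    """Working days left, assuming scope keeps growing at the observed rate."""
    net_rate = completed_per_day - added_per_day
    if net_rate <= 0:
        raise ValueError("scope is growing at least as fast as it is completed")
    return math.ceil(remaining_hours / net_rate)


# Sample data: change requests added over four iteration days.
added = average_added_per_day([4, 0, 8, 4])   # 4.0 perfect hours per day
naive = projected_days(120, 24, 0)            # ignores change requests
adjusted = projected_days(120, 24, added)     # accounts for them
```

With these sample numbers the naive estimate is 5 days and the adjusted one is 6, which is the kind of gap that otherwise shows up as a missed release date.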
When the offshore team had grown to 17 people, discussing all project issues in daily stand-up meetings with all 17 of us present just didn't make sense anymore. We started having short Scrum-style stand-ups, where everyone answered three questions: What did you do? What are you going to do? and What is getting in your way? That worked, but we still needed to discuss the stories in detail.
That was why we introduced the so-called introduction and transition meetings, which we conducted before and after implementing every story. The participants were usually the developers and testers directly involved, the team leads, and the system analyst (all offshore).
The introduction meeting helped us get a common understanding of the required functionality. During the transition meeting (similar to the Scrum Review Meeting) we tried to determine whether the story in question was ready for testing. The meeting played an extremely important role in driving the number of defects down, as the developers received feedback much faster than before. With time, it became clear that some of the defects we found during the transition meetings appeared again and again, so we made a list of typical mistakes and asked the developers to review it before calling for a transition meeting.
We noticed that besides implementing stories and change requests, developers constantly made some changes to seemingly unrelated parts of the system. That generally increased the overall quality of the system; the problem was that some changes slipped into production without the proper testing.
After getting several calls from the customer, we decided to test every single submission to our version control system separately from the usual testing. That had a very positive effect on the team: developers finally got their safety net, and the testers felt more confident about the final result.
On this project, we've successfully reached our goals. Our customer defect rate is less than one defect per week. We can deploy a change in the same week that we receive a request, and we have decreased this period to one or two days for smaller changes. Our unusually big offshore XP team is consistently productive, even without ongoing customer presence. Because the customer is remote and effectively agnostic as to whether the team uses XP or not, we call this "Guerilla XP."
Sergey Belov works as Division Manager at Exigen Services. He is responsible for multiple development projects for Exigen’s US and European clients. Sergey has been with the company since 2004 and is one of the most passionate proponents of Agile in the organization. A Certified Scrum Master, Sergey is a frequent presenter on the topics of Agile development both at internal company trainings and at international professional conferences.