“If you let your head get too big, it’ll break your neck.”
― Elvis Presley
This series explores the common, but painful, issue of companies that have lost control over their Engineering team. In the previous post, I introduced an example project whose team made it to the end only to discover that the system was completely unusable. In this post, we'll explore another shark-jumping project.
Death by scale [1]
Our next example presents finished software that is unusable in a different way – it collapses under the weight of real-world usage. I'm guessing that many of you have lived this one firsthand.
This particular project was going to revolutionize the lives of mobile phone users. In the pre-smartphone days, mobile carriers were trying to figure out how to provide customized experiences to their subscribers, such as sports scores and stock market data. To keep track of what was interesting to each subscriber, the new system would need to store a myriad of profile data (favorite sports teams, stocks the user was interested in, their horoscope sign, etc.). From this data, the system could serve up personalized text messages (in an era when – believe it or not – they charged you for each one).
The system was well-specified and the Engineering team spent a lot of time designing the architecture of the system so that they could modify it easily in the future. Everyone agreed it was one of the best systems they had worked on – the code was absolutely beautiful. They tested it with a lot of Mickey Mouse data (literally – most of the test accounts had names from various cartoon characters). Although the Engineers knew that the system would be used by large mobile phone carriers, they didn’t have time to create a large set of test data to see what would happen when there were thousands or millions of records.
In preparation for the system going live, the team migrated data from the carrier's previous system to seed the database with a list of all subscribers – 20 million in all. Somewhere in the process, the database hit a physical limit and crashed. The database vendor – a huge, long-standing technology company – reluctantly admitted that their database couldn't hold a table that large (there were about 1,000 rows per subscriber, for a total of 20 billion rows of data). The team tried again with smaller and smaller sets of data, finally deciding to convert only a small portion of the subscribers. The database didn't implode this time, but response times were abysmal – a single request into the system took 30 seconds. Even with a small fraction of the actual data, the system couldn't cut it.
The team had to go back to the drawing board. It took several months to come up with a performant solution. In the meantime, the customer lost faith in the project and it never saw the light of day.
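With hindsight, a scale problem like this can be surfaced long before go-live by bulk-loading synthetic data at production volumes and timing a representative query against it. Here's a minimal sketch of the technique in Python – the schema, row counts, and query are hypothetical illustrations, not details from the actual project (and SQLite stands in for whatever database you actually run):

```python
# Sketch: seed a database with production-scale synthetic data before any
# real migration, then time a representative lookup at that volume.
# Schema, counts, and query are hypothetical -- scale the numbers toward
# your real subscriber base.
import sqlite3
import time

SUBSCRIBERS = 1_000_000       # dial this up toward real-world volume
ROWS_PER_SUBSCRIBER = 10      # profile entries per subscriber
BATCH_SIZE = 50_000

conn = sqlite3.connect("loadtest.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS profile (
        subscriber_id INTEGER,
        key TEXT,
        value TEXT
    )
""")

def synthetic_rows():
    """Yield fake profile rows -- no Mickey Mouse accounts required."""
    for sub in range(SUBSCRIBERS):
        for i in range(ROWS_PER_SUBSCRIBER):
            yield (sub, f"pref_{i}", f"value_{sub}_{i}")

# Insert in batches so memory stays flat no matter how big the run is.
batch = []
for row in synthetic_rows():
    batch.append(row)
    if len(batch) >= BATCH_SIZE:
        conn.executemany("INSERT INTO profile VALUES (?, ?, ?)", batch)
        conn.commit()
        batch.clear()
if batch:
    conn.executemany("INSERT INTO profile VALUES (?, ?, ?)", batch)
    conn.commit()

# Now measure a representative lookup at this volume.
conn.execute("CREATE INDEX IF NOT EXISTS idx_sub ON profile (subscriber_id)")
start = time.perf_counter()
rows = conn.execute(
    "SELECT key, value FROM profile WHERE subscriber_id = ?",
    (SUBSCRIBERS // 2,),
).fetchall()
elapsed = time.perf_counter() - start
print(f"Fetched {len(rows)} rows in {elapsed * 1000:.1f} ms")
```

A run like this costs an afternoon, and even at a fraction of production volume it would likely have exposed the 30-second lookups while there was still time to redesign.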
What went wrong?
In this example, the team got to the end with a system that was not ready to handle anything like the real-world scale that got thrown at it.
If we understand our customer, we understand their real-world needs. We know what real data looks like AND what volumes and response times we need to achieve.
The Customer Membrane includes, as part of Customer Needs and Requirements, non-functional requirements for performance and expected system volumes.
Like security, performance and scale are often assumed or implied, but they are among the most important requirements for the customer. There are many performant but highly imperfect systems that users grow to love [2]. No system that performs poorly, no matter how awesome it is otherwise, will endear itself to users.
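One way to keep a non-functional requirement from being merely "assumed or implied" is to make it executable. A minimal sketch, assuming a hypothetical 200 ms latency budget and a stubbed-out request function (neither comes from the project above):

```python
# Sketch: a non-functional requirement expressed as a failing test.
# Measure request latency over a sample of calls and fail if the p95
# exceeds the budget. The budget and fetch_profile() stub are
# illustrative placeholders, not numbers from the actual project.
import random
import statistics
import time

LATENCY_BUDGET_MS = 200   # hypothetical requirement from the spec
SAMPLE_REQUESTS = 1_000

def fetch_profile(subscriber_id: int) -> dict:
    """Stand-in for a real call into the system under test."""
    time.sleep(random.uniform(0.001, 0.005))  # simulated work
    return {"subscriber_id": subscriber_id}

latencies = []
for _ in range(SAMPLE_REQUESTS):
    start = time.perf_counter()
    fetch_profile(random.randrange(20_000_000))
    latencies.append((time.perf_counter() - start) * 1000)

p95 = statistics.quantiles(latencies, n=20)[18]  # 95th percentile
print(f"p95 latency: {p95:.1f} ms (budget: {LATENCY_BUDGET_MS} ms)")
assert p95 <= LATENCY_BUDGET_MS, "Non-functional requirement violated"
```

Wire something like this into the build, and the performance requirement fails loudly the moment the system stops meeting it – instead of on the day the real subscriber list arrives.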
In the next post, I'll share a final example and analyze it using the model. I look forward to reading your comments.
This post is based on or excerpted from the upcoming book “De-Engineering the Corporation” by Darryl Ricker
Footnotes:
[1] For the record, I’m not talking about my latest dieting efforts.
[2] or at least tolerate with some light affection.