I’ve broken this post up into two parts, the first directed at convincing you to buy this book and read it several times, and the second to open up discussion for those who have read the book. There will be spoilers in the second part.
PART 1: No Spoilers
I borrowed this book from a co-worker on Friday and finished it Saturday. Yup, done in one day. 382 pages of stories that seem like they could have come straight from my work-related nightmares.
The main character, Bill, takes over after his boss and his boss’s boss both leave the company. The company is not an IT company, and the growing complexity of IT has caused it great stress and financial loss.
It is an obvious plug for DevOps. By the end of reading you might wonder if there is any other way to get things done. Keep a skeptical view and enjoy this book.
PART 2: SPOILERS
After the first 10 chapters, I didn’t know how much more I could take. I was physically stressed after reading about the constant firefighting, poor communication, late nights, political sabotage, yelling, swearing, weekend work, all-nighters, and unreasonable demands. The book depicted a sad state of affairs. I recognized some of the outages, and even the blame game comments sounded spot on.
It’s like they consolidated the most frustrating parts of my 9 years at my current company into 3 months. I’m a SAN administrator, and that first payroll outage, which got blamed on the SAN but ended up being a poorly implemented security feature, caused my first wave of stress. It was like watching a horror movie. “Corruption” is like the catch-all for unknown software errors. If you take action based on wild assumptions, bad things are going to happen. And let me tell you, they continue to happen even though the new boss, Bill, seems to have a calm, logical approach to things.
I wonder if this book was written like Dilbert, where the author was simply writing about what really happened to him. It’s the only way this could be so close to accurate.
About halfway through the book, I had a guess that three of the secondary characters who were helping Bill, especially Erik, might have just been his alternate personalities. Wes is the aggressive, obnoxious one, Patty is the over-documenter and process type, and Erik is the philosophical one. I was actually disappointed that they remained real characters and not imaginary. I think it would have added to the story to find out that Bill had really just been going crazy from all the stress.
I loved watching the team be shocked at how many changes actually happen in the ops world that they have been living in. How could they not know? Changes are like queries on a database: sometimes it makes sense to count them, but mostly they are so different that they can’t be counted. One single big change can be more impactful and riskier than 1000 small changes combined.
Who changed what, when? These are questions all ops teams should be able to answer. The book describes “changes” as one of four types of work. I’m not really certain how it fits into DevOps. Maybe change control is about reducing unplanned work, which is another type of work.
I liked the compromise they made: keep the crappy change control system, but still force and encourage teams to communicate by writing their changes on cards. It started a habit, and the process communicated the vision. It was an early win in their struggles. The system had many side benefits, such as discovering the Brent bottleneck.
I wouldn’t encourage IT departments to use an index card method to schedule changes. It’s not searchable and doesn’t scale well. A heavy-handed software application with too many required fields is not the best approach either. The key is having clear definitions of what “change” really means and which systems need to be tracked the most, for example important financial systems such as payroll.
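As a rough illustration of the middle ground between index cards and a heavy tool, here is a minimal sketch of a searchable change log that records only who, what, when, and which system. The `Change` record, the `find_changes` helper, and the sample entries are all hypothetical and are not taken from the book.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Change:
    """One change record: who changed what, when, on which system."""
    who: str
    what: str
    when: datetime
    system: str


def find_changes(log, system=None, since=None):
    """Return changes matching an optional system name and start time."""
    return [
        c for c in log
        if (system is None or c.system == system)
        and (since is None or c.when >= since)
    ]


# Hypothetical sample entries, for illustration only.
log = [
    Change("alice", "firmware upgrade", datetime(2024, 3, 1, 22, 0), "san"),
    Change("bob", "schema update", datetime(2024, 3, 2, 9, 30), "payroll"),
    Change("carol", "security patch", datetime(2024, 3, 3, 1, 15), "payroll"),
]

# What changed on payroll since the morning of March 2?
for c in find_changes(log, system="payroll", since=datetime(2024, 3, 2)):
    print(c.when.isoformat(), c.who, c.what)
```

Four fields is deliberately few. The point is that the questions above become a one-line query, which a stack of cards cannot offer.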
This concept hit close to home. My team has lost two people in the last few months and the workload is climbing to unprecedented levels. The automation I’ve put in place is in need of upgrades and important business projects are coming to fruition.
When you are busy, you make mistakes. When you make mistakes, it’s time-consuming to recover. You also take shortcuts that tend to create more work in the long run. Being busy sucks the life out of people.
Decreasing the wait time for IT to add value to the business is what DevOps is all about. The book illustrates this quite well across several fronts. The way Bill achieves some of his goals before achieving kumbaya in the datacenter is with endless hours. He gets denied more people, so he takes his salaried workforce and makes something out of nothing.
The graph in the book explains why. People can function quite well until they are more than about 90% busy; past that point, wait times go through the roof. You can’t squeeze 11% more output out of 10% of idle time. Trying to do so creates context-switching penalties and queuing, and that is what drives the wait times up.
This is why I sometimes work long hours. I know that if I fall behind, it piles up like laundry and I have no clean underwear. It didn’t quite click until I saw the graph in this book, but it makes total sense. Trying to squeeze that last little bit of production out of a person or process can lead to devastating results.
In the book, Bill realizes he needs to dedicate Brent to project Phoenix. I like the pool of people dedicated to dealing with escalations that usually go to Brent. It’s like training without the training. Allowing Brent to focus leads to some interesting automation discoveries later in the book.
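The shape of that curve can be reproduced with a few lines of code. This is a minimal sketch, assuming the graph follows the simple rule of thumb that relative wait time is the fraction of time busy divided by the fraction of time idle. The function name `relative_wait` is my own, and the result is a unitless ratio, not a real measurement.

```python
def relative_wait(busy_fraction: float) -> float:
    """Relative wait time, assumed to be (fraction busy) / (fraction idle)."""
    if not 0.0 <= busy_fraction < 1.0:
        raise ValueError("busy_fraction must be in [0, 1)")
    return busy_fraction / (1.0 - busy_fraction)


# Print the ratio at a few utilization levels to show the blow-up near 100%.
for pct in (50, 80, 90, 95, 99):
    print(f"{pct}% busy -> about {relative_wait(pct / 100):.0f}x wait")
```

Under this assumption, going from 50% to 90% busy multiplies the wait by about nine, and going from 90% to 99% multiplies it by about eleven again, which is why the last bit of idle time is so expensive to give up.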
Everything is Awesome!
After the first 10 chapters, the book slows down its pace quite a bit. Some characters do a 180 and everything starts going better. It was a little harder to read, and the politics started to take over.
The authors started to apply DevOps approaches to a small team and everything just magically worked. I was hoping there would be continuing issues before they actually got things right, but magic pixie dust just made things work. Brent’s server builds just converted over to the cloud with no mention of problems, of massive cost increases on top of the money they had already sunk into on-site servers, or of the architectural shift that would have had to take place to successfully run the old code in the cloud. But I suppose they were close to 10 deployments a day, so it would have been fast, right?