Earlier this week a woman was hit and killed by an autonomous car in Tempe, Arizona while she was crossing the road with her bike.
The accident is very sad and my heart goes out to the victim’s family and friends.
If you read my recent piece on the anatomy of disasters, you’ll recognize several of the common features here – although on a smaller scale.
The pedestrian was crossing a 5-lane, 45 MPH street in an area where drivers wouldn’t normally expect pedestrians. The autonomous car, operated by Uber, obviously failed to detect the pedestrian and stop in time. The “safety driver” wasn’t focused on the road or prepared to stop the vehicle.
And it didn’t help that it was very dark outside, this section of the road was unlit, and the pedestrian had no lighting or reflectors to make herself seen.
I’ve seen the video of the accident and it’s terrible. Unfortunately, I think that even an experienced driver would have hit the woman too.
Elaine Herzberg, a 49-year-old woman, was hit and killed by an autonomous car operated by Uber in Tempe, Arizona. She was crossing the road with her bike late Sunday night, March 18, 2018. I wrote about the accident in more detail here, but I wanted to separate the video of the accident from my post so that people could choose whether or not to watch it.
Why don’t we learn from past experiences when it comes to planning new projects? Why aren’t even our best laid plans realistic?
Surely you’ve noticed this – whether it’s getting your taxes done, that big presentation for work, or planning your wedding.
Why do 80-90% of mega projects run over budget and over schedule?
Why has it taken – for example – nearly 100 years to expand the Second Avenue Subway in NYC? The original project was expected to cost $1.4 billion (a 1929 estimate in 2017 dollars). Now, with Phase 1 completed ($4.5 billion to build just 3 of the 16 proposed stations), Phase 2 is expected to cost $6 billion.
Why do we fall for The Planning Fallacy again and again?
When planning a project we naturally focus on the case at hand, building a simulation in our minds. But our simulations are rosy, idealized, and don’t account for all of the complexities that will inevitably unfold.
We also focus on succeeding, not failing, creating an optimism bias. This means we don’t think enough about all the things that can go wrong.
We’re overly confident, believing too much in our abilities and in the old “this time will be different” line.
We ignore the complexity of integrating all of the parts of a project together.
We intentionally misrepresent a project’s plans in order to get it approved.
We rely too heavily on our subjective judgement instead of the facts and past empirical data.
And of course: incompetence, fraud, deliberate deception, cheating, stealing, and politicking.
Use past projects – even if they’re not exactly comparable – as a benchmark for projects being planned.
Track and score the difference between forecasts and outcomes.
Get stakeholders to put skin in the game, creating rewards and penalties for good and bad performance. #IncentivesMatter.
Use data and algorithms to reduce human biases.
Use good tools to help you focus. Asana co-founder Justin Rosenstein warns against “continuous partial attention” – a state of never fully focusing on any one thing.
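The first two recommendations above (benchmark against past projects, then track forecasts against outcomes) can be sketched in a few lines of Python. This is only a minimal illustration, not a full forecasting method: the function names are made up for this example, the project durations are hypothetical, and scaling by the median overrun is just one simple way to apply the idea.

```python
from statistics import median


def overrun_ratios(past_projects):
    """Return actual / forecast for each (forecast, actual) pair."""
    return [actual / forecast for forecast, actual in past_projects]


def reference_class_forecast(inside_view_estimate, past_projects):
    """Scale a gut-feel estimate by the median overrun of past projects."""
    return inside_view_estimate * median(overrun_ratios(past_projects))


# Hypothetical (forecast, actual) durations in weeks for five past projects.
past_projects = [(4, 6), (10, 12), (8, 16), (5, 5), (6, 9)]

ratios = overrun_ratios(past_projects)

# A new project that "feels like" 10 weeks, adjusted by the track record.
adjusted = reference_class_forecast(10, past_projects)
print(ratios, adjusted)
```

Keeping the list of (forecast, actual) pairs up to date is the "track and score" step. The more past projects it holds, the less the adjustment depends on any one outlier.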
Success Building Software
I build projects for a living – mostly product strategy and software for start-ups or innovation groups within larger companies. I plan and execute on projects every day and I still struggle with the planning fallacy in other areas of my business (did I mention my corporate taxes are due in 7 days?).
But the secret sauce to my successes building products has always been to 1) have personal expertise in what’s being planned and built, 2) refine and go over the plans until your eyes bleed looking for possible pitfalls, and 3) have a clear and easy-to-follow process to keep you focused on the right thing at the right time.
Terms & Concepts
The Planning Fallacy – Poorly estimating the timeline, quality, and budget of a planned project while knowing that similar projects have taken longer, cost more, or had sub-par results.
The Optimism Bias – Focusing on the positives of a situation over the negatives.
Overconfidence – Thinking that we’ll perform better than we actually will.
Coordination Neglect – Failing to account for how difficult it is to coordinate efforts and combine all of the individual outputs into one complete system.
Procrastination – Choosing to do things that we enjoy in the short term instead of the things we think will make us better further down the road. In the episode, Katherine Milkman called procrastination a “self-control failure” – my new favorite phrase.
Reference Class Forecasting – Using past and similar projects as a benchmark for how your next project will perform.
Strategic Misrepresentation – Understating the costs and overstating the benefits of a project.
Algorithm Aversion – The big thing that Katy Milkman thinks is holding us back from making better use of “data instead of human judgement to make forecasts.”
When I first started working at the nation’s largest refinery my boss didn’t have any great projects ready for me so he sent me to “Shift Super” training. There were only 4 of us students – me and 3 unit operators, each with at least 20 years of experience. I was only 19 years old and I barely knew anything about anything.
Each shift supervisor runs a big chunk of the refinery: 2-4 major units. But shift supers needed to know how to supervise any of the 10 or so control centers safely, so on my first day of work I was learning how the entire refinery operated. It felt like a lifetime’s education crammed into 5 days.
After Shift Super training, I began to see more and more of the refinery with my own eyes. What had been a precise line drawn from one perfect cylinder to another perfect cylinder was actually a rust-covered 6-inch pipe baking in the Texas heat, transporting high-octane, extremely flammable raffinate from a 170 ft separation tower to a temporary holding tank.
Fear of Disaster
One evening – about 2 weeks in – as the refinery was becoming a real place to me, I had this moment of pure panic while driving home.
With so many things that could go wrong at any moment, how was the refinery still standing? How had it even made it until now? Any minor mistake – in the design, production, construction, or operation of any pipe or vessel – could result in a huge disaster. Would the refinery even be there when I returned tomorrow morning?
I couldn’t sleep. The next morning the refinery was still there. And the next. And the next. And my fears slowly morphed into amazement. My refinery was the largest, most complicated system I had ever attempted to wrap my brain around.
BP Texas City Refinery Explosion
But just 29 miles away from my refinery, investigators were trying to piece together what happened during the nation’s worst industrial disaster in nearly 2 decades.
Fifteen people had been killed and 180 injured – dozens were very seriously injured. The cause of each fatality was the same: blunt force trauma.
Windows on houses and businesses were shattered three-quarters of a mile away and 43,000 people were asked to remain indoors while fires burned for hours. 200,000 square feet of the refinery was scorched. Units, tanks, and pipes were destroyed and the total financial loss was over $1.5 billion.
The BP Texas City Refinery Explosion was a classic disaster. A series of engineering and communication mistakes led to a 170-foot separation tower in the ISOM unit being overfilled with tens of thousands of gallons of extremely flammable liquid raffinate – the component of gasoline that really gives it a kick. The unit was designed to operate with about 6,000 gallons of liquid raffinate, so once the vessel was completely filled, 52,000 gallons of 200 °F raffinate rushed through various attached vapor systems. Hot raffinate spewed into the air in a 20-foot geyser. With a truck idling nearby, an explosion was imminent.
This video from the US Chemical Safety Board (CSB) is easy to consume and well done. The 9 minutes starting at 3:21 explain the Texas City incident in detail:
I’ve studied this and other disasters in detail because in order to prevent disasters we have to understand their anatomy.
The trigger is the most direct cause of a disaster and is usually pretty easy to identify. The spark that ignited the explosion. The iceberg that ruptured the ship’s hull. The levee breaches that flooded 80% of New Orleans.
But the trigger typically only tells a small part of the story and it usually generates more questions than answers: Why was there a spark? Why was highly flammable raffinate spewing everywhere? Why was there so much? What brought these explosive ingredients together after so many people had worked so hard to prevent situations exactly like this?
While the trigger is a critical piece of the puzzle, a thorough analysis of a disaster has to look at the bigger picture.
When The Stars Align
The word disaster describes rapidly occurring damage or destruction of natural or man-made origins. But the word disaster has its roots in the Italian word disastro, meaning “ill-starred” (dis + astro). The sentiment is that the positioning of the stars and planets is astrologically unfavorable.
One of the things I learned from poring over the incident reports of the Texas City Explosion was that disasters tend to only happen when at least 3 or 4 mistakes are made back to back or simultaneously – when the stars align.
Complex systems typically account for obvious mistakes. But they less frequently account for several mistakes occurring simultaneously. The stars certainly aligned in the case of the Texas City Refinery Explosion:
Employees and contractors were located in fragile wooden portable trailers near dangerous units that were about to start up.
The start-up process for the ISOM unit began just after 2 AM, when workers were tired and conditions were not ideal.
The start-up was done over an 11-hour period, meaning that the procedure spanned a shift change – creating many opportunities for miscommunication. Unfortunately, the start-up could have easily been done during a single shift.
At least one operator had worked 30 back-to-back 12-hour days because of the various turnaround activities at the refinery and BP’s cost-cutting measures.
One liquid level indicator on the vessel that was being filled was only designed to work within a certain narrow range.
Once the unit was filled above the indicator’s upper range, the indicator reported incorrect values near the upper range, misleading operators about the true liquid level (i.e., at one point the level indicator reported that the liquid level in the tower was only 7.9 feet when it was actually over 150 feet).
A backup high level alarm located above the level indicator failed to go off.
The lead operator left the refinery an hour before his shift ended.
Operators did not leave adequate or clear logs for one another, meaning that knowledge failed to transfer between key players.
The day shift supervisor arrived an hour late for his shift and therefore missed any opportunity for direct knowledge transfer.
Start-up procedures were not followed and the tower was intentionally filled above the prescribed start-up level because doing so made subsequent steps more convenient for operators.
The valve to let fluids out of the tower was not opened at the correct time even though the unit continued to be filled.
The day shift supervisor left the refinery due to a family emergency and no qualified supervisor was present for the remainder of the unfolding disaster. A single operator was now operating all 3 units in a remote control center, including the ISOM unit that needed special attention during start-up.
Operators tried various things to reduce the pressure at the top of the tower without understanding the circumstances in the tower. One of the things they tried – opening the valve that moved liquids from the bottom of the tower into storage tanks (a step that they had failed to do hours earlier) – caused very hot liquid from the tower to pass through a heat exchanger with the fluid entering the tower. This caused the temperature of the fluid entering the tower to spike, exacerbating the problems even further.
The tower, which was never designed to be filled with more than a few feet of liquid, had now been filled to the very top – 170 feet. With no other place to go, liquid rushed into the vapor systems at the top of the tower.
At this point, no one knew that the tower had been overfilled with boiling raffinate. The liquid level indicator read that the unit was only filled to 7.9 feet.
Over the next 6 minutes, 52,000 gallons of nearly boiling, extremely flammable raffinate rushed out of the top of the unit and into adjacent systems – systems that were only designed to handle vapors, not liquids.
Thousands of gallons of raffinate entered an antiquated system that vents hydrocarbon vapors directly to the atmosphere – called a blowdown drum.
A final alarm – the high level alarm on the blowdown drum – failed to go off. But it was too late. Disaster was already imminent.
Raffinate spewed from the top of the blowdown drum. The geyser was 3 feet wide and 20 feet tall. The hot raffinate instantly began to vaporize, turning into a huge flammable cloud.
A truck, idling nearby, was the ignition source.
The portable trailers were destroyed instantly by the blast wave and most of the people inside were killed. Fires raged for hours, delaying rescue efforts.
Man-made disasters don’t just happen in complex systems. The stars have to align. But the quality of the mistakes matters a lot. Had even one key error above been avoided or caught, this incident wouldn’t have happened. In this case, overfilling the unit by 150,000 gallons of nearly boiling flammable raffinate set off a chain of events that guaranteed disaster.
The Snowball Effect
Not all mistakes are created equal. Several of the errors in the Texas City Refinery Explosion compounded: Had operators followed the start-up procedure and not filled the tower beyond the designed level, had the tower been better designed to communicate liquid levels over a broader range, had the valve draining the tower been opened at the correct time, had the operators communicated properly between shifts… Had any one of these mistakes been avoided, the tower wouldn’t have been overfilled and this disaster would have been prevented.
Miscommunication errors seem to have a special way of compounding and spiraling out of control.
While preventing some of the other mistakes might have mitigated the damage done, failing to understand the quantity of raffinate in the tower ultimately caused the disaster at Texas City.
Think about the complex systems you care about in your business and life. List the raw ingredients for a disaster. What information do decision makers and operators need in order to react appropriately?
Identify the singular points of failure and the obvious triggers. Brainstorm scenarios – both common and uncommon – in order to better understand how different mistakes could interact with one another and how they could snowball out of control.
Pay attention to both the system’s design and the human errors – especially communication errors – that will inevitably arise during normal operation. Think about how you can design the system to be more resilient without sacrificing too much efficiency. What brakes can you build into the process to slow down snowballs?
Where do you need warning alarms? What are the right set-points for each alarm? How vocal do the alarms need to be? What happens when the alarms fail? How often will you test or double check your alarms?
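One way to make the "what happens when the alarms fail?" question concrete is to never trust a single instrument on its own. The Python sketch below is an illustration under stated assumptions, not how any real refinery control system works: the class and function names are invented for this example and every number is hypothetical. It distrusts readings near the edge of a transmitter's calibrated span, and it cross-checks the reading against an independent estimate from a simple material balance (what went in minus what came out).

```python
from dataclasses import dataclass


@dataclass
class LevelTransmitter:
    """Hypothetical level transmitter with a limited calibrated span."""
    min_ft: float
    max_ft: float

    def is_trustworthy(self, reading_ft, margin_ft=0.5):
        # A reading at or near the edge of the span may be saturated.
        return self.min_ft + margin_ft <= reading_ft <= self.max_ft - margin_ft


def estimate_level_from_flows(start_ft, inflow_gal, outflow_gal, gal_per_ft):
    """Independent level estimate from a simple material balance."""
    return start_ft + (inflow_gal - outflow_gal) / gal_per_ft


def check_level(transmitter, reading_ft, balance_ft, high_alarm_ft,
                tolerance_ft=2.0):
    """Return a list of warnings; an empty list means nothing looks wrong."""
    warnings = []
    if not transmitter.is_trustworthy(reading_ft):
        warnings.append("reading near edge of calibrated span")
    if abs(reading_ft - balance_ft) > tolerance_ft:
        warnings.append("reading disagrees with material balance")
    if max(reading_ft, balance_ft) >= high_alarm_ft:
        warnings.append("high level")
    return warnings


# All values below are made up for illustration.
transmitter = LevelTransmitter(min_ft=0.0, max_ft=10.0)
balance_ft = estimate_level_from_flows(
    start_ft=5.0, inflow_gal=60_000, outflow_gal=0, gal_per_ft=400
)

# The transmitter alone reports a plausible-looking 7.9 ft.
warnings = check_level(transmitter, reading_ft=7.9,
                       balance_ft=balance_ft, high_alarm_ft=9.0)
print(balance_ft, warnings)
```

In this made-up scenario the transmitter reading looks healthy by itself, and only the comparison against the flow totals reveals a problem. The design choice is to treat disagreement between two independent sources as an alarm in its own right.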
Disasters tend to happen within large and complex systems. Usually, the immediate cause of a disaster – the literal or figurative spark or trigger – can be readily identified. But there’s almost always a bigger picture, a series of mistakes and errors that led to that spark or gave that spark power. Some of those mistakes set off bigger and bigger problems, which can snowball into something truly catastrophic.
Bottom Line: Understanding the anatomy of a disaster in your world is the first step to designing better systems, procedures, and training to help mitigate damage or prevent disasters altogether.