Strategies for Learning from Failure
The wisdom of learning from failure is incontrovertible. Yet organizations that do it well are extraordinarily rare. This gap is not due to a lack of commitment to learning. Managers in the vast majority of enterprises that I have studied over the past 20 years—pharmaceutical, financial services, product design, telecommunications, and construction companies; hospitals; and NASA’s space shuttle program, among others—genuinely wanted to help their organizations learn from failures to improve future performance. In some cases they and their teams had devoted many hours to after-action reviews, postmortems, and the like. But time after time I saw that these painstaking efforts led to no real change. The reason: Those managers were thinking about failure the wrong way.
Most executives I’ve talked to believe that failure is bad (of course!). They also believe that learning from it is pretty straightforward: Ask people to reflect on what they did wrong and exhort them to avoid similar mistakes in the future—or, better yet, assign a team to review and write a report on what happened and then distribute it throughout the organization.
These widely held beliefs are misguided. First, failure is not always bad. In organizational life it is sometimes bad, sometimes inevitable, and sometimes even good. Second, learning from organizational failures is anything but straightforward. The attitudes and activities required to effectively detect and analyze failures are in short supply in most companies, and the need for context-specific learning strategies is underappreciated. Organizations need new and better ways to go beyond lessons that are superficial (“Procedures weren’t followed”) or self-serving (“The market just wasn’t ready for our great new product”). That means jettisoning old cultural beliefs and stereotypical notions of success and embracing failure’s lessons. Leaders can begin by understanding how the blame game gets in the way.
The Blame Game
Failure and fault are virtually inseparable in most households, organizations, and cultures. Every child learns at some point that admitting failure means taking the blame. That is why so few organizations have shifted to a culture of psychological safety in which the rewards of learning from failure can be fully realized.
Executives I’ve interviewed in organizations as different as hospitals and investment banks admit to being torn: How can they respond constructively to failures without giving rise to an anything-goes attitude? If people aren’t blamed for failures, what will ensure that they try as hard as possible to do their best work?
This concern is based on a false dichotomy. In actuality, a culture that makes it safe to admit and report on failure can, and in some organizational contexts must, coexist with high standards for performance. To understand why, look at the exhibit “A Spectrum of Reasons for Failure,” which lists causes ranging from deliberate deviance to thoughtful experimentation.
Which of these causes involve blameworthy actions? Deliberate deviance, first on the list, obviously warrants blame. But inattention might not. If it results from a lack of effort, perhaps it’s blameworthy. But if it results from fatigue near the end of an overly long shift, the manager who assigned the shift is more at fault than the employee. As we go down the list, it gets more and more difficult to find blameworthy acts. In fact, a failure resulting from thoughtful experimentation that generates valuable information may actually be praiseworthy.
When I ask executives to consider this spectrum and then to estimate how many of the failures in their organizations are truly blameworthy, their answers are usually in single digits—perhaps 2% to 5%. But when I ask how many are treated as blameworthy, they say (after a pause or a laugh) 70% to 90%. The unfortunate consequence is that many failures go unreported and their lessons are lost.
Not All Failures Are Created Equal
A sophisticated understanding of failure’s causes and contexts will help to avoid the blame game and institute an effective strategy for learning from failure. Although an infinite number of things can go wrong in organizations, mistakes fall into three broad categories: preventable, complexity-related, and intelligent.
Preventable failures in predictable operations.
Most failures in this category can indeed be considered “bad.” They usually involve deviations from spec in the closely defined processes of high-volume or routine operations in manufacturing and services. With proper training and support, employees can follow those processes consistently. When they don’t, deviance, inattention, or lack of ability is usually the reason. But in such cases, the causes can be readily identified and solutions developed. Checklists (as in the Harvard surgeon Atul Gawande’s recent best seller The Checklist Manifesto) are one solution. Another is the vaunted Toyota Production System, which builds continual learning from tiny failures (small process deviations) into its approach to improvement. As most students of operations know well, a team member on a Toyota assembly line who spots a problem or even a potential problem is encouraged to pull a rope called the andon cord, which immediately initiates a diagnostic and problem-solving process. Production continues unimpeded if the problem can be remedied in less than a minute. Otherwise, production is halted—despite the loss of revenue entailed—until the failure is understood and resolved.
Unavoidable failures in complex systems.
A large number of organizational failures are due to the inherent uncertainty of work: A particular combination of needs, people, and problems may have never occurred before. Triaging patients in a hospital emergency room, responding to enemy actions on the battlefield, and running a fast-growing start-up all occur in unpredictable situations. And in complex organizations like aircraft carriers and nuclear power plants, system failure is a perpetual risk.
Although serious failures can be averted by following best practices for safety and risk management, including a thorough analysis of any such events that do occur, small process failures are inevitable. To consider them bad is not just a misunderstanding of how complex systems work; it is counterproductive. Avoiding consequential failures means rapidly identifying and correcting small failures. Most accidents in hospitals result from a series of small failures that went unnoticed and unfortunately lined up in just the wrong way.
Intelligent failures at the frontier.
Failures in this category can rightly be considered “good,” because they provide valuable new knowledge that can help an organization leap ahead of the competition and ensure its future growth—which is why the Duke University professor of management Sim Sitkin calls them intelligent failures. They occur when experimentation is necessary: when answers are not knowable in advance because this exact situation hasn’t been encountered before and perhaps never will be again. Discovering new drugs, creating a radically new business, designing an innovative product, and testing customer reactions in a brand-new market are tasks that require intelligent failures. “Trial and error” is a common term for the kind of experimentation needed in these settings, but it is a misnomer, because “error” implies that there was a “right” outcome in the first place. At the frontier, the right kind of experimentation produces good failures quickly. Managers who practice it can avoid the unintelligent failure of conducting experiments at a larger scale than necessary.
Leaders of the product design firm IDEO understood this when they launched a new innovation-strategy service. Rather than help clients design new products within their existing lines—a process IDEO had all but perfected—the service would help them create new lines that would take them in novel strategic directions. Knowing that it hadn’t yet figured out how to deliver the service effectively, the company started a small project with a mattress company and didn’t publicly announce the launch of a new business.
Although the project failed—the client did not change its product strategy—IDEO learned from it and figured out what had to be done differently. For instance, it hired team members with MBAs who could better help clients create new businesses and made some of the clients’ managers part of the team. Today strategic innovation services account for more than a third of IDEO’s revenues.
Tolerating unavoidable process failures in complex systems and intelligent failures at the frontiers of knowledge won’t promote mediocrity. Indeed, tolerance is essential for any organization that wishes to extract the knowledge such failures provide. But failure is still inherently emotionally charged; getting an organization to accept it takes leadership.
Building a Learning Culture
Only leaders can create and reinforce a culture that counteracts the blame game and makes people feel both comfortable with and responsible for surfacing and learning from failures. (See the sidebar “How Leaders Can Build a Psychologically Safe Environment.”) They should insist that their organizations develop a clear understanding of what happened—not of “who did it”—when things go wrong. This requires consistently reporting failures, small and large; systematically analyzing them; and proactively searching for opportunities to experiment.
How Leaders Can Build a Psychologically Safe Environment
If an organization’s employees are to help spot existing and pending …
Leaders should also send the right message about the nature of the work, such as reminding people in R&D, “We’re in the discovery business, and the faster we fail, the faster we’ll succeed.” I have found that managers often don’t understand or appreciate this subtle but crucial point. They also may approach failure in a way that is inappropriate for the context. For example, statistical process control, which uses data analysis to assess unwarranted variances, is not good for catching and correcting random invisible glitches such as software bugs. Nor does it help in the development of creative new products. Conversely, though great scientists intuitively adhere to IDEO’s slogan, “Fail often in order to succeed sooner,” it would hardly promote success in a manufacturing plant.
Often one context or one kind of work dominates the culture of an enterprise and shapes how it treats failure. For instance, automotive companies, with their predictable, high-volume operations, understandably tend to view failure as something that can and should be prevented. But most organizations engage in all three kinds of work discussed above—routine, complex, and frontier. Leaders must ensure that the right approach to learning from failure is applied in each. All organizations learn from failure through three essential activities: detection, analysis, and experimentation.
Detecting Failure
Spotting big, painful, expensive failures is easy. But in many organizations any failure that can be hidden is hidden as long as it’s unlikely to cause immediate or obvious harm. The goal should be to surface it early, before it has mushroomed into disaster.
Shortly after arriving from Boeing to take the reins at Ford, in September 2006, Alan Mulally instituted a new system for detecting failures. He asked managers to color code their reports green for good, yellow for caution, or red for problems—a common management technique. According to a 2009 story in Fortune, at his first few meetings all the managers coded their operations green, to Mulally’s frustration. Reminding them that the company had lost several billion dollars the previous year, he asked straight out, “Isn’t anything not going well?” After one tentative yellow report was made about a serious product defect that would probably delay a launch, Mulally responded to the deathly silence that ensued with applause. After that, the weekly staff meetings were full of color.
That story illustrates a pervasive and fundamental problem: Although many methods of surfacing current and pending failures exist, they are grossly underutilized. Total Quality Management and soliciting feedback from customers are well-known techniques for bringing to light failures in routine operations. High-reliability-organization (HRO) practices help prevent catastrophic failures in complex systems like nuclear power plants through early detection. Électricité de France, which operates 58 nuclear power plants, has been an exemplar in this area: It goes beyond regulatory requirements and religiously tracks each plant for anything even slightly out of the ordinary, immediately investigates whatever turns up, and informs all its other plants of any anomalies.
Such methods are not more widely employed because all too many messengers—even the most senior executives—remain reluctant to convey bad news to bosses and colleagues. One senior executive I know in a large consumer products company had grave reservations about a takeover that was already in the works when he joined the management team. But, overly conscious of his newcomer status, he was silent during discussions in which all the other executives seemed enthusiastic about the plan. Many months later, when the takeover had clearly failed, the team gathered to review what had happened. Aided by a consultant, each executive considered what he or she might have done to contribute to the failure. The newcomer, openly apologetic about his past silence, explained that others’ enthusiasm had made him unwilling to be “the skunk at the picnic.”
In researching errors and other failures in hospitals, I discovered substantial differences across patient-care units in nurses’ willingness to speak up about them. It turned out that the behavior of midlevel managers—how they responded to failures and whether they encouraged open discussion of them, welcomed questions, and displayed humility and curiosity—was the cause. I have seen the same pattern in a wide range of organizations.
A horrific case in point, which I studied for more than two years, is the 2003 breakup of the Columbia space shuttle on reentry, which killed seven astronauts (see “Facing Ambiguous Threats,” by Michael A. Roberto, Richard M.J. Bohmer, and Amy C. Edmondson, HBR November 2006). NASA managers spent some two weeks downplaying the seriousness of a piece of foam’s having broken off the left side of the shuttle at launch. They rejected engineers’ requests to resolve the ambiguity (which could have been done by having a satellite photograph the shuttle or asking the astronauts to conduct a space walk to inspect the area in question), and the major failure went largely undetected until its fatal consequences 16 days later. Ironically, a shared but unsubstantiated belief among program managers that there was little they could do contributed to their inability to detect the failure. Postevent analyses suggested that they might indeed have taken fruitful action. But clearly leaders hadn’t established the necessary culture, systems, and procedures.
One challenge is teaching people in an organization when to declare defeat in an experimental course of action. The human tendency to hope for the best and try to avoid failure at all costs gets in the way, and organizational hierarchies exacerbate it. As a result, failing R&D projects are often kept going much longer than is scientifically rational or economically prudent. We throw good money after bad, praying that we’ll pull a rabbit out of a hat. Intuition may tell engineers or scientists that a project has fatal flaws, but the formal decision to call it a failure may be delayed for months.
Again, the remedy—which does not necessarily involve much time and expense—is to reduce the stigma of failure. Eli Lilly has done this since the early 1990s by holding “failure parties” to honor intelligent, high-quality scientific experiments that fail to achieve the desired results. The parties don’t cost much, and redeploying valuable resources—particularly scientists—to new projects earlier rather than later can save hundreds of thousands of dollars, not to mention kickstart potential new discoveries.
Analyzing Failure
Once a failure has been detected, it’s essential to go beyond the obvious and superficial reasons for it to understand the root causes. This requires the discipline—better yet, the enthusiasm—to use sophisticated analysis to ensure that the right lessons are learned and the right remedies are employed. The job of leaders is to see that their organizations don’t just move on after a failure but stop to dig in and discover the wisdom contained in it.
Why is failure analysis often shortchanged? Because examining our failures in depth is emotionally unpleasant and can chip away at our self-esteem. Left to our own devices, most of us will speed through or avoid failure analysis altogether. Another reason is that analyzing organizational failures requires inquiry and openness, patience, and a tolerance for causal ambiguity. Yet managers typically admire and are rewarded for decisiveness, efficiency, and action—not thoughtful reflection. That is why the right culture is so important.
The challenge is more than emotional; it’s cognitive, too. Even without meaning to, we all favor evidence that supports our existing beliefs rather than alternative explanations. We also tend to downplay our responsibility and place undue blame on external or situational factors when we fail, only to do the reverse when assessing the failures of others, a psychological trap known as the fundamental attribution error.
My research has shown that failure analysis is often limited and ineffective—even in complex organizations like hospitals, where human lives are at stake. Few hospitals systematically analyze medical errors or process flaws in order to capture failure’s lessons. Recent research in North Carolina hospitals, published in November 2010 in the New England Journal of Medicine, found that despite a dozen years of heightened awareness that medical errors result in thousands of deaths each year, hospitals have not become safer.
Fortunately, there are shining exceptions to this pattern, which continue to provide hope that organizational learning is possible. At Intermountain Healthcare, a system of 23 hospitals that serves Utah and southeastern Idaho, physicians’ deviations from medical protocols are routinely analyzed for opportunities to improve the protocols. Allowing deviations and sharing the data on whether they actually produce a better outcome encourages physicians to buy into this program. (See “Fixing Health Care on the Front Lines,” by Richard M.J. Bohmer, HBR April 2010.)
Motivating people to go beyond first-order reasons (procedures weren’t followed) to understanding the second- and third-order reasons can be a major challenge. One way to do this is to use interdisciplinary teams with diverse skills and perspectives. Complex failures in particular are the result of multiple events that occurred in different departments or disciplines or at different levels of the organization. Understanding what happened and how to prevent it from happening again requires detailed, team-based discussion and analysis.
A team of leading physicists, engineers, aviation experts, naval leaders, and even astronauts devoted months to an analysis of the Columbia disaster. They conclusively established not only the first-order cause—a piece of foam had hit the shuttle’s leading edge during launch—but also second-order causes: A rigid hierarchy and schedule-obsessed culture at NASA made it especially difficult for engineers to speak up about anything but the most rock-solid concerns.
Promoting Experimentation
The third critical activity for effective learning is strategically producing failures—in the right places, at the right times—through systematic experimentation. Researchers in basic science know that although the experiments they conduct will occasionally result in a spectacular success, a large percentage of them (70% or higher in some fields) will fail. How do these people get out of bed in the morning? First, they know that failure is not optional in their work; it’s part of being at the leading edge of scientific discovery. Second, far more than most of us, they understand that every failure conveys valuable information, and they’re eager to get it before the competition does.
In contrast, managers in charge of piloting a new product or service—a classic example of experimentation in business—typically do whatever they can to make sure that the pilot is perfect right out of the starting gate. Ironically, this hunger to succeed can later inhibit the success of the official launch. Too often, managers in charge of pilots design optimal conditions rather than representative ones. Thus the pilot doesn’t produce knowledge about what won’t work.
In the very early days of DSL, a major telecommunications company I’ll call Telco did a full-scale launch of that high-speed technology to consumer households in a major urban market. It was an unmitigated customer-service disaster. The company missed 75% of its commitments and found itself confronted with a staggering 12,000 late orders. Customers were frustrated and upset, and service reps couldn’t even begin to answer all their calls. Employee morale suffered. How could this happen to a leading company with high satisfaction ratings and a brand that had long stood for excellence?
A small and extremely successful suburban pilot had lulled Telco executives into a misguided confidence. The problem was that the pilot did not resemble real service conditions: It was staffed with unusually personable, expert service reps and took place in a community of educated, tech-savvy customers. But DSL was a brand-new technology and, unlike traditional telephony, had to interface with customers’ highly variable home computers and technical skills. This added complexity and unpredictability to the service-delivery challenge in ways that Telco had not fully appreciated before the launch.
A more useful pilot at Telco would have tested the technology with limited support, unsophisticated customers, and old computers. It would have been designed to discover everything that could go wrong—instead of proving that under the best of conditions everything would go right. (See the sidebar “Designing Successful Failures.”) Of course, the managers in charge would have to have understood that they were going to be rewarded not for success but, rather, for producing intelligent failures as quickly as possible.
Designing Successful Failures
Perhaps unsurprisingly, pilot projects are usually designed to succeed …
In short, exceptional organizations are those that go beyond detecting and analyzing failures and try to generate intelligent ones for the express purpose of learning and innovating. It’s not that managers in these organizations enjoy failure. But they recognize it as a necessary by-product of experimentation. They also realize that they don’t have to do dramatic experiments with large budgets. Often a small pilot, a dry run of a new technique, or a simulation will suffice.
The courage to confront our own and others’ imperfections is crucial to solving the apparent contradiction of wanting neither to discourage the reporting of problems nor to create an environment in which anything goes. This means that managers must ask employees to be brave and speak up—and must not respond by expressing anger or strong disapproval of what may at first appear to be incompetence. More often than we realize, complex systems are at work behind organizational failures, and their lessons and improvement opportunities are lost when conversation is stifled.
Savvy managers understand the risks of unbridled toughness. They know that their ability to help resolve problems depends on their ability to learn about them in the first place. But most managers I’ve encountered in my research, teaching, and consulting work are far more sensitive to a different risk: that an understanding response to failures will simply create a lax work environment in which mistakes multiply.
This common worry should be replaced by a new paradigm—one that recognizes the inevitability of failure in today’s complex work organizations. Those that catch, correct, and learn from failure before others do will succeed. Those that wallow in the blame game will not.