The Many Definitions of Resilience

According to the Oxford English Dictionary, resilience is a noun, with the primary meanings of

  1. The capacity to recover quickly from difficulties; toughness.

  2. The ability of a substance or object to spring back into shape; elasticity.

Resilience is frequently mentioned in software, especially in the context of successfully operating distributed systems. Organizations like Lyft and Twilio have resilience teams (that is, teams with names like “Server Resilience”) made up of software engineers who deliver “testing capabilities that allow teams to successfully test their software in a multitude of ways - from load testing to failure injection” or “define, build, and deploy systems that enable non-stop operations” like “Real-time dynamic network tuning.”

I'm not aware of a precise, consensus definition of resilience in a software context, but the general “capacity to recover quickly from difficulties” fits reasonably well (Aaron Blohowiak has a great article on the value of recovery).

In safety science, Resilience Engineering is a discipline that can be traced back to David Woods' 2003 Congressional testimony after the Space Shuttle Columbia disaster; my understanding is that Erik Hollnagel is considered a cofounder. Resilience engineering is covered in detail by Sidney Dekker in his Foundations of Safety Science textbook, and today the Resilience Engineering Association is a "global community of practice".

I've found resilience engineering to be a deep and interesting field, but not one I want to attempt to summarize in a blog post (although I'm happy to provide pointers to reading lists or discuss further; just ping me on Twitter). When it comes to the simpler question of definitions, Woods has helpfully already done the work of defining resilience in the context of resilience engineering. In his paper “Four concepts for resilience and the implications for the future of resilience engineering”, those definitions are given as:

  1. resilience as rebound from trauma and return to equilibrium

  2. resilience as a synonym for robustness

  3. resilience as the opposite of brittleness, i.e., as graceful extensibility when surprise challenges boundaries

  4. resilience as network architectures that can sustain the ability to adapt to future surprises as conditions evolve

Woods' first definition is roughly equivalent to the OED's, and the second is uninteresting for our purposes. In the course of the paper, Woods demonstrates the ways in which the first two definitions are insufficient for safety in complex systems; for example, rebound doesn't address the surprise and model error involved in the events that cause failure and trigger rebound (which resilience engineering considers very important). The third and fourth, then, are the definitions which resilience-engineering-the-safety-science-discipline concerns itself with. Finally, while “resilience as ability to recover/rebound” is a noun, “resilience as sustained ability to adapt to future surprises” is a verb (Woods has a paper on this as well!). And while we can program software to rebound (e.g., with Envoy Proxy circuit breaking), only humans can engage in practices which provide sustained adaptive capacity to socio-technical systems.

So, in summary, “resilience as recovery”/“resilience as plain English” is a noun which can apply to software or people. “Resilience as sustained adaptive capacity”/“resilience as a safety scientist” is a verb which can only apply to people.

Many other concepts in safety science (consider Normal Accident Theory, or High Reliability Organizations) have unique names which don't suffer this sort of collision and confusion. Today, every time I see a Twitter thread debating which definition of resilience is appropriate (and I'm sure I'm guilty of starting some), I wonder whether Woods and Hollnagel could have headed this all off at the pass in the early aughts by choosing a different name for their new discipline...
