DevOpsDays London is over and, as usual, there were some great sessions. Exposing fragility in systems was once again a topic of discussion.
If you follow the DevOps movement at all, you will no doubt have heard of Netflix's Chaos Monkey numerous times. For those who have not, Chaos Monkey runs in all environments, including production, randomly terminating instances in order to test how the environment responds to failure. Building for that constant threat has led to a loosely coupled, fault-tolerant system.
While most companies aren’t able to inject failures on the scale that Netflix does, that doesn’t mean that others aren’t embracing this approach.
There was a presentation on how PagerDuty runs “Failure Friday,” where they deliberately introduce failures into their system. This allows them to test not only how resilient their software is but also how good their monitoring, logging and processes are.
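To make the idea concrete, here is a toy sketch of failure injection in Python. It is not how Chaos Monkey or Failure Friday is implemented; the `Replica` class and the `inject_failure` and `call_with_failover` functions are hypothetical names made up for illustration. The sketch simulates three copies of a service in memory, takes one down at random, and checks that a request still gets an answer.

```python
import random


class Replica:
    """A stand-in for one instance of a service (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        # A dead replica behaves like an unreachable host.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def inject_failure(replicas, rng):
    """Take down one randomly chosen live replica; return it, or None if none are left."""
    live = [r for r in replicas if r.alive]
    if not live:
        return None
    victim = rng.choice(live)
    victim.alive = False
    return victim


def call_with_failover(replicas, request):
    """Try each replica in turn and return the first successful response."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue  # this replica is down, try the next one
    raise RuntimeError("all replicas are down")


if __name__ == "__main__":
    rng = random.Random()
    replicas = [Replica(f"replica-{i}") for i in range(3)]
    victim = inject_failure(replicas, rng)
    print("took down:", victim.name)
    print(call_with_failover(replicas, "GET /health"))
```

In a real exercise the "replicas" would be actual instances or services, and the interesting part is what the sketch leaves out: whether the alerts fired, whether the logs told you what happened, and whether the on-call process worked.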
Are you able to inject failures into your systems and respond to them without your application going down in flames? Whatever the answer is, it might be a good idea to have your own Failure Friday and learn how to cope with failure.