Resiliency in distributed systems


A neat article by Rajeev Bharshetty about why we need to consider resiliency in software development. A system's resiliency directly affects its uptime and availability: the more resilient the system, the more available it is to serve users.

Resiliency is all about preventing faults from turning into failures. The most resilient piece of code you ever write will be the code you never wrote.

The article:

  • Defines the basic terminology for resilient systems
  • Explains why we care about resiliency in our systems
  • Explains why resiliency in distributed systems is hard
  • Introduces some basic patterns

But remember: despite our best efforts, systems do fail, and we have to deal with these failures. To be resilient, we have to design our systems for failure.
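To make "preventing faults from turning into failures" concrete, here is a minimal sketch in Python. It is not taken from the article: the retry-then-fallback approach is just one common way to illustrate the idea, and the names `call_with_retry` and `FlakyService` are hypothetical, invented for this example.

```python
import time


def call_with_retry(operation, fallback, attempts=3, base_delay=0.0):
    """Run `operation`; retry on any exception, then fall back instead of failing.

    Hypothetical helper for illustration only. `base_delay` is in seconds and
    defaults to 0 so the sketch runs instantly; a real system would use a
    non-zero delay and catch narrower exception types.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # A fault occurred. Wait with exponential backoff and try again,
            # unless this was the last attempt.
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    # Every attempt hit a fault: degrade gracefully rather than fail outright.
    return fallback()


class FlakyService:
    """Hypothetical dependency that faults a fixed number of times, then succeeds."""

    def __init__(self, faults_before_success):
        self.remaining_faults = faults_before_success
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.remaining_faults > 0:
            self.remaining_faults -= 1
            raise ConnectionError("transient fault")
        return "fresh data"
```

With `FlakyService(2)`, the first two calls fault and the third succeeds, so the caller never sees a failure. With a service that stays down, the caller receives whatever the fallback returns (for example, cached data) instead of an exception.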

A good read, with many charts and diagrams supporting the explanations.


Tags event-driven software-architecture programming