When dealing with failures in a microservice system, localized mitigation mechanisms such as load shedding and circuit breakers have long been used, but they may not be as effective as a more global approach.
Tag Archives: reliability
Eliminating Task Processing Outages by Replacing RabbitMQ with Apache Kafka Without Downtime
Scaling backend infrastructure to handle hyper-growth is one of the many exciting challenges of working at DoorDash.
Enforce Timeout: A DoorDash Reliability Methodology
“What would happen if we removed statement timeouts in our PostgreSQL databases?” That’s one of the questions asked in a management meeting.