Improving Fault Tolerance with RPC Fallbacks in DoorDash’s Microservices

Failures in a large, complex microservice architecture are inevitable, so built-in fault tolerance mechanisms such as retries, replication, and fallbacks are a critical part of preventing system-wide outages and a negative user experience.

How We Applied Client-Side Caching to Improve Feature Store Performance by 70%

At DoorDash, we make millions of predictions every second to power the machine learning applications that enhance our search, recommendation, logistics, and fraud areas, and scaling these complex systems along with our feature store is a continual challenge.

Using Fault Injection Testing to Improve DoorDash Reliability

Three steps are key to preventing outages in microservice applications, especially those that depend on cloud services: identify the potential causes of system failure, prepare for them, and test countermeasures before failure occurs.