Manually testing the Geo service
When we introduce new location-based logic, such as changing the radius of our geofence around merchants, to DoorDash’s Geo service, our team runs end-to-end tests for a delivery to ensure the quality of our deployment. Previously, the only way to run these tests involved manually creating a test delivery through the DoorDash app, and using DoorDash’s internal tools and the Dasher app to simulate the delivery flow, as shown in Figure 1, below: Using the DoorDash app to simulate a delivery is not the best approach, for multiple reasons:- Manual testing makes innovation harder and slower. Creating, assigning, and completing deliveries is a painfully slow process. For just one test, we had to undergo the whole process shown in Figure 1. This limits the number of iterations we can make on our service, since testing is a bottleneck for deployment.
- Manual testing is unreliable. Manually mocking real-time location updates for a test delivery is nearly impossible, unless we could physically walk or drive from the store to the consumer. Also, we cannot easily test different store or consumer addresses, since changing those would also require a lengthy manual process. This makes testing for nuanced or localized situations difficult.
Choosing the best simulator design archetype
Before implementing the simulator, we had to carefully choose a structure and design archetype that would most effectively fit our needs. The literature on simulation is vast, but we considered three main types: Discrete-event simulation: Discrete-event simulation models a distinct series of events over time. The state of the system changes only when an event occurs. The benefits of discrete-event simulation are that it is simple and fast to implement, and real data points, such as location updates, are always collected in the discrete form. The drawback of this model is that it approximates continuous systems, such as traffic and transportation, yielding a less realistic and less accurate model. Continuous simulation: In continuous simulation, the state of the system is constantly changing. This type of simulation can more accurately replicate continuous phenomena, but is more complex to implement. In particular, it requires the use of differential equations to build. Agent-based modeling: This type of model is used to simulate behavior and interactions between individuals, or agents, within a system. It provides the most complexity, since each agent can have variations in their behavior or traits. For example, it can replicate a driver who tends to drive fast or slow, or one who often veers off route. Due to its complexity, it is more time-consuming to implement. It also has the most relevance when there are many entities being observed.Table of simulator design archetypes
Simulator type | How it works | Pros | Cons |
Discrete-event simulation | Captures distinct events over time |
|
|
Continuous simulation | Presents a scenario that is constantly changing |
|
|
Agent-based modeling | Simulates behaviors between different individuals |
|
|
- First, it is quick to implement. Although the simplest implementation, it is extensible: we can later increase the complexity with better location, routing, or traffic models, or we could even integrate agent-based modeling on top of discrete-event simulation if we wanted to observe the behavior of a fleet of Dashers.
- Second, discrete-event simulation yields itself nicely to our service architecture. Our Geo service collects batches of location data from Dashers every few seconds through Apache Kafka. Fixed-increment time progression accurately replicates this interaction, since it allows us to update the state, and thus send location updates, at the end of a set interval.
- Finally, since our service only observes data in discrete intervals and is agnostic to what happens in between those intervals, a lot of the complexity and power granted by the other simulation types would be lost.
Architecting our implementation of the simulator
Now that we had figured out what model we wanted to use, the next step was to plan where the simulator fit into our architecture. We would need to integrate it with existing parts of the DoorDash platform while devising its own routing and location update models for it to deliver accurate results.The high-level overview of the simulation architecture
As shown in Figure 2, above, the simulator takes the test Dasher, merchant, and consumer information, the original location of the Dasher, and the desired delivery status (one of picking up, waiting at store, or dropping off) as its input. The simulator uses this input to create a test delivery and assigns it to the Dasher through the endpoints in the DoorDash platform. It stores this information in the simulator state, then uses the Google Maps Directions API to build its routing and location update models. After this initialization, the simulator is ready to run. During each cycle, the position of the Dasher is updated using the routing and location update models. This new location is used to update the simulator state, transitioning the delivery state if necessary. Using the new simulation state, the simulator publishes a location update to our Kafka cluster, reports any proximity or location-based logic that was triggered due to the update, and discloses any change to the delivery state that occurred in the previous interval. The simulation runs until either the user pauses the simulation or the delivery completes.Designing the routing model for the Dasher’s movement on a delivery
The routing model is currently comprised of four parts:- The route from the Dasher’s origin to the merchant location
- The Dasher’s movement around the merchant
- The route from the merchant to the consumer
- The Dasher’s movement around the consumer