For any e-commerce business, pricing is one of the key components of the customer shopping experience. Calculating prices correctly depends on a variety of inputs, such as the shopping cart contents, pertinent tax rules, applicable fees, and eligible discounts. Not only is the pricing business logic complex, but prices are also prevalent throughout the purchase funnel, and the underlying mechanisms that compute them must be resilient to failures.
As DoorDash scaled, the number of customer-facing scenarios that required pricing calculations increased. The presentation of prices in all of these scenarios also needed to be reliable, consistent, auditable, and scalable.
The challenge we faced was that our pricing logic was implemented in a legacy monolithic codebase with interconnected modules that prevented platform-wide consistency when calculating prices. Because the components were so tightly coupled and the codebase so large, it was easy to introduce unintended side effects when making changes. Complex build and verification processes also resulted in slow release cycles, hampering developer productivity.
To address these issues and build a pricing service that could meet our requirements, we extracted the business logic out of the monolith. The new pricing service was built with Kotlin, gRPC, Redis, and Cassandra, and was migrated with neither downtime nor data inconsistencies.
Why our legacy monolith didn’t work well
DoorDash was built on a monolithic codebase that experienced growing pains as our business and teams scaled.
Here are some of the issues we faced:
- When a consumer checks out, their total price can consist of more than ten different line items. In the legacy codebase, each line item had duplicated implementations dispersed throughout the codebase. It was also unclear when and how a specific implementation of a given line item should be used.
- As is often the case with large, monolithic systems, technical debt in the legacy codebase had been piling up for many years, and as more engineers joined the team and implemented new features, the code became increasingly fragile and difficult to read.
- We knew that, as DoorDash continued to grow, the legacy system would not be able to keep up with the increase in traffic.
Additionally, as DoorDash continued to expand its business into new verticals such as groceries and convenience, we needed an extensible framework that would:
- Be highly reliable and available
- Increase software development velocity
- Meet our auditability and observability requirements
- Ensure the integrity of the prices that we present to consumers
Building a pricing service as a platform
In order to address the problems we observed, we decided to extract the pricing logic as its own microservice that would become the centralized place for all customer-facing price calculations at DoorDash.
The framework in this new pricing service provided a central place for defining DoorDash's pricing components and how those components are derived. The service also provided an orchestration engine that evaluated those definitions and calculated component values in the most efficient way possible.
How a price quote gets calculated
Each request to the pricing service kicks off a common pipeline that is described below.
Stage 1: Initialize the context
The context contains metadata for the request, such as information related to the user, store, and cart. The information can be fetched from downstream services, populated from the request, or loaded from the database when the historical context is needed (for example, when updating an existing order).
This context also carries forward all intermediate price calculations added throughout the pipeline.
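As a simplified sketch, the context can be thought of as a container for request metadata plus an accumulating map of adjustments. The field and type names below are illustrative placeholders, not our actual schema:

```kotlin
// Illustrative sketch of the pricing context; field names are placeholders,
// not the actual schema.
data class CartItem(
    val itemId: String,
    val quantity: Int,
    val unitPriceInCents: Long
)

data class Adjustment(
    val name: String,          // e.g. "subtotal", "delivery_fee", "tax"
    val amountInCents: Long,   // monetary values kept as integer cents
    val currency: String = "USD"
)

data class PricingContext(
    val consumerId: String,
    val storeId: String,
    val cartItems: List<CartItem>,
    // Intermediate calculations accumulate here as the pipeline runs.
    val adjustments: MutableMap<String, Adjustment> = mutableMapOf()
)
```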
Stage 2: Fetch necessary information from downstream services
After initializing the context, the next step of the pricing process is to collect things like item prices, delivery fees, and available credits — the primitive pricing components and metadata that serve as the foundation for a requested pricing operation.
The mechanics of how each primitive pricing component is retrieved or calculated are defined by objects known as Providers. Because Providers are responsible only for these primitives, they have no dependencies on each other and are run in parallel.
After execution, each Provider adds the calculated primitive values to the pricing context as named entities known as "adjustments".
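To make this concrete, a Provider can be modeled as a small interface whose implementations are fanned out concurrently. The sketch below builds on the context sketch above and uses Kotlin coroutines; the names and placeholder values are illustrative, not our actual API:

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

// Each Provider knows how to fetch or compute one primitive pricing component.
interface Provider {
    suspend fun provide(context: PricingContext): List<Adjustment>
}

class ItemPriceProvider : Provider {
    override suspend fun provide(context: PricingContext): List<Adjustment> {
        // The real service would call a downstream catalog service here;
        // summing the cart is a stand-in for that call.
        val subtotal = context.cartItems.sumOf { it.unitPriceInCents * it.quantity }
        return listOf(Adjustment("subtotal", subtotal))
    }
}

class DeliveryFeeProvider : Provider {
    override suspend fun provide(context: PricingContext): List<Adjustment> =
        listOf(Adjustment("delivery_fee", 299L)) // placeholder value
}

// Providers have no dependencies on each other, so they can run concurrently.
suspend fun runProviders(providers: List<Provider>, context: PricingContext) {
    coroutineScope {
        providers
            .map { provider -> async { provider.provide(context) } }
            .awaitAll()
            .flatten()
            .forEach { adjustment -> context.adjustments[adjustment.name] = adjustment }
    }
}
```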
Stage 3: Aggregate data and make final calculations
With the adjustments added by the Providers, the pricing framework can start making more complex calculations, such as deriving the tax amount by aggregating item prices and fees, or computing discounts from promotion metadata and item details.
These operations are run in sequence by objects called Collectors. When run, each Collector adds new adjustments by aggregating data from multiple sources. Collectors use not only the primitive values but also the adjustments produced by earlier Collectors.
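Continuing the same sketch, a Collector can be modeled as one step in an ordered chain, where each step reads the adjustments produced so far and adds new ones; the tax rate and names here are illustrative:

```kotlin
// Collectors aggregate previously computed adjustments into new ones.
interface Collector {
    fun collect(context: PricingContext): List<Adjustment>
}

class TaxCollector(private val taxRateBps: Long = 800L) : Collector { // 8.00%, for illustration only
    override fun collect(context: PricingContext): List<Adjustment> {
        // Tax is derived from adjustments added earlier in the pipeline.
        val taxableBase = listOf("subtotal", "delivery_fee")
            .mapNotNull { context.adjustments[it]?.amountInCents }
            .sum()
        return listOf(Adjustment("tax", taxableBase * taxRateBps / 10_000))
    }
}

class TotalCollector : Collector {
    override fun collect(context: PricingContext): List<Adjustment> =
        listOf(Adjustment("total", context.adjustments.values.sumOf { it.amountInCents }))
}

// Unlike Providers, Collectors depend on each other's output and run in order.
fun runCollectors(collectors: List<Collector>, context: PricingContext) {
    collectors.forEach { collector ->
        collector.collect(context).forEach { context.adjustments[it.name] = it }
    }
}
```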
Stage 4: Construct the response
When all the necessary adjustments are added by going through Providers and Collectors, the Renderer is responsible for selecting specific adjustments and building the response in the desired format.
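In the same simplified model, a Renderer selects the consumer-facing adjustments and arranges them into the response shape the client expects; the response fields below are placeholders:

```kotlin
// Illustrative response shape for a price quote.
data class QuoteLineItem(val label: String, val amountInCents: Long)
data class PriceQuoteResponse(val lineItems: List<QuoteLineItem>, val totalInCents: Long)

// The Renderer decides which adjustments are shown and in what order.
object CheckoutQuoteRenderer {
    private val displayOrder = listOf("subtotal", "delivery_fee", "tax")

    fun render(context: PricingContext): PriceQuoteResponse {
        val lineItems = displayOrder
            .mapNotNull { context.adjustments[it] }
            .map { QuoteLineItem(label = it.name, amountInCents = it.amountInCents) }
        val total = context.adjustments["total"]?.amountInCents ?: 0L
        return PriceQuoteResponse(lineItems, total)
    }
}
```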
Stage 5: Validate the response and persist metadata into the database
Before returning the response to the client service, the pricing service runs through some checks to make sure the response is valid. When the validation checks pass, all the adjustments used throughout the session are persisted into the database to be retrieved when needed for subsequent requests for the same order.
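A simplified version of this final step, continuing the sketch above, might validate that the line items add up to the grand total and then persist the adjustments. The repository here is an in-memory stand-in so the example is self-contained; the real service persists to Cassandra:

```kotlin
// Repository abstraction for persisting a quote's adjustments.
interface AdjustmentRepository {
    fun save(quoteId: String, adjustments: Map<String, Adjustment>)
    fun load(quoteId: String): Map<String, Adjustment>?
}

// In-memory stand-in for the Cassandra-backed implementation.
class InMemoryAdjustmentRepository : AdjustmentRepository {
    private val store = mutableMapOf<String, Map<String, Adjustment>>()
    override fun save(quoteId: String, adjustments: Map<String, Adjustment>) {
        store[quoteId] = adjustments.toMap()
    }
    override fun load(quoteId: String): Map<String, Adjustment>? = store[quoteId]
}

fun finalizeQuote(
    quoteId: String,
    context: PricingContext,
    repository: AdjustmentRepository
): PriceQuoteResponse {
    // Simple sanity check: the individual components must add up to the grand total.
    val total = context.adjustments["total"]?.amountInCents ?: 0L
    val sumOfComponents = context.adjustments
        .filterKeys { it != "total" }
        .values
        .sumOf { it.amountInCents }
    require(sumOfComponents == total) { "Line items do not add up to the total for quote $quoteId" }

    // Persist every adjustment so subsequent requests for the same order can reuse them.
    repository.save(quoteId, context.adjustments)
    return CheckoutQuoteRenderer.render(context)
}
```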
Rolling out the new pricing service with no downtime
While rolling out the new pricing service, we needed the legacy system and the new microservice to be both highly available and consistent with each other. We set up background comparison jobs in our backends for frontends (BFFs) that called the legacy system and the new service in parallel and compared the two responses. Using this comparison process, we verified that all price components were equivalent between the two systems before the actual rollout.
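Conceptually, the comparison job in the BFF looked something like the sketch below; the client interface and mismatch reporting are illustrative stand-ins for the actual wiring:

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope

// Hypothetical clients: one backed by the legacy monolith, one by the new gRPC service.
interface QuoteClient {
    suspend fun getQuote(orderId: String): PriceQuoteResponse
}

// Calls both systems in parallel, serves the legacy result, and records any
// mismatch for investigation instead of failing the request.
suspend fun compareQuotes(
    legacy: QuoteClient,
    pricingService: QuoteClient,
    orderId: String,
    reportMismatch: (String) -> Unit
): PriceQuoteResponse = coroutineScope {
    val legacyQuote = async { legacy.getQuote(orderId) }
    val newQuote = async { pricingService.getQuote(orderId) }
    val legacyResult = legacyQuote.await()
    val newResult = newQuote.await()
    if (legacyResult != newResult) {
        reportMismatch("Quote mismatch for order $orderId: legacy=$legacyResult new=$newResult")
    }
    legacyResult // the legacy response remains the source of truth until rollout completes
}
```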
For the rollout itself, we started with a small group of pricing team engineers, expanded to all DoorDash employees, and then incrementally ramped up exposure to public traffic.
Results from the new pricing service
After the rollout of the first few endpoints, we saw positive results in multiple areas.
Latency improvement
The pricing service is used for multiple use cases, and while we saw latency reductions in almost all use cases, the largest latency reduction we observed was for the endpoint that served the checkout function. During the checkout process, the order service calls the pricing service to calculate the final price quote for the order. Using our new service we observed that the p95 latency for this operation decreased by 60%. The latency is expected to decrease even more as we extract other dependencies from the monolithic codebase.
Pricing integrity
With every request, the pricing service runs multiple validation checks on the price quote to prevent potential regressions. From a simple validation ensuring that all price components add up to the grand total to more complicated scenario-based calculations, these extra sanity checks have helped maintain pricing accuracy while migrating to the new microservice.
Another big part of maintaining price integrity was ensuring that the price a consumer saw on the checkout page matched what DoorDash actually charged. To achieve this, we introduced a price lock mechanism that keeps the price consistent between the moment the quote is presented and the moment the order is charged. The lock is built on top of Redis, which we use as an in-memory key-value database.
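A simplified sketch of such a lock is shown below. The key format, TTL, and the key-value interface standing in for the Redis client are all illustrative assumptions:

```kotlin
// Minimal key-value abstraction; the real implementation sits on top of Redis.
interface KeyValueStore {
    fun setWithTtl(key: String, value: String, ttlSeconds: Long)
    fun get(key: String): String?
}

class PriceLock(private val store: KeyValueStore, private val ttlSeconds: Long = 900L) {

    // Called when the quote is presented on the checkout page.
    fun lockPrice(orderId: String, totalInCents: Long) {
        store.setWithTtl("price_lock:$orderId", totalInCents.toString(), ttlSeconds)
    }

    // Called before charging: the amount must match what the consumer was shown.
    fun verify(orderId: String, chargeAmountInCents: Long): Boolean {
        val lockedAmount = store.get("price_lock:$orderId")?.toLongOrNull()
        return lockedAmount != null && lockedAmount == chargeAmountInCents
    }
}
```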
Auditability
At the end of each operation, the pricing service stores the intermediate price quote results in the database for monitoring, auditing, and debugging. We also persist all the metadata in the context so that we can rerun a specific request with the same metadata through the pricing engine whenever necessary.
Availability
One of the biggest challenges with the legacy system was that it was not resilient to failures in its dependencies. In the new pricing service, we configure different timeout, retry, and failure-handling behavior depending on the criticality of each downstream dependency, ensuring that the service remains highly available in the face of downstream failures.
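As an illustrative sketch (the dependency names, timeouts, and retry counts are placeholders, not our actual configuration), the idea looks like this:

```kotlin
import kotlinx.coroutines.withTimeout

// Per-dependency resilience settings, keyed by how critical the dependency is.
data class ResiliencePolicy(
    val timeoutMillis: Long,
    val maxRetries: Int,
    val failOpen: Boolean // non-critical calls degrade gracefully instead of failing the quote
)

val resiliencePolicies = mapOf(
    // Item prices are critical: without them no quote can be produced.
    "menu-service" to ResiliencePolicy(timeoutMillis = 500L, maxRetries = 2, failOpen = false),
    // A failed promotion lookup should not block checkout; the discount is simply skipped.
    "promotion-service" to ResiliencePolicy(timeoutMillis = 300L, maxRetries = 1, failOpen = true)
)

suspend fun <T> callWithPolicy(policy: ResiliencePolicy, fallback: T, block: suspend () -> T): T {
    repeat(policy.maxRetries + 1) {
        try {
            return withTimeout(policy.timeoutMillis) { block() }
        } catch (e: Exception) {
            // Timeout or downstream error: fall through and retry if attempts remain.
        }
    }
    return if (policy.failOpen) {
        fallback
    } else {
        throw IllegalStateException("Critical dependency call failed after ${policy.maxRetries + 1} attempts")
    }
}
```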
Outside of persisting data for auditability, the new service is also stateless and horizontally scalable.
Observability
We added a lot of metrics to the new system before it went live. By setting up thorough monitoring and alerts, we were able to compare the performance of the legacy system and the new system on a number of important dimensions such as latency, number of requests, error rate, response parity, and unexpected behaviors. This monitoring helped the team roll out the new service with confidence and enabled us to catch discrepancies before releasing to a larger audience.
In addition to comparing against the legacy system, we also built price trend dashboards that help us find anomalies and regressions. By monitoring the trend of each price component, we can confidently roll out changes to a specific price component and quickly know whether the change has an unexpected impact on other components.
Development velocity
The new pricing service allows developers to write tests in a suite that is far less entangled than the legacy codebase. Additionally, the new microservice is set up on a framework that enables deployments to pre-production environments. Together, these benefits have enabled engineers to deploy more quickly, more frequently, and with more confidence.
Conclusion
The migration was completed successfully without any major regression or noticeable downtime, and the new pricing service continues to be highly available.
Providing the pricing service as a platform means that DoorDash engineers now have a centralized and standard framework that allows them to clearly and rapidly implement and test their new pricing components and use cases without having to worry about the complexities and fragility of the legacy system.
Many e-commerce companies will face similar challenges with pricing, especially around issues of availability. For companies experiencing high growth and considering migrating from a monolithic codebase, this framework could provide a good template for how to develop a microservice that provides improved scalability, stability, and extensibility.