Part 1: How DoorDash leveraged its product knowledge graph to enable a high-velocity tagging and badging experience

February 13, 2025

Irene Chen

Chuanpin Zhu

In 2023, DoorDash launched a number of item badges — user interface (UI) components that highlight key product attributes, such as the number of items in stock, as shown in Figure 1. Some badges performed well, while others did not. One thing was clear, though — consumers noticed the badges and changed their behavior based on their perception of each badge’s value proposition. In this blog post, we explore the issues we encountered trying to ship new badges and the resulting architectural changes we made.

Figure 1: Item badge example showing that eight of the items are in stock

Throughout 2023, engineering teams faced significant hurdles while launching badges:

  • Lengthy implementation times caused by fragmented systems that required changes across many microservices
  • Lack of standardized frameworks for testing, observability, and experimentation
  • Disconnected knowledge sharing across teams
  • High latency for badge changes to be reflected in the UI
  • Unclear prioritization of badges from a customer-facing perspective

To address these issues in 2024, we put our badge creation through a transformative process of platformization, decoupling badge ingestion, or data tagging, from badge serving, or UI rendering, and integrating these processes into DoorDash's product knowledge graph, or PKG.

Why product knowledge graph?

PKG — which is still under development — is a machine learning-driven system that collects, houses, and understands menu and catalog data across restaurants and retail businesses. It will eventually become a competitive advantage for DoorDash and a differentiator for its merchants and consumers. The PKG vision emerged just as the need to platformize badging did, allowing badging attributes to become the first feature developed end-to-end in PKG.

It is useful to define what exactly we mean by a badge. A badge is a UI component that customers can see — for example, "Many in stock," "HSA/FSA Eligible," and item ratings, as shown in the example in Figure 2. Badges are derived from data tags: symbols, labels, or indicators that signal a distinctive property of the associated object — in this case, an item in a store — and that can also be used to filter search results.

Figure 2: Among the badges shown in this example are “Many in stock,” “HSA/FSA Eligible,” and the item’s 4.4-star rating. 

Tags can be ingested into PKG via a processing pipeline. Badges are rendered on-demand separately. Naturally, tags and badges should be handled by separate systems: PKG will handle the ingesting and serving of tags, while a badge-serving framework will handle the rendering of badges from tags.

The rest of this blog post will focus on tags, while a follow-up blog post will focus on badges.

PKG processing

PKG processing is responsible for two operations: tag onboarding and tag connection. Onboarding creates a new tag — for example, “high stock” — in the PKG system. Tag connection applies the new tag to one or more items. 

Tag onboarding

There are many different tags, such as “high stock” and “x-in-stock.” We call the tag onboarding endpoint when we need to create a new tag in the PKG system. The protobuf for this endpoint, which uses the following definitions, is shown below:

  • tag_id: The unique identifier for the tag
  • tag_type: The type of the tag, such as "stock_level" or "dietary"
  • tag_scope: The scope at which the tag is applied, such as "store" or "business"
  • tag_owner: The owner of the tag ID, such as INF-P or DASHMART; the owning entity is allowed to make tagging requests
rpc UpsertTagMetadata(UpsertTagMetadataRequest) returns (UpsertTagMetadataResponse);

// Request body for the UpsertTagMetadata endpoint.
message UpsertTagMetadataRequest {
  // Tag metadata that needs to be created or updated for onboarding.
  TagMetadata tag_metadata = 1;
}

message TagMetadata {
  // The unique identifier for the tag
  string tag_id = 1;
  // The type of the tag. eg: "stock_level", "dietary" etc.
  string tag_type = 2;
  // The scope of the tag where it is applied. eg: "store", "business" etc.
  string tag_scope = 3;
  // The owner of the tag id. eg: INF-P, DASHMART. The owning entity would
  // be allowed to make tagging requests
  string tag_owner = 4;
}

Once this endpoint is called, the PKG processing pipeline simply converts the tag request into a tag node (see "ProductTagNode" in the code box below) and saves it to PKG storage. 
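To make the onboarding flow concrete, here is a minimal Kotlin sketch of what such a handler might look like. GraphClient and TagOnboardingHandler are hypothetical stand-ins added for illustration; only the request/response messages and ProductTagNode (modeled later in the PKG storage section) come from this post, and scope/owner handling is omitted.

// GraphClient is a hypothetical stand-in for the graph abstraction over CRDB
// described in the PKG storage section below; it is not the real API.
interface GraphClient {
    fun upsertNode(node: Any)
    fun upsertEdge(edge: Any)
}

class TagOnboardingHandler(private val graphClient: GraphClient) {
    fun handleUpsertTagMetadata(request: UpsertTagMetadataRequest): UpsertTagMetadataResponse {
        val metadata = request.tagMetadata
        val now = System.currentTimeMillis()
        // Convert the incoming metadata into a ProductTagNode (see the PKG storage section).
        val tagNode = ProductTagNode(
            tagId = metadata.tagId,
            tagType = metadata.tagType,
            createdAt = now,
            updatedAt = now,
            tagParams = emptyMap(),
        )
        // Save the tag node to PKG storage.
        graphClient.upsertNode(tagNode)
        return UpsertTagMetadataResponse.getDefaultInstance()
    }
}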

Tag connection

When we need to tag or untag an item, we can call the ManageTagLinks endpoint; the protobuf for this is shown below: 

// Manage tag links between tags and items.
rpc ManageTagLinks(ManageTagLinksRequest) returns (ManageTagLinksResponse);

// The request body for the ManageTagLinks endpoint.
message ManageTagLinksRequest {
  // The operations for this request to perform on the tag links.
  repeated TagLinkOperation tag_link_operations = 1;
}

// Operation to manage the tag links for an entity.
message TagLinkOperation {
  // One of business_id or store_id needs to be provided to identify
  // the level under which the item needs to be tagged.
  oneof identifier {
    // business_id and item_id which need to be tagged
    BusinessItemIdentifier business_item_id = 1;
    // store_id and item_id which need to be tagged
    StoreItemIdentifier store_item_id = 2;
  }
  // The tags which need to be attached to or removed from the item.
  // Each tag internally contains the tag_id and a list of parameters.
  repeated Tag tags = 3;
}

After we receive a ManageTagLinks request, the PKG processing pipeline validates the item and tags, then connects them by creating edges (see "StoreItemHasProductTagEdge" in the code box below) between the item and tag nodes. Tag connection processing happens offline via a Kafka and Flink pipeline.
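As a rough sketch of the connection step, processing a single TagLinkOperation might look like the following, reusing the hypothetical GraphClient from the onboarding sketch above. The identifier and tag field accessors (itemId, tagType, paramsMap) are assumptions based on the proto comments rather than the actual generated code, and the real pipeline does this work asynchronously in Flink rather than inline.

class TagConnectionProcessor(private val graphClient: GraphClient) {

    suspend fun process(op: TagLinkOperation) {
        // Assume a store-level identifier for brevity; business-level identifiers work the same way.
        val storeItemId = op.storeItemId.itemId  // assumed accessor on StoreItemIdentifier
        op.tagsList.forEach { tag ->
            // Validation step: confirm the item node and the tag node already exist in PKG
            // (lookup helpers omitted from this sketch).
            // Connect the item and the tag by writing a StoreItemHasProductTagEdge.
            graphClient.upsertEdge(
                StoreItemHasProductTagEdge(
                    tagId = tag.tagId,
                    tagType = tag.tagType,      // assumed field on the Tag message
                    storeItemId = storeItemId,
                    tagParams = tag.paramsMap,  // assumed name for the tag's parameter list
                )
            )
        }
    }
}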

PKG storage

PKG storage is a logical graph abstraction service built on top of CockroachDB, or CRDB. It provides the following benefits: 

  1. Intuitive data modeling: PKG storage provides a graph-based abstraction layer, enabling developers to form mini-domains and easily manage relationships between data entities. Transitioning a relationship from 1-to-1 to 1-to-n, for example, becomes straightforward, significantly improving adaptability as domain models evolve.
  2. Seamless integration: Applications can integrate with PKG storage effortlessly without needing to understand its internal implementations. Input/output interfaces are decoupled from the underlying storage mechanisms, ensuring application functionality remains unaffected even if PKG storage’s implementation changes.
  3. Feature-rich and configurable: PKG storage offers a comprehensive feature set that caters to a wide range of application needs, including access control, transaction management, and versioning. These features can be enabled or disabled easily through configuration, allowing teams to customize PKG storage to fit their specific requirements.

We decided to model a graph on top of CRDB instead of adopting a new graph database primarily because there was no organizational appetite to onboard, operate, or maintain one. Instead, we built a lightweight wrapper layer on CRDB to enable property graph operations, making application logic more intuitive while avoiding the complexity of managing a new system. This strategy balances engineering effort, preserves the flexibility to adopt an off-the-shelf graph database later, and fosters a more robust, less error-prone graph-native development style.

The storage side modeling for the tag use case is shown below: 

@NodeModel(PRODUCT_TAG_NODE_TYPE)
data class ProductTagNode(
    val tagId: String,
    val tagType: String,
    @UniqueId
    val id: String = "$tagType|$tagId",
    val description: String? = null,
    val createdAt: Long,
    val updatedAt: Long,
    // for the static properties
    val tagParams: Map<String, String>,
)

@EdgeModel(HAS_PRODUCT_TAG_EDGE_TYPE)
data class StoreItemHasProductTagEdge(
    val tagId: String,
    val tagType: String,
    @SourceUniqueId(STORE_ITEM_NODE_TYPE)
    val storeItemId: String,
    @TargetUniqueId(PRODUCT_TAG_NODE_TYPE)
    val id: String = "$tagType|$tagId",
    val tagParams: Map<String, String>,
)

Figure 3 shows how the item node, tag nodes, and tag connection edges are represented in PKG storage. 

Figure 3: A StoreItem node can be modeled as connected to multiple TagNodes that become data input for badges
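Under the hood, the wrapper has to flatten these typed models into generic rows that CRDB can store as key-value pairs. The sketch below only illustrates that idea; NodeRow, toNodeRow, and the property layout are assumptions, not the actual PKG storage schema.

// Hypothetical flattening of a typed node into a generic row; the real PKG
// storage layout is not shown in this post.
data class NodeRow(
    val nodeType: String,                 // e.g. PRODUCT_TAG_NODE_TYPE
    val uniqueId: String,                 // taken from the @UniqueId property, e.g. "stock_level|high_stock"
    val properties: Map<String, String>,  // remaining fields kept as key-value properties
)

fun ProductTagNode.toNodeRow(): NodeRow = NodeRow(
    nodeType = PRODUCT_TAG_NODE_TYPE,
    uniqueId = id,
    properties = mapOf(
        "tagId" to tagId,
        "tagType" to tagType,
        "createdAt" to createdAt.toString(),
        "updatedAt" to updatedAt.toString(),
    ) + tagParams,
)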


PKG serving

Data written into PKG can be read directly from the PKG storage layer. For downstream clients, however, reading from a separate serving layer rather than from storage has several advantages:

  1. Read optimization: The data can be structured and indexed specifically for high-performance reads, avoiding unnecessary computations during queries
  2. Simplified client interface: Many downstream services do not need to understand graph storage concepts. Having a serving layer abstracts these complexities away, offering simpler APIs
  3. Separation of read/write operations: Such separation reduces contention between transactional operations and data serving

For these reasons, after we write tagging data asynchronously offline to the graph storage, we also replicate the data to a separate database to serve reads. When it is time to query, we do so via online gRPC APIs. This part of the architecture, shown in Figure 4, is called PKG serving. Key components of the serving layer that were developed for tagging include:

  • Indexer: Responds to change signals from the public graph, converts upstream graph data to the serving schema, and writes the processed data into serving storage
  • Product entity storage service, or PESS: Avoids coupling business logic with data access logic by providing a dedicated data access layer
  • Online serving: Provides low-latency APIs for downstream clients to access serving data

Figure 4: The serving layer sits at the edge of PKG, between the storage layer and clients such as the downstream badge-serving framework

Let’s now take a closer look at how public graph data is ingested into the serving database.

After changes have been written to the public graph, a change data capture, or CDC, event is published from the storage layer. An indexing dispatcher owned by the serving layer consumes each CDC event and converts it into indexing tasks, which it places onto the relevant task queues for processing. Indexing workers assigned to particular queues then pull tasks off those queues, process them, and write the resulting data to the serving database.

The indexing task is a generic data structure that contains graph data. It does not know anything about specific product domains or data sources, instead focusing exclusively on graph representations of the data, thus making it extensible to other types of data in the future. The following is the current schema of an indexing task:

// Issued by the indexing dispatcher; received by indexing workers
message IndexingTask {
  // indices for traversal start nodes
  repeated graph_proxy.v1.NodeIndex traversal_start_nodes = 1;
  // entities updated
  repeated EntityUpdate entity_updates = 2;
  // type of indexing task
  IndexingTaskType indexing_task_type = 3;
  // distributed context
  com.doordash.pkg_common.v1.DistributedContext distributed_context = 4;
  // The source of this indexing task, i.e. dispatcher, indexer kafka sinks, etc.
  google.protobuf.StringValue indexing_task_source = 5;
  // source CDC publish timestamp - in most cases, this will refer to the publish
  // time of graph CDC
  google.protobuf.Timestamp source_cdc_published_at = 6;
}

Based on the task contents, an indexing worker queries and traverses the public graph to get the latest data. After that, it uses PESS to interact with the database to write that graph data to serving storage.
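The worker itself is not shown in this post, but its shape is roughly the following sketch. Everything other than GraphModelClient, IndexingTask, and IndexingTaskType is hypothetical, including the method names on both clients; ServingWriter stands in for the writes that go through PESS in production.

// ServingWriter is a hypothetical stand-in for writes that go through PESS.
interface ServingWriter {
    suspend fun upsertInventoryRows(rows: List<Any>)
}

class InventoryIndexingWorker(
    private val graphModelClient: GraphModelClient,
    private val servingWriter: ServingWriter,
) {
    suspend fun handle(task: IndexingTask) {
        if (task.indexingTaskType != IndexingTaskType.INVENTORY_INDEXING_TASK) return
        // The task carries only graph references, so re-read the latest data from the
        // public graph starting at each traversal start node.
        val rows = task.traversalStartNodesList.map { startNode ->
            graphModelClient.traverseInventory(startNode)  // assumed method name
        }
        // Persist the converted rows to the serving database (via PESS in production).
        servingWriter.upsertInventoryRows(rows)
    }
}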

For any given CDC event, how can we create an extensible way to map it to the relevant indexing tasks? Note that one CDC event may produce 0..N indexing tasks. We introduced a Dispatchable interface so that product engineers can add their own indexing use cases easily: they implement the interface, and the dispatcher discovers implementations via reflection at runtime.

abstract class Dispatchable {

    /** Returns true if this Dispatchable should consume the given [GraphProxyCdcEvent] to produce indexing tasks. Otherwise false. */
    abstract fun consumesEvent(event: GraphProxyCdcEvent): Boolean

    /** Returns a list of [IndexingTask]s based on the given [GraphProxyCdcEvent]. */
    abstract suspend fun convertToTasks(event: GraphProxyCdcEvent): List<IndexingTask>
}

For the tag use case, we implemented an InventoryDispatchable that extends the dispatchable class above:

class InventoryDispatchable @Inject constructor(
    private val graphModelClient: GraphModelClient,
) : Dispatchable() {

    override fun consumesEvent(event: GraphProxyCdcEvent): Boolean {
        return (event.edgesDeletedList + event.edgesUpdatedList).any {
            it.type == HAS_PRODUCT_TAG_EDGE_TYPE
        }
    }

    override suspend fun convertToTasks(event: GraphProxyCdcEvent): List<IndexingTask> =
        (event.edgesUpdatedList + event.edgesDeletedList).filter {
            it.type == HAS_PRODUCT_TAG_EDGE_TYPE
        }.map { edge ->
            IndexingTask.newBuilder()
                .addTraversalStartNodes(edge.sourceNodeIndex)
                .setIndexingTaskType(IndexingTaskType.INVENTORY_INDEXING_TASK)
                .setDistributedContext(event.distributedContext)
                .build()
        }
}
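On the dispatcher side, the loop that fans CDC events out to these implementations might look like the following sketch. TaskQueue and the IndexingDispatcher class here are hypothetical, and in production the Dispatchable implementations are discovered via reflection rather than passed in explicitly.

// Hypothetical queue abstraction; the real task queues are not shown in this post.
interface TaskQueue {
    suspend fun enqueue(task: IndexingTask)
}

class IndexingDispatcher(
    private val dispatchables: List<Dispatchable>,  // discovered via reflection at runtime in production
    private val taskQueue: TaskQueue,
) {
    suspend fun onCdcEvent(event: GraphProxyCdcEvent) {
        // One CDC event may fan out into 0..N indexing tasks across all Dispatchables.
        dispatchables
            .filter { it.consumesEvent(event) }
            .flatMap { it.convertToTasks(event) }
            .forEach { taskQueue.enqueue(it) }
    }
}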

An inventory indexing worker then picks up the task generated above. It traverses the graph to write the inventory data to the serving database according to the following schema:

Column              | Data type  | Nullable | Description
store_id            | String     | No       | Primary key
merchant_catalog_id | String     | No       | Primary key
menu_item_id        | String     | No       |
global_catalog_id   | String     | Yes      |
content_json        | JSONB      | No       | RetailInventory proto in JSON
content_bytes       | ByteString | No       | RetailInventory proto in bytes
created_at          | Timestamp  | No       |
updated_at          | Timestamp  | No       |

The store_id, merchant_catalog_id, menu_item_id, and global_catalog_id columns support efficient querying via read endpoints. The data itself is stored in serialized form in the content_json and content_bytes columns. Each record is written as an individual row in CRDB and stored as key-value pairs to keep reads performant.
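As a sketch of the read path, a lookup keyed on those columns might look like the following. The ServingDb interface is a hypothetical stand-in for the data access layer (PESS and the online serving APIs in production), and the RetailInventory accessor is assumed to be the standard generated parseFrom.

// Hypothetical data-access handle; in production reads go through PESS and the
// online serving APIs rather than a direct interface like this.
interface ServingDb {
    // Returns the serialized RetailInventory proto stored in content_bytes, if present.
    fun findContentBytes(storeId: String, merchantCatalogId: String): ByteArray?
}

fun readInventory(db: ServingDb, storeId: String, merchantCatalogId: String): RetailInventory? =
    db.findContentBytes(storeId, merchantCatalogId)
        ?.let { RetailInventory.parseFrom(it) }  // deserialize the stored proto bytes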

Tying it together: How to send tags to PKG without code changes

To badge store items, a client service first issues a bulk request call to PKG processing’s tag connection endpoint, asking it to update the items’ tags. An asynchronous process kicks off that writes tag data to the PKG storage layer and indexes it in the serving layer. To display those store items to consumers, an online serving endpoint queries the serving database to retrieve the relevant tags. The badge-serving framework then displays the relevant badges.
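Put together, a client-side call under the assumptions used throughout this post might look like the sketch below. The generated stub is passed in as a function because the gRPC service name is not shown here, and the StoreItemIdentifier and Tag field names follow the proto comments rather than the actual generated code.

// End-to-end sketch: tag a single store item with a previously onboarded "high_stock" tag.
suspend fun tagStoreItem(
    manageTagLinks: suspend (ManageTagLinksRequest) -> ManageTagLinksResponse,  // generated stub method
    storeId: String,
    itemId: String,
) {
    val request = ManageTagLinksRequest.newBuilder()
        .addTagLinkOperations(
            TagLinkOperation.newBuilder()
                .setStoreItemId(
                    StoreItemIdentifier.newBuilder()
                        .setStoreId(storeId)  // assumed field names on StoreItemIdentifier
                        .setItemId(itemId)
                )
                .addTags(Tag.newBuilder().setTagId("high_stock"))  // assumed field name on Tag
        )
        .build()
    // The write is processed asynchronously; the badge appears once the serving
    // layer has indexed the new tag edge.
    manageTagLinks(request)
}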

A follow-up post will discuss in more detail how the badge-serving framework turns tags into badges, through a fast, code-free experience that also enables other capabilities, including standardized observability, testing, and experimentation frameworks.

About the Authors

  • Irene Chen is a software engineer on the DashMart team at DoorDash. Her focus is on backend development.

  • Chuanpin Zhu is a software engineer on the Product Knowledge Graph Platform team at DoorDash. His focus is on data processing and storage.
