As machine learning (ML) becomes increasingly important across tech companies, feature engineering becomes a bigger focus for improving the predictive power of models.
Category Archives: Data
How to Run Apache Airflow on Kubernetes at Scale
As an orchestration engine, Apache Airflow let us quickly build pipelines in our data infrastructure.
The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000%
In our real-time delivery logistics system, the environment, behavior of Dashers (our term for delivery drivers), and consumer demand are highly volatile.
Building Faster Indexing with Apache Kafka and Elasticsearch
DoorDash describes how it built a faster search index using open source projects.
Overcoming Rapid Growth Challenges for Datasets in Snowflake
A proper optimization framework for data infrastructure streamlines engineering efforts, allowing platforms to scale.
Maintaining Machine Learning Model Accuracy Through Monitoring
Machine learning model drift occurs as data changes, but a robust monitoring system helps maintain integrity.
Building Riviera: A Declarative Real-Time Feature Engineering Framework
In a business with fluid dynamics between customers, drivers, and merchants, real-time data helps make crucial decisions that grow our business and delight our customers.
How to Drive Effective Data Science Communication with Cross-Functional Teams
Analytics teams focused on detecting meaningful business insights may overlook the need to communicate those insights effectively to the cross-functional partners who can use them to improve the business.
Running Experiments with Google Adwords for Campaign Optimization
Running experiments on marketing channels involves many challenges, yet at DoorDash, we found a number of ways to optimize our marketing with rigorous testing on our digital ad platforms.
Building Flexible Ensemble ML Models with a Computational Graph
DoorDash extended its machine learning platform to support ensemble models.