DoorDash’s principles and processes for democratizing Machine Learning
Six months ago I joined DoorDash as their first Head of Data Science and Machine Learning. One of my first tasks was to help decide how we should organize machine learning (ML) teams so that we reap the maximum benefit from this wonderful technology. You can learn more about some of the current ML use cases at DoorDash on our blog. Having spent time at previous technology companies and spoken to many more, I was acutely aware of the challenges that tend to come up.
Challenges
- ML is poorly defined: Is a linear regression in Excel ML? What about a toy random forest in a local Jupyter notebook? Where is the line between analytics and ML?
- ML needs Engineering and Science: ML at technology companies requires decision-making that is both performant (an engineering problem) and optimal (a science problem).
- ML advances rapidly: Even over just the last five years, we have seen modeling approaches, platforms, and languages change almost every 18 months.
- ML is trendy: Many people view ML as magic, so everyone wants to work on it.
Vision
Build data-driven software for advanced measurement and optimization
Principles
- Democracy: everyone can build and run an ML model given sufficient tooling and guidance.
- Talent: we want to attract and grow the best business-impact focused ML practitioners.
- Speed: if a cost-effective third-party ML solution already exists, then we should use it.
- Sufficiency: if a function (typically Engineering) can implement a good-enough ML solution unaided, then they should do so.
- Incrementality: if a function (typically Data Science) can add enough incremental value to an ML solution, then they should do so.
- Accountability: each ML solution has a single technical lead acting as the technical decision-maker.
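Read together, Speed, Sufficiency, and Incrementality work like an ordered checklist for deciding who builds a use case. The Python below is a minimal, hypothetical sketch of one literal reading of that ordering; `UseCase`, `staff_use_case`, and the boolean inputs are illustrative assumptions, not DoorDash tooling, and in practice each "yes/no" is a judgment call rather than a flag.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Hypothetical yes/no answers collected for one candidate ML use case."""
    third_party_cost_effective: bool  # Speed: a cost-effective third-party solution exists
    engineering_good_enough: bool     # Sufficiency: Engineering can build a good-enough solution unaided
    ds_adds_enough_value: bool        # Incrementality: Data Science can add enough incremental value


def staff_use_case(case: UseCase) -> list[str]:
    """Apply Speed first, then Sufficiency and Incrementality, and return who builds."""
    if case.third_party_cost_effective:
        # Speed wins: use the existing solution instead of building one.
        return ["third-party solution"]
    team = []
    if case.engineering_good_enough:
        team.append("Engineering")
    if case.ds_adds_enough_value:
        team.append("Data Science")
    # An empty list means no function qualifies yet, so the proposal needs rework.
    return team
```

For example, a use case with no vendor option, where Engineering can ship a good-enough model and Data Science can add meaningful lift, comes back as `["Engineering", "Data Science"]`; Accountability then still requires naming a single technical lead for that team.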
Organization
- Reporting lines: ML Engineers report to Engineering managers and ML Data Scientists report to DS managers. ML Infrastructure reports into the central Data Platform team.
- Hiring: Job descriptions and hiring processes for ML Engineers and ML Data Scientists are reviewed and approved by the ML Council.
- Technology: Data Platform invests strongly in a centralized ML platform (workflow, provisioning, orchestration, feature stores, common data preparation, validation, quality checks, monitoring, etc.). Build-versus-buy decisions on potential ML infrastructure technology are reviewed and approved by the ML Council.
- Execution:
  - Any person(s) at the company can identify a use case for ML and draft a proposal (business problem, estimated impact versus build / maintenance cost, solution, team composition, single technical lead).
  - The proposal is reviewed, amended, and approved by the pod’s / vertical’s cross-functional leads (PM, EM, DS Manager, Analytics Manager, etc.). The leads should approve the business problem, prioritization, and impact / cost.
  - The proposal is reviewed, amended, and approved by the ML Council.
  - All steps of the review will be transparent: the ML Council and ML practitioners will meet weekly at ‘ML Review’ to review items and debate next steps. Decisions will be made at this ML Review, and notes will be taken and emailed to all interested folks.
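The execution steps amount to a small, ordered approval pipeline over a proposal with a fixed set of fields. The Python below is a minimal sketch under stated assumptions: `MLProposal`, `approve`, `is_approved`, and the stage labels are hypothetical names, and the code only enforces two rules taken from the text (pod leads sign off before the ML Council, and exactly one technical lead). The impact-versus-cost judgment is left to the reviewers rather than coded as a threshold.

```python
from dataclasses import dataclass, field

# Review stages, in the order the Execution steps list them.
STAGES = ["pod cross-functional leads", "ML Council"]


@dataclass
class MLProposal:
    """Hypothetical record of the fields a proposal is drafted with."""
    business_problem: str
    estimated_impact: float          # assumed to share units with the cost below
    build_maintenance_cost: float
    solution: str
    team: list[str]
    technical_leads: list[str]       # Accountability: exactly one person expected
    approvals: list[str] = field(default_factory=list)


def approve(proposal: MLProposal, stage: str) -> None:
    """Record one stage's sign-off, enforcing stage order and a single tech lead."""
    if len(proposal.technical_leads) != 1:
        raise ValueError("a proposal needs exactly one technical lead")
    done = len(proposal.approvals)
    if done >= len(STAGES):
        raise ValueError("proposal is already fully approved")
    if stage != STAGES[done]:
        raise ValueError(f"expected sign-off from {STAGES[done]!r}, got {stage!r}")
    proposal.approvals.append(stage)


def is_approved(proposal: MLProposal) -> bool:
    """A proposal is ready once every stage has signed off, in order."""
    return proposal.approvals == STAGES
```

Keeping the stages in one ordered list means a proposal cannot reach the ML Council without the pod’s leads first agreeing on the business problem and prioritization, which mirrors the order described above.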
ML Council
- Composition: the ML Council is composed of a group of experienced ML practitioners across the company, typically senior Engineering ML, Data Science ML, and Infrastructure ML folks. It is led by the ML Council Chair, who serves as the decision-maker for escalations. Membership rotates periodically, e.g. every 12 months.
- Role: the role of the ML Council is to:
  - balance project-specific variability against company-wide uniformity, so that we are efficient as a company
  - review and give feedback on all new ML applications
  - facilitate the cross-pollination of ideas and solutions
  - create better visibility into common pieces (to feed into infra)
  - encourage more proactive communication of data sources and solutions.
- Responsibility: Typically, the ML Council should ensure that if production performance is the biggest blocker to success, then the tech lead is an ML Engineer; otherwise, if statistical performance is the biggest blocker, then the tech lead is a Data Scientist. The ML Council should also check that solutions have enough support and, where possible, are part of the long-term ML platform investment.
- Autonomy: If the ML Council disagrees on the solution / team / lead, then the ML Council Chair tie-breaks and makes a decision.
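The Responsibility and Autonomy rules together give a default for choosing the tech lead and a fallback when the Council cannot agree. The Python below is a minimal, hypothetical sketch of both; `Blocker`, `default_lead_role`, and `council_decision` are illustrative names, and treating "disagreement" as "not every member voted the same way" is an assumption about how the tie-break is triggered.

```python
from enum import Enum


class Blocker(Enum):
    """The two kinds of blocker the Responsibility rule distinguishes."""
    PRODUCTION_PERFORMANCE = "production performance"
    STATISTICAL_PERFORMANCE = "statistical performance"


def default_lead_role(blocker: Blocker) -> str:
    """Responsibility: the biggest blocker to success suggests the lead's role."""
    if blocker is Blocker.PRODUCTION_PERFORMANCE:
        return "ML Engineer"
    return "Data Scientist"


def council_decision(member_votes: list[str], chair_vote: str) -> str:
    """Autonomy: a unanimous council decides; any disagreement goes to the Chair."""
    if len(set(member_votes)) == 1:
        return member_votes[0]
    # Split vote (or no votes at all): the ML Council Chair tie-breaks.
    return chair_vote
```

Because the Responsibility rule says "typically", `default_lead_role` is best read as the starting proposal that the Council debates at ML Review, not as an automatic assignment.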