Scenarios and requirements
When we studied how ML is used at DoorDash, the following key scenarios emerged:
- Online models - This is the scenario where we make predictions live in production, in the critical path of the user experience. Here the models and frameworks need to be performant and have a low memory footprint, and this is where we need the deepest understanding of both the modeling frameworks and the services frameworks. Consequently, this is where the restrictions on which ML frameworks to support and how complex models can be are most stringent. Examples at DoorDash include food preparation time predictions, quoted delivery time predictions, search ranking, etc.
- Offline models - These predictions are used in production, but are not made in the request/response path, so runtime performance is secondary. Since the predictions are still used in production, we need the calculations to be persisted in the warehouse. Examples at DoorDash include demand predictions, supply predictions, etc.
- Exploratory models - This is where people explore hypotheses, but neither the model nor its output is used in production. Use cases include exploring potential production models, analysis to identify business opportunities, etc. We explicitly place no restrictions on frameworks here.
From these scenarios, the following requirements emerged:
- Standardizing ML frameworks: Given the number of ML frameworks available (e.g., LightGBM, XGBoost, PyTorch, TensorFlow), it is difficult for a company to develop deep expertise in many of them. We therefore need to standardize on a minimal set of frameworks that covers the use cases typically encountered at DoorDash.
- Model lifecycle: Support the end-to-end model lifecycle: hypothesizing improvements, training the model, preserving the training scripts, offline evaluation, online shadow testing (making predictions online purely to evaluate them), A/B testing, and finally shipping the model.
- Features: We use two types of features. The first is request-level features, which capture information specific to the request, e.g., the number of items in an order, the time of the request, etc. The second is environmental features, which capture the environment in which DoorDash operates, e.g., the average wait time at a store, the number of orders at a store in the last 30 minutes, the number of orders a customer has placed in the last 3 months, etc. Environmental features are common to all requests, and we need a good way to compute and store them.
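To make the distinction concrete, the two feature types come together at prediction time: request-level features arrive with the request itself, while environmental features are looked up from shared storage. A minimal sketch, where the store contents, key scheme, and names are all hypothetical rather than DoorDash's actual schema:

```python
# Hypothetical sketch: merging request-level and environmental features
# into one feature vector. Keys, names, and values are illustrative only.

# Environmental features, keyed by (feature_name, entity_id),
# shared across all requests.
ENVIRONMENTAL_STORE = {
    ("store_wait_time_30m", "store_42"): 12.5,  # minutes
    ("orders_last_3m", "consumer_7"): 18,       # order count
}

def build_feature_vector(request_features, env_keys):
    """Combine per-request features with environmental lookups."""
    features = dict(request_features)  # request-level: unique to this request
    for name, entity_id in env_keys:
        features[name] = ENVIRONMENTAL_STORE.get((name, entity_id))
    return features

vector = build_feature_vector(
    {"num_items": 3, "request_hour": 18},
    [("store_wait_time_30m", "store_42"), ("orders_last_3m", "consumer_7")],
)
# vector now holds both feature types in a single mapping
```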
Standardizing the supported ML frameworks
The first step in building an ML platform was to standardize which ML frameworks would be supported. Supporting a framework requires in-depth knowledge of it: the API it provides, its quality, and how to optimize its performance. As an organization, it is better to know a few frameworks deeply than many superficially; this lets us offer better ML services and leverage organizational know-how. The goal was to find the sweet spot for making the right trade-offs when selecting frameworks. For example, if a pre-trained model exists in a framework that is not currently supported, and building such a model from scratch would take considerable effort, it makes sense to support an additional framework.

After completing an internal survey of currently used model types and how they might evolve over time, we concluded that we need to support one tree-based modeling framework and one neural-network-based modeling framework. Also, given the standardization of DoorDash's tech stack on Kotlin, we needed something with a simple C/C++ API at prediction time that could hook into the Kotlin-based prediction service using JNI.

For tree-based models we evaluated XGBoost, LightGBM, and CatBoost, measuring model quality (using PR AUC) and training/prediction times on production models we already have. Model accuracy was almost the same across our use cases. For training, we found that LightGBM was fastest. For predictions, XGBoost was slightly faster than LightGBM, but not by a huge margin. Given that our current models were already in LightGBM, we ended up selecting LightGBM as the framework for tree-based models.

For neural network models, we looked at TensorFlow and PyTorch.
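The PR AUC metric used in this comparison can be computed as average precision over predictions ranked by score. A pure-Python sketch of that calculation (illustrative only, not the actual benchmark harness):

```python
# Average precision: area under the precision-recall curve, computed by
# ranking examples by score and averaging precision at each true positive.

def average_precision(labels, scores):
    """labels: 0/1 ground truth; scores: model scores (higher = more positive)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_positives = sum(labels)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / total_positives

# A model that ranks all positives above all negatives scores a perfect 1.0.
print(average_precision([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # → 1.0
```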
Here again, for our use cases we did not find a significant difference in the quality of the models the two frameworks produced. PyTorch was slower to train on CPUs than TensorFlow, but on GPUs both had similar training speeds. For predictions, both had similar predictions-per-minute numbers. We then looked at the API sets of TensorFlow and PyTorch for both training and prediction time, and concluded that PyTorch offered a more coherent API set. With the launch of TorchScript C++ support in PyTorch, we had the right API set to build the prediction service on PyTorch.
Pillars of the ML platform
After the ML framework decision, the following four pillars emerged from the prediction scenarios and requirements:
- Modeling library - A Python library for training and evaluating models, creating model artifacts that can be loaded by the Prediction Service, and making offline predictions.
- Model Training Pipeline - A build pipeline in which models are trained for production use. Once a model training script is submitted to the git repo, this pipeline takes care of training the model and uploading the artifacts to the Model Store. By analogy: if the modeling library is the compiler that produces the model, the model training pipeline is the build system.
- Features Service - To capture the environment state needed for making the predictions, we need feature computation, feature storage and feature serving. Feature computations are either historical or in real time.
- Prediction Service - This service is responsible for loading models from the Model Store, evaluating a model upon receiving a request, fetching features from the Feature Store, generating prediction logs, and supporting shadowing and A/B testing.
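To make the division of labor between these pillars concrete, here is a stdlib-only sketch of the flow at request time. Every name, model, and value is hypothetical, and the real Prediction Service is Kotlin-based; this only illustrates how the stores and logs fit together:

```python
import json

# Model Store: artifacts plus metadata (hypothetical model and formula).
MODEL_STORE = {
    "delivery_time": {
        "model_id": "delivery_time_v3",
        # Stand-in for a real trained model's predict function.
        "predict": lambda f: 20.0 + 2.0 * f["num_items"] + f["store_wait_time_30m"],
    }
}

# Feature Store: environmental features keyed by (name, entity_id).
FEATURE_STORE = {("store_wait_time_30m", "store_42"): 5.0}

PREDICTION_LOGS = []  # features + model id, for debugging and the next refresh

def predict(prediction_name, context, request_features):
    entry = MODEL_STORE[prediction_name]
    features = dict(request_features)
    # Fetch environmental features for this context from the Feature Store.
    features["store_wait_time_30m"] = FEATURE_STORE[
        ("store_wait_time_30m", context["store_id"])
    ]
    value = entry["predict"](features)
    # Log the features and model id used, as the Prediction Logs pillar does.
    PREDICTION_LOGS.append(json.dumps(
        {"model_id": entry["model_id"], "features": features, "prediction": value}
    ))
    return value

estimate = predict("delivery_time", {"store_id": "store_42"}, {"num_items": 3})
# estimate == 20.0 + 2.0 * 3 + 5.0 == 31.0
```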
DoorDash's ML platform architecture
Based on the above, the architecture of the online prediction flow (with a brief description of each component) looks like this:
- Feature Store - A low-latency store from which the Prediction Service reads the common features needed to evaluate a model. Supports numerical, categorical, and embedding features.
- Realtime Feature Aggregator - Listens to a stream of events, aggregates them into features in real time, and stores them in the Feature Store. These are features such as historic store wait time in the past 30 minutes, recent driving speeds, etc.
- Historical Aggregator - Runs offline to compute longer-term aggregation features, such as 1W or 3M windows. Results are stored in the Feature Warehouse and also uploaded to the Feature Store.
- Prediction Logs - Store the predictions made by the Prediction Service, including the features used at prediction time and the id of the model that made the prediction. Useful for debugging, and as training data for the next model refresh.
- Model Training Pipeline - All production models are built with this pipeline. The training script must be in the repository. For security and audit, only this pipeline can write models to the Model Store, generating a trace of every change going into it. The pipeline will eventually support periodic auto-retraining and auto-deploy/monitoring; it is the equivalent of a CI/CD system for ML models.
- Model Store - Stores model files and metadata. The metadata identifies which model is currently active for a given prediction and which models are receiving shadow traffic.
- Prediction Service - Serves predictions in production for various use cases. Given a request with request features, context (store id, consumer id, etc.), and a prediction name (optionally including an override model id to support A/B testing), it generates the prediction.
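The Realtime Feature Aggregator's rolling aggregations (e.g., store wait time over the past 30 minutes) can be sketched as a per-key sliding window over an event stream. A simplified stdlib-only illustration, where the class name, eviction strategy, and timestamps are assumptions rather than the production design:

```python
from collections import defaultdict, deque

class RollingMean:
    """Rolling mean over a fixed time window, per key (e.g., per store)."""

    def __init__(self, window_secs=30 * 60):
        self.window_secs = window_secs
        self.events = defaultdict(deque)  # key -> deque of (timestamp, value)

    def add(self, key, value, ts):
        """Record one event (e.g., an observed wait time) at time ts."""
        self.events[key].append((ts, value))

    def mean(self, key, now):
        """Mean of events within the window ending at `now`, or None."""
        window = self.events[key]
        # Evict events that have fallen out of the window.
        while window and window[0][0] <= now - self.window_secs:
            window.popleft()
        if not window:
            return None
        return sum(v for _, v in window) / len(window)

agg = RollingMean(window_secs=1800)
agg.add("store_42", 10.0, ts=0)    # wait time observed at t=0s
agg.add("store_42", 20.0, ts=900)  # observed at t=15min
print(agg.mean("store_42", now=1000))  # → 15.0 (both events in window)
print(agg.mean("store_42", now=2000))  # → 20.0 (t=0 event evicted)
```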
We are just getting started on this plan; there is still a lot of work to do to build it, scale it, and operate it. If you are passionate about building the ML platform that powers DoorDash, do reach out.

Acknowledgements: Cody Zeng, Cem Boyaci, Yixin Tang, Raghav Ramesh, Rohan Chopra, Eric Gu, Alok Gupta, Sudhir Tonse, Ying Chi, and Gary Ren