When a company with millions of consumers, such as DoorDash, builds machine learning (ML) models, the feature data can grow to billions of records, with millions of them actively retrieved during model inference under low-latency constraints.
Author Archives: Arbaz Khan
Enabling Efficient Machine Learning Model Serving by Minimizing Network Overheads with gRPC
A central challenge in building machine learning (ML)-powered applications is running inference on large volumes of data and returning a prediction over the network within milliseconds, which can’t be done without minimizing network overhead.