Automated model serving to mobile devices

Kristina Georgieva
5 min read · Sep 30, 2018

The most common approach to deploying machine learning models is to expose an API endpoint. This endpoint is generally called via a POST request containing the model's input data in the body, and it responds with the model's output. However, an API endpoint is not always the most appropriate solution for your use case.

There are, for example, use cases that may require a machine learning model to be deployed on a mobile device, such as:

  • The need to use the model offline or in areas with low connectivity.
  • The need to minimize the amount of data being transferred, perhaps because the user lives in a place where data is neither cheap nor easily accessible, or because the model requires large amounts of data as input.
  • The need for more immediate results, without being limited by network speed.

Many tools exist that allow for data cleaning, experimentation and deployment of models all in one. These tools, however, seem to provide only API endpoints or packaged model files.

This article aims to give an overview of how versioned models can be automatically deployed to mobile devices.

The article is based on the work of an awesome cross-functional team ❤

Let’s start with a high-level view of the model deployment process. The data pipeline in the image below probably looks familiar to anyone who has deployed a model to production.

We have:

  • Data cleaning: This is the step where incorrect data is filtered out of the input and the data is set up to be consumable by the model.
  • Feature selection: You might have such a step if the features you feed into your model require some additional computation. You might also have merged the data cleaning and feature selection steps of the pipeline.
  • Model training: This is where the data is consumed in order to train a model with pre-selected parameters (chosen either through automation or through thorough experimentation).
  • Model deployment: This is the first step that I will dive into more deeply below.

Model deployment

Model deployment for mobile devices can consist of the following steps:

  • Generate a configuration file for model details
  • Save the model
  • Freeze the model
  • Serve the model

Each of these is described in more detail below.

Generate configuration file

We need to keep track of how we feed our data to the model in order to train it. This will:

  • Make it easier to tell the mobile device which features to extract and use for a newly downloaded model.
  • Contain information to keep track of which model is deployed on which device.
  • Contain information on what the numeric output of the model means, such as the corresponding labels.
  • Contain information on how to access the relevant parts of the model, such as the names of the input and output layers of a neural network.

The data representation recommended for this is JSON, as it is flexible, standard and readable. An example of such a configuration can be seen below:
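As a sketch, a configuration along these lines might look as follows (all field names and values here are invented for illustration):

    {
      "version": 3,
      "model_file": "frozen_model.pb",
      "features": ["accelerometer_x", "accelerometer_y", "accelerometer_z"],
      "input_layer": "input",
      "output_layer": "output",
      "labels": { "0": "walking", "1": "running", "2": "cycling" }
    }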

Save the model

Once a model is trained and saved, we have a representation of it in a set of files. If you are using TensorFlow, you would have the following files:

  • A meta file: Contains the graph information; in other words, it describes the architecture of the model.
  • An index file: Contains the metadata of each tensor that is part of the graph.
  • A data file: Contains the values of the variables, such as the values of the weights.

A TensorFlow model can be reloaded from these three files, and training can continue from where it stopped.

The image below shows the resulting files to be saved for a trained model:
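As a minimal sketch, saving a checkpoint with the TensorFlow 1.x Saver API looks roughly like this (the graph and the paths are illustrative):

    import tensorflow as tf

    # A trivially small graph, standing in for a real trained model.
    weights = tf.Variable(tf.random_normal([3, 2]), name="weights")
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... training would happen here ...
        saver.save(sess, "./checkpoints/my_model")
        # Produces: my_model.meta (graph definition),
        # my_model.index (tensor metadata),
        # my_model.data-00000-of-00001 (variable values)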

Freeze the model

The saved files are not directly deployable to a mobile device. The graph of a model needs to be frozen before it can be deployed. This means that the graph definition and the values of the variables need to be merged into a single binary (.pb) file consumable by the mobile device.
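As a rough sketch, freezing can be done with TensorFlow 1.x's graph_util by restoring the checkpoint and converting the variables to constants (the checkpoint path and output node name are assumptions; they come from your own graph and configuration):

    import tensorflow as tf

    # Rebuild the graph from the meta file and restore the variable values.
    saver = tf.train.import_meta_graph("./checkpoints/my_model.meta")
    with tf.Session() as sess:
        saver.restore(sess, "./checkpoints/my_model")
        # Merge the graph definition and variable values into one GraphDef.
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph_def, ["output"])  # output node name from the config
        with tf.gfile.GFile("frozen_model.pb", "wb") as f:
            f.write(frozen.SerializeToString())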

For more details on freezing a model, check out the TensorFlow documentation here: https://www.tensorflow.org/lite/tfmobile/prepare_models

The merged file is a model representation that cannot be reloaded for further training, but it can be used through TensorFlow Lite or Core ML.

Note: In order to use the model in Core ML, the .pb file needs an extra conversion step. For a blog post on this, check out https://medium.com/@jianshi_94445/convert-a-tensorflow-model-to-coreml-model-using-tfcoreml-8ce157f1bc3b

Serve the model

Once the mobile application starts, the following steps are taken to use the model:

  1. Fetch the latest configuration for the OS (the service provides this in JSON format).
  2. If the latest configuration has a higher version than the local backup configuration, download the latest model (the service provides this in a zip file) and replace the local model and configuration with the downloaded ones.
  3. If the latest configuration does not have a higher version than the local one, or there is no internet connection, use the locally stored model.
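Sketched in Python for brevity (on a device this logic would live in Swift or Kotlin), the update check could look something like this; the endpoint URL, the "version" field and the download_and_replace helper are hypothetical:

    import json
    import urllib.request

    CONFIG_URL = "https://example.com/models/config.json"  # hypothetical endpoint

    def resolve_model(local_config_path="config.json"):
        """Return the configuration of the model the app should use."""
        with open(local_config_path) as f:
            local = json.load(f)
        try:
            with urllib.request.urlopen(CONFIG_URL, timeout=5) as resp:
                latest = json.load(resp)
        except OSError:
            return local  # no connectivity: use the locally stored model
        if latest["version"] > local["version"]:
            download_and_replace(latest)  # hypothetical: fetch zip, unpack, overwrite
            return latest
        return local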

Once the model has been downloaded, the application can use the configuration JSON to determine:

  • What features to extract
  • The name of the input layer
  • The name of the output layer
  • The meaning of the output
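As an illustrative sketch (again in Python; a device would use the TensorFlow Lite or Core ML bindings), the layer names from the configuration are used to locate the input and output tensors of the frozen graph. The tensor names and feature shape here are assumptions:

    import numpy as np
    import tensorflow as tf

    # Load the frozen graph produced in the freezing step.
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("frozen_model.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")
        # Tensor names come from the configuration JSON.
        x = graph.get_tensor_by_name("input:0")
        y = graph.get_tensor_by_name("output:0")
        with tf.Session(graph=graph) as sess:
            features = np.zeros((1, 3), dtype=np.float32)  # extracted per the config
            print(sess.run(y, feed_dict={x: features}))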

The image below shows an overview of the serving process end to end:

Additional considerations

The above implementation covers the ‘happy path’ and the case of no internet connection. However, other things can go wrong, and these need to be detected in order to fall back to an old model. This can be done by validating whether:

  • The model files were corrupted during network transfer.
  • The configuration fits the deployed model (for example, three outputs, but only two described).
  • The features described are understood by the mobile device.
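One way to catch corrupted files, as a sketch, is to ship a checksum in the configuration and verify the download against it (the "checksum" field is an assumption, not something the configuration above necessarily contains):

    import hashlib

    def checksum_ok(path, expected_sha256):
        """Verify a downloaded model file against a SHA-256 checksum
        shipped in the configuration (hypothetical 'checksum' field)."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest() == expected_sha256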

Conclusion

Through JSON configuration and model versioning, models can be dynamically deployed to mobile devices and run locally. That said, some additional validation needs to be implemented to fail gracefully when there are problems with the deployed model.

Originally published at intothedepthsofdataengineering.wordpress.com on September 30, 2018.
