How we deployed a scikit-learn model with Flask and Docker

In our last post we discussed our customer satisfaction prediction model. We used AzureML Studio for the first deployment of this machine learning model, to serve real-time predictions. In this post we share how and why we moved from AzureML to a Python deployment using Flask, Docker and Azure App Service. Along the way we also tried Azure Functions with Python. Finally, we open-sourced a sample Python API for machine learning built with Flask and Docker.


After generating our first customer satisfaction prediction model, we wanted to deploy it quickly and easily. AzureML served this purpose but also had drawbacks. We developed the model with scikit-learn, but the model we deployed to production was generated by AzureML. As a result, the parameters were not exactly the same, and some of the metrics we measured also differed. In addition, both model training and real-time predictions needed the same data preparation logic, yet we maintained that logic as duplicate code. Lastly, we found the AzureML GUI very slow and therefore not very user-friendly.


From the above issues came the following requirements for our second prediction model deployment architecture:

  • Fast and easy deployment
  • The same model trained in our local environment can be deployed to production
  • Data preparation code is shared between production and model training
  • Easy end-to-end control over the flow
  • Nice to have: easy integration with continuous deployment (e.g., Codefresh)
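Sharing data preparation code usually means putting it in one module that both the training script and the API import. The sketch below shows one minimal way to do that. It assumes a hypothetical module `features.py` and invented field names (`tickets_opened`, `days_since_signup`, `plan`); these are for illustration only and are not our model's real features.

```python
"""features.py: hypothetical shared data-preparation module.

Both the offline training script and the real-time API import
prepare_features(), so a raw record becomes the same feature
vector in both places.
"""
from typing import Any, Iterable, List, Mapping

# Hypothetical feature set, invented for this example.
NUMERIC_FIELDS = ["tickets_opened", "days_since_signup"]
PLANS = ["free", "basic", "premium"]


def prepare_features(record: Mapping[str, Any]) -> List[float]:
    """Convert one raw record into a fixed-order numeric feature vector."""
    row: List[float] = []
    for field in NUMERIC_FIELDS:
        # Missing or empty numeric values default to 0.0.
        value = record.get(field)
        row.append(float(value) if value not in (None, "") else 0.0)
    # One-hot encode the plan; an unknown or missing plan maps to all zeros.
    plan = str(record.get("plan", "")).lower()
    row.extend(1.0 if plan == p else 0.0 for p in PLANS)
    return row


def prepare_batch(records: Iterable[Mapping[str, Any]]) -> List[List[float]]:
    """Apply prepare_features to every record (used at training time)."""
    return [prepare_features(r) for r in records]
```

Training and serving both call the same function, so any change to the preparation logic is made in exactly one place.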


First we tried exposing a REST API through Azure Functions with Python. The setup was amazingly easy, and we were able to run Python code in no time. Working with Kudu was also easy, fast and very user-friendly. Although it showed promise, things got more difficult later:

  • Installing machine learning libraries is harder on Windows machines: some basic libraries, such as NumPy, required special treatment beyond a plain “pip install”
  • Azure Functions run all imports on every request, and in our case (NumPy, scikit-learn, pandas) the imports took around 30 seconds (see this issue for more details); this was a deal breaker for us
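The sketch below shows why per-request imports hurt. It compares two costs: starting a fresh interpreter and importing a module, which is roughly what happens if every request pays for its own imports, and re-importing a module that a long-running process has already loaded. It is illustrative only and is not a benchmark of Azure Functions. The example uses `json` so it runs anywhere; passing `numpy` or `sklearn` as an argument should widen the gap considerably.

```python
import importlib
import subprocess
import sys
import time


def cold_import_seconds(module: str) -> float:
    """Start a new Python interpreter, import `module`, and exit.

    This approximates paying the full import cost on every request.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


def warm_import_seconds(module: str) -> float:
    """Import `module` in a process that has already loaded it.

    After the first import, Python finds the module in sys.modules,
    so a long-running server pays the import cost only once, at startup.
    """
    importlib.import_module(module)  # first, possibly expensive, import
    start = time.perf_counter()
    importlib.import_module(module)  # cached lookup in sys.modules
    return time.perf_counter() - start


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "json"
    print(f"{name}: cold {cold_import_seconds(name):.3f}s, "
          f"warm {warm_import_seconds(name):.6f}s")
```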


Finally, we decided to try deploying our own server using:

  • Flask: a micro web framework for Python, very easy to set up
  • Docker
  • Azure App Service
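Below is a minimal sketch of the Flask side. The file name `model.pkl`, the route `/predict` and the feature names `feature_a`/`feature_b` are all hypothetical; this is not our production API. The model is loaded once when the process starts rather than on every request, which is exactly the cost that hurt us with Azure Functions.

```python
import pickle

from flask import Flask, jsonify, request

# Hypothetical feature order; it must match the order used at training time.
FEATURES = ["feature_a", "feature_b"]


def load_model(path: str = "model.pkl"):
    """Load the pickled model that was trained offline (path is hypothetical)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def create_app(model) -> Flask:
    """Build the Flask app around a model that is already loaded in memory."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict):
            return jsonify(error="expected a JSON object"), 400
        missing = [name for name in FEATURES if name not in payload]
        if missing:
            return jsonify(error=f"missing features: {missing}"), 400
        try:
            row = [float(payload[name]) for name in FEATURES]
        except (TypeError, ValueError):
            return jsonify(error="features must be numeric"), 400
        # scikit-learn estimators expect 2-D input: one row per sample.
        prediction = model.predict([row])[0]
        # Assumes a numeric prediction; float() also converts NumPy scalars.
        return jsonify(prediction=float(prediction))

    return app
```

For local testing, `create_app(load_model()).run(host="0.0.0.0", port=5000)` starts Flask's development server. Binding to `0.0.0.0` makes the port reachable from outside a Docker container. A production image would typically run the app under a WSGI server such as gunicorn instead.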


What we gained from this:

  • Since we deployed our own server, we had full end-to-end control; this made it easy to share code between the real-time API and offline model training
  • The actual model trained offline was deployed to production using pickle
  • Since our Docker container runs Linux, there are no more Windows “pip install” difficulties
  • Docker lets this service be hosted on multiple platforms; if Azure App Service shows problems, migration will be very easy
  • Easy integration from our Git repository to Codefresh, which handles our whole continuous deployment process: every commit to the repository deploys the new code to a secondary service that can easily be swapped in as our primary real-time service
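Deploying the exact model trained offline comes down to serializing the fitted estimator with Python's `pickle` module and loading it in the API process. The minimal sketch below uses a toy `LogisticRegression` fitted on made-up data. The real model, its features and the file name used here are not taken from this post.

```python
import pickle
from pathlib import Path

from sklearn.linear_model import LogisticRegression


def train_and_save(path: Path) -> LogisticRegression:
    """Offline step: fit a model and pickle the fitted estimator."""
    # Toy data standing in for the real, prepared training features.
    X_train = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]]
    y_train = [0, 1, 0, 1]
    model = LogisticRegression().fit(X_train, y_train)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model


def load_model(path: Path):
    """Serving step: load the exact fitted object that was trained offline.

    Only unpickle files you produced yourself; unpickling can execute code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

scikit-learn does not guarantee that pickles load correctly across library versions. Pinning the same scikit-learn version in the training environment and in the Docker image avoids subtle mismatches.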


As a side note, Azure App Service support for Docker is still in preview; more information can be found here.


Want to give it a try?

A sample Docker image can be found here.

A code sample of this architecture can be found here.
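Once the container is running, any HTTP client can call the API. The standard-library sketch below assumes a hypothetical `/predict` route on `localhost:5000` that accepts and returns JSON. The URL and payload fields are placeholders, not the endpoints of the linked sample.

```python
import json
import urllib.request

# Hypothetical endpoint; point this at wherever the container is exposed.
PREDICT_URL = "http://localhost:5000/predict"


def build_predict_request(features: dict, url: str = PREDICT_URL) -> urllib.request.Request:
    """Encode the features as a JSON POST request."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, url: str = PREDICT_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_predict_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `predict({"feature_a": 1, "feature_b": 2})` against a running server returns the decoded JSON response as a dict. Separating request construction from sending makes the payload easy to check without a live server.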


In the next post we will share more about the real-time performance of our new system architecture, the new issues that arise (there are always issues 🙂), and our new challenges and features.


