
MLOps: DevOps for Machine Learning


Created Jul 17, 2021 – Last Updated Jul 17, 2021

Machine Learning
Digital Garden

MLOps is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operation (Ops). Practicing MLOps means that you advocate for automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment, and infrastructure management.

Given relevant training data for their use case, data scientists can implement and train an ML model that performs well on an offline holdout dataset. However, the real challenge isn’t building an ML model; it is building an integrated ML system and continuously operating it in production. Some of the pitfalls are summarized in the Google paper "Machine Learning: The High-Interest Credit Card of Technical Debt".

MLOps should be viewed as a practice for consistently managing the ML aspects of products in a way that is unified with all of the other technical and non-technical elements necessary to successfully commercialize those products with maximum potential for viability in the marketplace.

Drivers unique to Machine Learning solutions that represent MLOps requirements

  • General DevOps drivers applied to MLOps
  • Optimizing the process of taking ML features into production by reducing Lead Time
  • Optimizing the feedback loop between production and development for ML assets
  • Supporting the problem-solving cycle of experimentation and feedback for ML applications
  • Unifying the release cycle for ML and conventional assets
  • Enabling automated testing of ML assets
  • Application of Agile principles to ML projects
  • Supporting ML assets as first-class citizens within CI/CD systems
  • Enabling shift-left on Security to include ML assets
  • Improving quality through standardization across conventional and ML assets
  • Applying Static Analysis, Dynamic Analysis, Dependency Scanning and Integrity Checking to ML assets
  • Reducing Mean Time To Restore for ML applications
  • Reducing Change Fail Percentage for ML applications
  • Management of technical debt across ML assets
  • Enabling whole-of-life cost savings at a product level
  • Reducing overheads of IT management through economies of scale
  • Facilitating the re-use of ML approaches through a template or ‘quickstart’ projects
  • Managing risk by aligning ML deliveries to appropriate governance processes
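
Several of the drivers above (Lead Time, Change Fail Percentage, Mean Time To Restore) are measurable quantities. As a rough illustration only, the sketch below computes them from a list of release records. The `Deployment` record layout, the function names, and the choice to measure restore time from the moment of deployment are all assumptions made for this example; they are not taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """One production release of an ML asset (hypothetical record layout)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the release cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def mean_lead_time(deployments: List[Deployment]) -> timedelta:
    """Average time from commit to production."""
    total = sum((d.deployed_at - d.committed_at for d in deployments), timedelta())
    return total / len(deployments)


def change_fail_percentage(deployments: List[Deployment]) -> float:
    """Share of releases that caused a production failure, in percent."""
    failures = sum(1 for d in deployments if d.failed)
    return 100.0 * failures / len(deployments)


def mean_time_to_restore(deployments: List[Deployment]) -> Optional[timedelta]:
    """Average outage duration over failed releases, or None if none failed.

    Assumption: the outage is taken to start at deployment time.
    """
    outages = [
        d.restored_at - d.deployed_at
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    if not outages:
        return None
    return sum(outages, timedelta()) / len(outages)


# Made-up release history for a model, used only to exercise the functions.
history = [
    Deployment(datetime(2021, 7, 1, 9), datetime(2021, 7, 2, 9)),
    Deployment(datetime(2021, 7, 5, 9), datetime(2021, 7, 7, 9),
               failed=True, restored_at=datetime(2021, 7, 7, 11)),
    Deployment(datetime(2021, 7, 10, 9), datetime(2021, 7, 10, 21)),
    Deployment(datetime(2021, 7, 12, 9), datetime(2021, 7, 13, 21),
               failed=True, restored_at=datetime(2021, 7, 14, 1)),
]

print("Mean lead time:", mean_lead_time(history))
print("Change fail percentage:", change_fail_percentage(history))
print("Mean time to restore:", mean_time_to_restore(history))
```

Tracking these numbers for ML assets alongside conventional code is one way to check whether a unified release cycle is actually reducing lead time and failure rates.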
