MLOps, Simplified!

Rajesh Dangi, June 2021

ML stands for Machine Learning, the fastest-developing technology under the Artificial Intelligence umbrella. It is expanding at an unprecedented pace, with a steep adoption curve and a growing number of deployed use cases. ‘Ops’ stands for Operations, the function that manages and enables the underlying infrastructure, services, and support processes needed to run toolsets and application workloads successfully. In our last interaction we learned about DevOps, and a few months ago we touched base on Machine Learning. Having discussed these two concepts individually, we must now look at how Machine Learning leverages the DevOps framework to develop, deploy, scale, and retrain ML models, hence the name MLOps.

Why MLOps?

The DevOps model lets developers and operations engineers simplify the process and become more productive through automation. Automation reduces the number of manual actions and iterations, and it accelerates development thanks to faster end-user feedback, instant rollbacks, and quick fixes. The most important aspect is self-service, which accelerates releases by enabling developers to deploy applications on demand by themselves, while testers test code in tandem as soon as it is ready.

Applied to machine learning, the DevOps framework brings a structured approach to the core tenets of data science. These tenets are data ingestion (collection, pre-validation, etc.), processing (datasets, data modelling, analytics, etc.), continuous improvement (retraining models, redeploying the retrained models, etc.), and data delivery (monitoring data pipelines, publishing datasets, managing ML models and repositories, etc.). The stakeholder canvas thus expands to include data scientists/engineers and ML architects/engineers. The result is ML pipeline automation that manages datasets, models, and insights through their end-to-end lifecycle.

Broadly, adopting the DevOps model for machine learning accelerates time to market and brings discipline to data science and machine learning workflows, fostering collaboration among all stakeholders. A typical MLOps architecture augments two kinds of platforms with an MLOps tool. The first is the data science platforms where models are constructed. The second is the analytical engines where computations are performed to derive insights. The MLOps tool orchestrates the movement of machine learning models, data, and the interactions between them.

Merging MLOps Processes into DevOps Stages – Quick Overview

Data is the key element of modern ML techniques, so data engineering and data management play an essential role in the development, deployment, management, and refinement of data models. Development, quality assurance, and delivery processes are tightly integrated in each of these stages. For that reason, the proven DevOps methodology must actively assimilate cross-functional skills, processes, and tools, because each of them affects the end objective: delivering and continuously improving data models, repositories, re-tuned datasets, etc.

Design & Discovery Stage – Planning an effective and successful project needs time for assessment, design, and planning. The more hours spent on deliberation and discovery, the more opportunities there are for successful insights and outcomes. This stage demands well-spent time and focus on the following activities:

  • Requirement Engineering – Use cases and problem statements, scope definition & analysis.
  • Use Case Prioritization – Which model works for the given problem statement or use case, and how applicable it is.
  • Data Availability, Prerequisites & Data Validation – A pure-play assessment of the available data and datasets, validating them for consistency, correctness, completeness, etc.
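To make the data validation activity concrete, here is a minimal sketch in Python. It assumes tabular records held as dictionaries and a hypothetical SCHEMA that gives each column a type and a range rule. Missing values are counted as completeness issues, type mismatches as consistency issues, and out-of-range values as correctness issues. Real pipelines would usually rely on a dedicated validation tool rather than hand-rolled checks like these.

```python
from dataclasses import dataclass, field

# Hypothetical schema: column -> (expected type, correctness rule).
SCHEMA = {
    "age": (int, lambda v: 0 <= v <= 120),
    "income": (float, lambda v: v >= 0),
    "country": (str, lambda v: len(v) == 2),  # e.g. two-letter country codes
}


@dataclass
class ValidationReport:
    missing: list = field(default_factory=list)       # completeness issues
    wrong_type: list = field(default_factory=list)    # consistency issues
    out_of_range: list = field(default_factory=list)  # correctness issues

    @property
    def ok(self) -> bool:
        return not (self.missing or self.wrong_type or self.out_of_range)


def validate(rows, schema=SCHEMA):
    """Check every row against the schema and collect (row_index, column) issues."""
    report = ValidationReport()
    for i, row in enumerate(rows):
        for column, (expected_type, rule) in schema.items():
            value = row.get(column)
            if value is None:
                report.missing.append((i, column))
            elif not isinstance(value, expected_type):  # strict: an int in a float column is flagged
                report.wrong_type.append((i, column))
            elif not rule(value):
                report.out_of_range.append((i, column))
    return report


rows = [
    {"age": 34, "income": 52000.0, "country": "IN"},
    {"age": 150, "income": None, "country": "India"},
]
report = validate(rows)
print(report.ok, report.missing, report.out_of_range)
# False [(1, 'income')] [(1, 'age'), (1, 'country')]
```

The first record passes. The second is flagged for a missing income, an implausible age, and a country value that does not follow the assumed two-letter rule.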

Continuous Development & Continuous Testing – CD & CT – Extends the testing and validation of code and components to data and models, and automatically retrains ML models for redeployment.

  • Data Engineering – Data extraction, validation, preparation
  • Model Engineering & Source repository - Model training & evaluation
  • Model Testing and validation
  • Model Delivery - Model serving for prediction
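The retrain-and-redeploy loop described above can be sketched as a gated pipeline. The sketch rests on several assumptions. A one-feature least-squares model stands in for a real model. Mean squared error on a holdout set is the promotion criterion. The function names, including `retrain_and_maybe_deploy`, are hypothetical and are not part of any MLOps tool. The two gates mirror the "test data" and "test model" steps before model delivery.

```python
from statistics import fmean


def train(xs, ys):
    """Fit y = a*x + b by ordinary least squares (one feature, for illustration)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def mse(model, xs, ys):
    a, b = model
    return fmean((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def retrain_and_maybe_deploy(current_model, train_data, holdout_data):
    """One CT cycle: validate data, retrain, evaluate, and promote only a strictly better candidate."""
    xs, ys = train_data
    if len(xs) != len(ys) or len(set(xs)) < 2:  # data test gate
        return current_model, "rejected: invalid training data"
    candidate = train(xs, ys)
    hx, hy = holdout_data
    if current_model is None or mse(candidate, hx, hy) < mse(current_model, hx, hy):  # model test gate
        return candidate, "deployed candidate"
    return current_model, "kept current model"


old = (0.0, 0.0)  # stale production model that always predicts 0
train_data = ([0, 1, 2, 3], [1, 3, 5, 7])  # points on y = 2x + 1
holdout = ([4, 5], [9, 11])
model, status = retrain_and_maybe_deploy(old, train_data, holdout)
print(model, status)  # (2.0, 1.0) deployed candidate
```

Running the same cycle again with `(2.0, 1.0)` in production returns "kept current model", because the new candidate is not strictly better. Requiring a strict improvement avoids churning deployments for no gain.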

Continuous Integration, Operations & Delivery – CI & CD – Concerned with delivering an ML training pipeline that automatically deploys another, cascaded service (the model prediction service) or rolls back changes to a model, so that the predefined data-pipeline workflows keep running.

  • Test & Build Environments – Packages and executables for pipelines, containerization of the ML stack, REST APIs, cloud or edge environments, etc.
  • Model Deployment – Runtimes, Feature Store & Model Registry
  • CI/CD Pipelines & Metadata Stores – Models & parameters, training data, test data, and metrics.
  • Automation – Rapid Application Development (RAD) tools; automation of data, ML models, and ML training pipelines.

Continuous Monitoring – CM – Models served in production need to be monitored regularly, along with the summary statistics of the data used to build them, so that changes are detected and the model is refreshed as and when needed. These statistics are dynamic, so values that deviate from expectations should trigger notifications or a rollback process.

  • Monitoring production data and model performance metrics, model test scores, success criteria, KPIs, business metrics, etc.
  • Versioning – Code, data, and ML model artifacts, etc.
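The registry, versioning, monitoring, and rollback ideas above fit together roughly as shown below. This is a minimal in-memory sketch only. `ModelRegistry`, `drifted`, and `check_and_rollback` are hypothetical names. The drift rule flags the live mean when it sits more than `max_z` training standard deviations from the training mean. That rule is a deliberately coarse assumption, and a print statement stands in for a real notification channel.

```python
from statistics import fmean, pstdev


class ModelRegistry:
    """Hypothetical in-memory registry: each registered artifact gets the next version number."""

    def __init__(self):
        self.versions = {}      # version number -> model artifact
        self.production = None  # version currently serving predictions
        self.history = []       # previously served versions, newest last

    def register(self, model):
        version = len(self.versions) + 1
        self.versions[version] = model
        return version

    def promote(self, version):
        if self.production is not None:
            self.history.append(self.production)
        self.production = version

    def rollback(self):
        if not self.history:
            raise RuntimeError("no earlier version to roll back to")
        self.production = self.history.pop()
        return self.production


def drifted(train_values, live_values, max_z=3.0):
    """Coarse check: has the live mean moved more than max_z training std-devs from the training mean?"""
    mu, sigma = fmean(train_values), pstdev(train_values)
    if sigma == 0:
        return fmean(live_values) != mu
    return abs(fmean(live_values) - mu) / sigma > max_z


def check_and_rollback(registry, train_values, live_values):
    """Notify (here: print) and roll back on drift; return the version now serving."""
    if drifted(train_values, live_values):
        print(f"drift detected, rolling back from version {registry.production}")
        registry.rollback()
    return registry.production


registry = ModelRegistry()
registry.promote(registry.register("model-a"))
registry.promote(registry.register("model-b"))
baseline = [10, 12, 11, 9, 10, 11, 12, 10]
print(check_and_rollback(registry, baseline, [10, 11, 11, 10]))  # 2
print(check_and_rollback(registry, baseline, [25, 27, 26, 24]))  # prints the drift notice, then 1
```

Keeping a history of served versions makes a rollback a one-step operation. In practice, a drift alert might instead trigger the retraining pipeline, depending on whether the data or the model is judged to be at fault.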

Data & Data Model Connection

Fundamentally, the most effective way to improve data models, and the accuracy, fairness, and robustness of the associated ML model, is often to improve the dataset. This can be done through means such as data cleaning, integration, and label acquisition.

MLOps plays an important part in understanding, monitoring, measuring, and improving the quality of data models. Models are only as good as their data and the strategy applied to it. If the data is inaccurate, inconsistent, or incomplete, it can easily introduce bias or skew the outcome, hampering overall effectiveness.

Thus, a major portion of data scientists today focus on understanding the model's algorithm and its outcomes, so that incorrect inferences about those outcomes are avoided.

MLOps plays an important role in managing this data and its associated elements.


Key Challenges of MLOps

When it comes to MLOps, one challenge is that not all incomplete or bad data samples matter equally to the quality of the final ML model. As samples “propagate” through the ML training process, the veracity and uncertainty of different inputs can have vastly different effects. This calls for extensive analysis of how inconsistent and uncurated data in the training set affects the quality of a model trained on it. In fact, simply cleaning input data artifacts, either randomly or agnostic to the ML training process, may yield only a sub-optimal improvement in the downstream ML model. Many aspects, from the development phase through deployment, can affect the results or intended outcomes.
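One simple way to see that some samples matter far more than others is a leave-one-out score. It measures how much validation accuracy changes when each training sample is removed. The sketch below uses this technique with a one-dimensional 1-nearest-neighbour classifier and made-up data. Leave-one-out is only an illustration of ML-aware data cleaning. It is not necessarily the method used by the sources this article draws on, and it becomes expensive on large datasets.

```python
def predict_1nn(train, x):
    """Label of the nearest training point (1-D feature, absolute distance)."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label


def accuracy(train, validation):
    return sum(predict_1nn(train, x) == y for x, y in validation) / len(validation)


def leave_one_out_importance(train, validation):
    """Validation-accuracy drop caused by removing each training sample.

    A negative score means the model does better without that sample,
    which makes it a candidate for cleaning or relabelling.
    """
    base = accuracy(train, validation)
    return [base - accuracy(train[:i] + train[i + 1:], validation) for i in range(len(train))]


train = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (8.0, "b"), (9.0, "b")]  # (3.0, "b") looks mislabelled
validation = [(2.8, "a"), (3.2, "a"), (7.5, "b"), (8.4, "b")]
print(leave_one_out_importance(train, validation))  # [0.0, 0.0, -0.5, 0.0, 0.0]
```

Only the third sample affects the outcome: removing it lifts validation accuracy from 0.5 to 1.0. A cleaning budget spent at random on the other four samples would have changed nothing.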

Development Phase

  • Dev/QA and production deployments run on different environments – this affects how models behave at runtime, so consistency across environments is important.
  • Tools, libraries, and dependencies can complicate deployments – developers need techniques, patterns, and tools that mitigate failures arising from the fast-moving field of applied artificial intelligence and machine learning, so that systems stay robust and scalable in real-world settings.
  • Tracking and analyzing issues and experiments, and even reproducing issues, can be a challenge, since input data can change dynamically in real time.
  • ML code can turn into spaghetti code after many iterations and changes; contrary to what many believe, clean code is rare in the real world.
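Reproducing an issue, or explaining why Dev/QA and production behave differently, is easier when every run records what it depended on. Below is a minimal sketch using only the Python standard library (`hashlib`, `json`, `platform`, `random`, `importlib.metadata`). It assumes a run is described by JSON-serialisable data records, a random seed, a list of package names, and hyperparameters. `record_run` and `diff_runs` are hypothetical helpers, not part of any experiment-tracking product.

```python
import hashlib
import json
import platform
import random
from importlib import metadata


def fingerprint(rows):
    """SHA-256 of the training records; key order inside each record does not matter."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def package_versions(names):
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def record_run(rows, seed, packages, params):
    """Hypothetical run record capturing what a run depended on."""
    random.seed(seed)  # seeds Python's random module only; other libraries need their own seeding
    return {
        "data_sha256": fingerprint(rows),
        "seed": seed,
        "python": platform.python_version(),
        "packages": package_versions(packages),
        "params": params,
    }


def diff_runs(a, b):
    """Keys whose values differ between two run records, e.g. Dev/QA vs production."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


dev = record_run([{"x": 1, "y": 2}], seed=42, packages=["numpy"], params={"lr": 0.1})
prod = record_run([{"x": 1, "y": 3}], seed=42, packages=["numpy"], params={"lr": 0.1})
print(diff_runs(dev, prod))  # ['data_sha256']
```

Here the two runs share code, seed, environment, and parameters, so the diff points straight at the input data as the thing that changed.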

Production Phase

  • Training data and live data will differ, which can trigger issues: the feature engineering pipeline may not match between the training and serving infrastructure, the data itself may shift, and models can drift.
  • The technology landscape may differ between development and deployment, making model portability challenging – with constant change and development in cloud and container environments, this is a key technological risk unless models and code become truly portable.
  • Scaling deployed models up and down is a challenge, and the skill mismatch between data engineers and DevOps engineers is a reality; upskilling data engineers into MLOps engineers, much as developers became full-stack developers, is the need of the hour.
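One common way to quantify the gap between training and live data is the Population Stability Index (PSI). For each feature, it is the sum over bins of (live share - training share) × ln(live share / training share). The sketch below is one illustrative way to compute it. It assumes equal-width bins over the training range and a small floor `eps` for empty bins. A frequently quoted rule of thumb reads PSI below 0.1 as little shift and above 0.25 as significant shift, but thresholds should be tuned per feature.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` (live) against `expected` (training) values.

    Bins are equal-width over the training range; live values outside that range
    are counted in the first or last bin. Empty bins are floored at `eps`.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant training feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            index = int((v - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [i / 10 for i in range(100)]
print(psi(training, training))                                 # 0.0
print(psi(training, [v + 0.01 for v in training]))            # small: almost no values change bin
print(psi(training, [5 + i / 20 for i in range(100)]) > 0.25)  # True: live data sits in the upper half
```

Running a check like this per feature, on the same transformed features that the serving path produces, helps surface training/serving mismatches as well as gradual data drift.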

MLOps and the Way Forward

MLOps adoption is not as simple as purchasing software and deploying it by following the ‘installation manual’. It consists of multiple moving parts that must scale cohesively and remain secure.

All the underlying technologies must be able to scale to the size and intensity needed to deliver an optimal user experience.

The true test will be engineering features that meet these needs, and the data that supports those milestones, while staying on course within the operational constraints and complexities involved.

Automation and innovation are aspirational goals, so more and more related streams are being unified under the MLOps model. The model keeps branching out and broadening its scope in line with market dynamics. MLDev frameworks already focus more deeply on the development stream, as DevOps gains traction and keeps dealing with different dimensions of scalability: size, speed, complexity, etc.

The MLOps ecosystem and the data engineering stream must remain tightly integrated to deliver systems that are as buildable, deployable, usable, reliable, and trustworthy as possible. This is only possible with three things. The first is affordable development and acquisition of capabilities. The second is meeting workforce-readiness and capacity-building challenges. The third is finding ways to democratize the effective development, adoption, and use of the associated MLOps technologies.

Efforts must be made to address each of these areas, along with solutions that enable an outcome-driven system of MLOps components. What do you think?


June 2021. Compiled from various publicly available internet sources; the author's views are personal.