MACHINE LEARNING | MLOPS

Introducing MLOps in Healthcare

A machine learning platform developed in conjunction with Chaos Gears supports the global medical research performed by our Client.

The Challenge

The Global Healthcare Leader Innovates Through Machine Learning

The Chaos Gears team was contacted by the Swiss leader in the global healthcare industry. Active in both research and pharma, they recognized the opportunity that machine learning offers their scientific teams - an approach still rare and innovative in healthcare.

Drawn to the idea of supporting their research and innovation with machine learning, the Client wanted to create a secure, internal ML platform based on open-source technologies. The project was initially meant to support around 20 data science teams. These teams have compiled a vast resource of healthcare research data, and the project's goal was simple: to create a solution that would allow them to benefit from this data and further global medical research through machine learning.

The ML platform’s objective was to speed up their experiments, automate repeatable workflows, allow them to safely host machine learning models in a high-scale production environment and make the entire process observable, monitored and rock-solid.

What they needed was someone competent to make it happen. Never before had they attempted an ML implementation in a production environment. They had no Kubernetes or MLOps professionals on their team. Additionally, they needed a solution compliant with their high internal security standards and HIPAA regulations.

The timeframe for the project was also limited - six months from day one to final delivery.

Problems Went Beyond A Lack of Specialists on Their Team

The project analyzed in this machine learning case study was purely technical. The Client’s plan to create a machine learning platform was driven by research rather than by immediate business needs.

They, however, faced a major problem.

The issue revolved around security. Our Client already had their own Docker images that created various machine learning models and could serve as a baseline for the future platform. However, these images were far from compliant with security standards and best practices for building such solutions. To prevent backdoor attacks and dangerous data leaks, the new platform had to meet these standards and ensure the strongest security at all levels.

On top of this, multiple problems stemmed from the underdeveloped state of their existing solution, the lack of safe and reliable integration with AWS infrastructure components, and the generally low maturity of MLOps practices at the time.

All that combined created pressure to deliver the project quickly. Rapid development was essential for success.

The Solution

Choosing Chaos Gears for Fast Delivery and Flexible Collaboration

The way Chaos Gears teams work aligned well with what the Client expected. They were looking for a highly flexible collaboration model, rapid development and an easy way to take the project over themselves afterwards. And that was exactly what we delivered.

The combination of these factors made us a perfect partner for the project.

Here’s How It Happened

Two Chaos Gears specialists joined the Client’s team to help the project succeed. The entire development team consisted of ten people, including a Product Owner.

The tasks weren’t all of the same nature. Here’s what we had to do:


The process of creating the ML platform consisted of four stages spread over six months.

1st stage: Research and discovery

2nd stage: Early development (setting milestones and implementing the environment)

3rd stage: Delivery of the platform in the initial form

4th stage: Testing with a limited group of users (employees involved in or related to the project)

All work in each stage followed the best practices used for similar projects and AWS environments.

Here’s What We Delivered

Without focusing on the technical details of this machine learning case study, we will outline the basics of our approach.

Machine learning in healthcare is a unique space that hasn’t been fully explored yet, and there are only a few widely adopted open-source ML platforms. That’s why it was essential to base the technological approach on deep research and our knowledge of the MLOps market. We went with Kubeflow as a Kubernetes-native solution that provided all the functionalities our Client needed.
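
To make this concrete, here is a minimal, hypothetical sketch of a Kubeflow pipeline in the kfp v1 style: a two-step training workflow compiled into a definition the platform can run repeatedly. The component names and container images are illustrative placeholders, not the Client’s actual workloads.

```python
# Minimal sketch of a repeatable training workflow as a Kubeflow pipeline
# (kfp v1 style). All names and images below are illustrative placeholders.
import kfp
from kfp import dsl


def preprocess_op():
    # Hypothetical preprocessing step packaged as a container image.
    return dsl.ContainerOp(
        name="preprocess",
        image="example-registry/preprocess:latest",
        command=["python", "preprocess.py"],
    )


def train_op():
    # Hypothetical training step; in practice it would consume the artifacts
    # produced by the preprocessing step.
    return dsl.ContainerOp(
        name="train",
        image="example-registry/train:latest",
        command=["python", "train.py"],
    )


@dsl.pipeline(
    name="example-training-pipeline",
    description="Illustrative two-step training workflow",
)
def training_pipeline():
    preprocess = preprocess_op()
    train = train_op()
    train.after(preprocess)  # enforce step ordering


if __name__ == "__main__":
    # Compile to a workflow definition that Kubeflow Pipelines can execute.
    kfp.compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")
```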

When it came to model serving, we had to make some crucial choices. KFServing integrates natively and nicely with Kubeflow, as both were created by the same community. However, Seldon had more sophisticated functionality to offer, and that is what we picked.
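
To illustrate the serving side, Seldon’s Python wrapper expects a plain model class that exposes a predict() method, which it then packages behind a standardized inference endpoint. The sketch below assumes a scikit-learn model serialized with joblib at a hypothetical path; it shows the general pattern, not the Client’s actual models.

```python
# Minimal sketch of a model class in the shape Seldon's Python wrapper expects.
# The model path and framework (scikit-learn via joblib) are assumptions.
import joblib
import numpy as np


class ExampleModel:
    def __init__(self):
        # Load a pre-trained model artifact baked into (or mounted in) the
        # serving image; the path is a hypothetical placeholder.
        self.model = joblib.load("/mnt/models/model.joblib")

    def predict(self, X, features_names=None):
        # Seldon calls predict() with the request payload; return predictions.
        return self.model.predict(np.asarray(X))
```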

Data science teams performed their experiments and research inside Amazon SageMaker, which at the time was not as fully fledged as it is today. Even so, our Client enjoyed the elastic compute power it provided, along with the hosted Jupyter notebooks capability.
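
As a rough illustration of that elasticity, an experiment can be launched as a managed SageMaker training job through the SageMaker Python SDK, so compute is provisioned only for the duration of the job. The image URI, IAM role and S3 paths below are placeholders, not the Client’s resources.

```python
# Hedged sketch of launching a training job with the SageMaker Python SDK.
# The image URI, role ARN and S3 locations are illustrative placeholders.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.eu-west-1.amazonaws.com/example-train:latest",
    role="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-bucket/model-artifacts/",
)

# Starts a managed training job; compute exists only while the job runs.
estimator.fit({"training": "s3://example-bucket/datasets/training/"})
```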

We then had to integrate all the technologies used - Kubeflow, Seldon and Amazon SageMaker. Thanks to our planning during the initial stage of the project, we were certain that they would all function well together. And indeed they did.

Additionally, we chose a blue/green deployment methodology. Several instances of the MLOps platform ran side by side, allowing us to rapidly introduce changes to only a subset of the Client’s teams. The entire platform ran on Kubernetes with Istio and Grafana Loki, giving us all the crucial metrics and logs we needed in real time.
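
To show the traffic side of the blue/green approach, the sketch below uses the Kubernetes Python client to create an Istio VirtualService that splits requests between a "blue" and a "green" instance of a model service. The namespace, hostnames and weights are assumptions made for illustration only.

```python
# Hedged sketch of blue/green traffic splitting with an Istio VirtualService,
# applied via the Kubernetes Python client. All names and weights below are
# illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()

virtual_service = {
    "apiVersion": "networking.istio.io/v1beta1",
    "kind": "VirtualService",
    "metadata": {"name": "model-serving", "namespace": "mlops"},
    "spec": {
        "hosts": ["model-serving.mlops.svc.cluster.local"],
        "http": [
            {
                "route": [
                    # Send most traffic to the current ("blue") deployment and
                    # a small share to the new ("green") one.
                    {"destination": {"host": "model-serving-blue"}, "weight": 90},
                    {"destination": {"host": "model-serving-green"}, "weight": 10},
                ]
            }
        ],
    },
}

# VirtualService is a custom resource, so it goes through the CustomObjects API.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="networking.istio.io",
    version="v1beta1",
    namespace="mlops",
    plural="virtualservices",
    body=virtual_service,
)
```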

The Outcome

Despite the many challenges of this project and its limited timeframe, it turned out to be a complete success. Chaos Gears specialists ensured the results met the project criteria and the delivery was completed on schedule, milestone by milestone.

We provided the Client with the necessary foundations and competencies without which the platform couldn’t exist. Thanks to Chaos Gears’ MLOps expertise, the Client received a reliable, cloud-based and production-ready platform.

The solution finally enabled the Client’s data science teams to run their machine learning models in production in a structured, repeatable, safe and secure manner and to serve those models to the outside world. The platform initially supported the collaboration of 20 teams, but was designed to scale up easily whenever a new data science team was added.

Currently, the Client uses their own resources to develop the solution further. They no longer need external contractors, benefiting from the experience they gained while working with us.

Tech stack

Our trusted tools

Amazon SageMaker
Kubeflow
Seldon
Kubernetes
Amazon Cognito
AWS CDK
AWS CodePipeline
Amazon CloudFront
Istio Mesh
Grafana Loki