Commit 65e3368

Update 2024-02-05-evolution-of-mlplatform.md
fix links and add details about scribds ml platform
1 parent 88cce8e commit 65e3368

File tree

1 file changed

+21
-48
lines changed

_posts/2024-02-05-evolution-of-mlplatform.md

Lines changed: 21 additions & 48 deletions
Original file line number | Diff line number | Diff line change
@@ -30,12 +30,11 @@ The idea behind technical debt is to highlight the consequences of prioritizing
3030
Originally a software engineering concept, technical debt is also relevant to machine learning systems. In fact, the landmark Google paper suggests that ML systems have a propensity to accumulate this technical debt easily.
3131

3232
> Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems.
33-
> /todo fix link
34-
> [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems)
33+
> [Sculley et al. (2015) Hidden Technical Debt in Machine Learning Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems)
3534
3635
> As the machine learning (ML) community continues to accumulate years of experience with live systems, a wide-spread and uncomfortable trend has emerged: developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive.
3736
>
38-
> [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems)
37+
> [Sculley et al. (2015) Hidden Technical Debt in Machine Learning Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems)
3938
4039
Technical debt is important to consider, especially when trying to move fast. Moving fast is easy; moving fast without acquiring technical debt is a lot more complicated.
4140

@@ -65,19 +64,19 @@ This shift to DevOps and teams teams owning the entire development lifecycle int
6564

6665
> The total amount of mental effort a team uses to understand, operate and maintain their designated systems or tasks.
6766
>
68-
> [](https://teamtopologies.com/book "https://teamtopologies.com/book")[https://teamtopologies.com/book](https://teamtopologies.com/book)
67+
> [Skelton & Pais (2019) Team Topologies](https://teamtopologies.com/book)
6968
7069
As teams adopting DevOps grapple with the mental effort of understanding, operating, and maintaining systems, cognitive load becomes a barrier to efficiency. The weight of this additional load can hinder productivity, prompting organizations to seek solutions.
7170

7271
Platforms emerged as a strategic solution, delicately abstracting unnecessary details of the development lifecycle. This abstraction allows engineers to focus on critical tasks, mitigating cognitive load and fostering a more streamlined workflow.
7372

7473
> The purpose of a platform team is to enable stream-aligned teams to deliver work with substantial autonomy. The stream-aligned team maintains full ownership of building, running, and fixing their application in production. The platform team provides internal services to reduce the cognitive load that would be required from stream-aligned teams to develop these underlying services.
7574
>
76-
> [](https://teamtopologies.com/book "https://teamtopologies.com/book")[https://teamtopologies.com/book](https://teamtopologies.com/book)
75+
> [Skelton & Pais (2019) Team Topologies](https://teamtopologies.com/book)
7776
78-
> _Infrastructure Platform teams enable organisations to scale delivery by solving common product and non-functional requirements with resilient solutions. This allows other teams to focus on building their own things and releasing value for their users_
77+
> Infrastructure Platform teams enable organisations to scale delivery by solving common product and non-functional requirements with resilient solutions. This allows other teams to focus on building their own things and releasing value for their users
7978
>
80-
> \- [https://martinfowler.com/articles/building-infrastructure-platform.html](https://martinfowler.com/articles/building-infrastructure-platform.html)
79+
> [Rowse & Shepherd (2022) Building Infrastructure Platforms](https://martinfowler.com/articles/building-infrastructure-platform.html)
8180
8281
### ML Ops -- Reducing technical debt of machine learning
8382

@@ -87,66 +86,40 @@ MLOps is a methodology that provides a collection of concepts and workflows desi
8786
The Rise of Machine Learning Platform
8887
-------------------------------------
8988

90-
The paradigm shifts of DevOps, MLOps and Platform Thinking led to the emergence of Machine Learning platforms. ML platforms are the application of MLOps concepts and workflows and provide a curated developer experience for Machine Learning developers throughout the entire ML lifecycle. These platforms address the challenges of cognitive load, technical debt, quality and developer velocity and increase efficiency, collaboration, and sustainability. As the ML team grows, the benefits amplify, creating a multiplier effect that allows organizations to scale whilst maintaining quality.
89+
The paradigm shifts of DevOps, MLOps and Platform Thinking led to the emergence of Machine Learning platforms. ML platforms are the application of MLOps concepts and workflows and provide a curated developer experience for Machine Learning developers throughout the entire ML lifecycle. As the ML team grows, the benefits of a platform amplify, creating a multiplier effect that allows organizations to scale whilst maintaining quality and not getting bogged down with technical debt.
90+
9191

9292
### Scribd's ML Platform -- MLOps in Action
93-
/todo
94-
Some examples of concepts of DevOps applied to ML (aka ML Ops) are:
93+
At Scribd, we have applied concepts from DevOps to our ML operations in the following ways:
9594

9695
1. **Automation:**
97-
98-
1. Automation can be applied to many parts of the machine learning lifecycle. The incorporation of automation not only streamlines processes but also addresses technical debt through the establishment of consistency and a standardized and reproducible approach.
99-
100-
2. Model deployments which can be automated by the implementation of DevOps CI/CD strategies.
101-
102-
3. Automation can also be applied to retraining of machine learning models
96+
97+
* Applying CI/CD strategies to model deployments through Jenkins pipelines, which deploy models from the Model Registry to AWS-based endpoints.
98+
* Automating model training through the use of Airflow DAGs, and allowing these DAGs to trigger the deployment pipelines so that a model is deployed once retraining has occurred.
10399

104100
2. **Continuous Testing:**
105101

106-
* Continuous testing can be applied as part of a model deployment pipeline, removing the need for manual testing (increasing development velocity) and removing technical debt by ensuring tests are performed consistently
107-
108-
* Model validation can be automated using tooling providing consistency between training iterations.
102+
* Applying continuous testing as part of a model deployment pipeline, removing the need for manual testing.
103+
* Increased tooling to support model validation testing.
109104

110105
3. **Monitoring:**
111-
112-
* Monitoring provides key insights and a step towards creating vital feedback loops.
113-
114-
* Monitoring can be applied to real-time inference infrastructure, revealing performance concerns in the same way as in DevOps.
115106

116-
* Monitoring can be applied to Model performance and monitor for model drift in realtime, providing realtime insight and analysis to model performance and when it may need to be retrained.
107+
* Monitoring real-time inference endpoints
108+
* Monitoring training DAGs
117109

118110
4. **Collaboration and Communication:**
119-
120-
* Utilize collaboration tools for effective communication and information sharing among team members.
121-
122-
* Feature Stores provide a platform for discovering, reusing and collaborating on ML features
111+
112+
* Feature Store which provides feature discovery and re-use
113+
* Model Database which provides model collaboration
123114

124-
* Model Databases provide a platform for discovering, reusing and collaborating on ML models
125-
126-
5. **Version Control:**
115+
5. **Version Control:**
127116

128-
* Applying version control to experiments, machine learning models and features provides better change management and auditing of these ML artifacts
117+
* Applying version control to experiments, machine learning models and features
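
The automation and continuous testing items in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Scribd's implementation: `validation_gate`, `retrain_and_deploy`, the metric names and the thresholds are all hypothetical, and the `train`, `evaluate` and `deploy` callables stand in for whatever an Airflow task or Jenkins pipeline would actually run. It also assumes that higher metric values are better.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class GateResult:
    """Outcome of the automated validation step (hypothetical type)."""
    passed: bool
    failures: List[str] = field(default_factory=list)


def validation_gate(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    minimums: Dict[str, float],
    max_regression: float = 0.01,
) -> GateResult:
    """Check a retrained model's metrics before it is allowed to deploy.

    Assumes every metric is "higher is better".
    """
    failures: List[str] = []

    # Absolute floor: the candidate must clear each minimum threshold.
    for name, floor in minimums.items():
        value = candidate.get(name)
        if value is None:
            failures.append(f"{name}: metric missing from candidate")
        elif value < floor:
            failures.append(f"{name}: {value:.3f} is below minimum {floor:.3f}")

    # Relative check: the candidate must not regress against the
    # currently deployed model by more than max_regression.
    for name, previous in baseline.items():
        value = candidate.get(name)
        if value is not None and value < previous - max_regression:
            failures.append(
                f"{name}: {value:.3f} regressed from baseline {previous:.3f}"
            )

    return GateResult(passed=not failures, failures=failures)


def retrain_and_deploy(
    train: Callable[[], Any],
    evaluate: Callable[[Any], Dict[str, float]],
    deploy: Callable[[Any], None],
    baseline: Dict[str, float],
    minimums: Dict[str, float],
) -> GateResult:
    """Retrain, validate, and trigger deployment only if validation passes."""
    model = train()
    result = validation_gate(evaluate(model), baseline, minimums)
    if result.passed:
        deploy(model)  # stands in for triggering the deployment pipeline
    return result


if __name__ == "__main__":
    outcome = retrain_and_deploy(
        train=lambda: "model-v2",
        evaluate=lambda model: {"auc": 0.90},
        deploy=lambda model: print(f"deploying {model}"),
        baseline={"auc": 0.88},
        minimums={"auc": 0.80},
    )
    print(outcome.passed)
```

Running the same gate on every retraining run keeps the check consistent, which is the point of replacing manual testing. In a real pipeline the three callables would likely be separate tasks, with the deployment step depending on the validation step.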
129118

130119

131-
### Benefits to the Organization
132-
133-
The adoption of a Machine Learning Platform unfolds a spectrum of benefits:
134-
135-
**Increasing Flow of Change (aka developer velocity):** A swift pace in model development and deployment, enhancing overall efficiency.
136-
137-
**Fostering Collaboration Amongst Teams:** Breaking down silos and promoting cross-functional collaboration. The platform becomes the silent foundation for collaboration, facilitating a harmonious working environment.
138-
139-
**Enforcing Best Practices:** Standardizing and ensuring adherence to best practices across ML projects.
140-
141-
**Reducing/Limiting Technical Debt:** Strategically mitigating the risk of accumulating technical debt, ensuring long-term sustainability.
142-
143-
**Multiplier Effect:** As the ML team grows, these benefits of the platform amplify—a dividend that multiplies with organizational growth.
144-
145120
References
146121
----------
147122

148-
[https://www.youtube.com/watch?v=Bfhl8kcSaEI&embeds\_referring\_euri=https%3A%2F%2Fplatformengineering.org%2F&feature=emb\_imp\_woyt](https://www.youtube.com/watch?v=Bfhl8kcSaEI&embeds_referring_euri=https%3A%2F%2Fplatformengineering.org%2F&feature=emb_imp_woyt)
149-
150123
[https://www.atlassian.com/devops/frameworks/team-topologies](https://www.atlassian.com/devops/frameworks/team-topologies)
151124

152125
[https://platformengineering.org/blog/what-is-platform-engineering](https://platformengineering.org/blog/what-is-platform-engineering)
