The results for the Survey on Production MLOps are out: ethical.institute/state-of-ml-2025. As part of the release we have updated the interface to enable real-time toggling between 2024 and 2025 data, and have refreshed it with a cool new code-editor theme. Check it out and share it around! |
|  | Further insights: The results are in for 2025 ETL Orchestration Providers in Production ML, and the data reflects what we saw this year:
- Airflow is still the top choice in 2025, although it is slowly losing market share, down to 32% from 37% in 2024.
- The biggest jump this year is Databricks, which reaches 2nd place with 12% compared to 2% in 2024.
- Custom orchestrators dropped to 3rd place, with 10% of organisations vs 13% in 2024.
- Argo Workflows drops out of 3rd place, down to 2% compared to 9% in 2024.
If you want to dive deeper you can access the full results here: https://ethical.institute/state-of-ml-2025 |
| | |
|
|
|---|
|
Karpathy 2025 Year in Review
Andrej Karpathy has put together his 2025 year in review for LLMs! Some really interesting insights: LLMs are now a production-critical platform where tiny shifts in training, evaluation, and developer tooling can result in major changes in reliability, cost, and velocity. This year's biggest shifts stopped being purely about larger models, and instead focused on a reshaped training and product stack. Some of the key advances were RL from verifiable rewards, Claude Code skyrocketing and pushing the boundaries of developer tools, and vibe coding now transforming the trade of software engineering as we know it. We also saw major innovations across different modalities, such as Gemini Nano-Banana taking image generation to the next level. We can be sure that next year the pace of innovation will only continue to increase! |
|
|
|---|
|
Text-to-SQL Optimization Learnings
This is a really interesting retrospective on text-to-SQL GenAI systems from Vercel, sharing learnings that allowed them to significantly improve performance: apparently they removed ~80% of their specialized tools and prompt scaffolding and rebuilt the system as a minimal agent once they were able to set up Claude Opus 4.5 with a semantic layer. This is an interesting trend that many organisations are also experiencing, namely building initial approaches that then have to be reworked as the industry evolves these methods in a more robust direction. In their use-case they were able to achieve 3.5x faster performance with 37% fewer tokens whilst raising the success rate significantly. This is certainly a trend, so we'll be looking forward to their next blog post showing a similar order-of-magnitude improvement as the ecosystem evolves.
|
|
|
|---|
|
The State of MLOps 2025 Survey
The results are in for 2025 ETL Orchestration Providers in Production ML, and the data reflects what we saw this year:
- Airflow is still the top choice in 2025, although it is slowly losing market share, down to 32% from 37% in 2024.
- The biggest jump this year is Databricks, which reaches 2nd place with 12% compared to 2% in 2024.
- Custom orchestrators dropped to 3rd place, with 10% of organisations vs 13% in 2024.
- Argo Workflows drops out of 3rd place, down to 2% compared to 9% in 2024.
If you want to dive deeper you can access the full results here: https://ethical.institute/state-of-ml-2025 |
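For readers who want to compare the movements on a common scale, here is a minimal sketch that turns the shares quoted above into percentage-point and relative changes. It assumes the Airflow figures read as 37% in 2024 and 32% in 2025; the helper name `share_change` is hypothetical and is not part of the survey tooling.

```python
# Shares (percent of respondents) as quoted in the bullets above: (2024, 2025).
# Assumption: Airflow is read as 37% in 2024 and 32% in 2025.
shares = {
    "Airflow": (37, 32),
    "Databricks": (2, 12),
    "Custom": (13, 10),
    "Argo Workflows": (9, 2),
}


def share_change(prev, curr):
    """Return (percentage-point change, relative change in percent)."""
    points = curr - prev
    relative = 100.0 * points / prev
    return points, relative


changes = {name: share_change(prev, curr) for name, (prev, curr) in shares.items()}

# Print the providers ordered from largest gain to largest loss in points.
for name, (points, relative) in sorted(
    changes.items(), key=lambda item: item[1][0], reverse=True
):
    print(f"{name:15s} {points:+d} pts ({relative:+.0f}% relative)")
```

Percentage points and relative change tell different stories here: Databricks gains 10 points but grows fivefold in relative terms, while Airflow loses 5 points from a much larger base.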
|
|
|---|
|
MIT Distributed Systems
MIT Distributed Systems Free Course: This is one of the best resources to sharpen your MLOps skills for 2026. The MIT 6.824 Distributed Systems course is arguably the best online resource for distributed systems (aside from Designing Data Intensive Applications). Often the difficulty in distributed systems comes from partial failures as opposed to full failures, given that "more machines" introduce new ambiguities and bottlenecks everywhere. This is a great review of scalability, fault tolerance and consistency, with an extensive deep dive into theory and practice across algorithms, concepts and techniques. Definitely a great resource to get you kicked off for 2026 and beyond! |
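To make the "partial failure" point concrete, here is a toy sketch (not taken from the course material) of the classic ambiguity: a request is applied on the server but the reply is lost, so the client cannot tell whether to retry. The class and function names are hypothetical, and the deduplication by request ID is just one common way to make retries safe.

```python
class FlakyCounterServer:
    """Toy in-process 'server': applies increments, but some replies get lost."""

    def __init__(self, drop_replies_for_calls):
        self.value = 0
        self.calls = 0
        self.seen_request_ids = set()
        # Call numbers (1-based) whose reply is dropped after the work is done.
        self.drop_replies_for_calls = set(drop_replies_for_calls)

    def increment(self, request_id):
        self.calls += 1
        # Deduplicate so that a retried request is not applied twice.
        if request_id not in self.seen_request_ids:
            self.seen_request_ids.add(request_id)
            self.value += 1
        if self.calls in self.drop_replies_for_calls:
            # The increment happened, but the client never hears about it.
            raise TimeoutError("reply lost")
        return self.value


def increment_with_retry(server, request_id, max_attempts=3):
    """Retry on timeout, reusing the same request ID on every attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return server.increment(request_id)
        except TimeoutError:
            # The client cannot distinguish "never applied" from
            # "applied, but the reply was lost", so it retries.
            if attempt == max_attempts:
                raise


server = FlakyCounterServer(drop_replies_for_calls={1})
result = increment_with_retry(server, request_id="req-1")
print(result, server.calls)
```

Without the `seen_request_ids` check the retry would increment the counter twice, which is exactly the kind of bug that only shows up under partial failure.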
|
|
|---|
|
Python Data Science Handbook
What better way to kick off 2026 than with a review of foundational data science concepts with the free O'Reilly Python Data Science Handbook: This is a great resource to review some of the data science fundamentals in a practical set of Jupyter notebooks. It dives into the basics of notebook debugging, NumPy semantics and vectorization, pandas patterns for building reliable datasets, Matplotlib/Seaborn for practical EDA, and various ML libraries with useful applications in the industry. The book covers quite a broad range of the ML lifecycle, including validation, hyperparameters, feature engineering, and canonical models like linear methods, trees/forests, SVMs, PCA, clustering, and mixture models - definitely a great resource to check out. |
|
|
|---|
|
Upcoming MLOps Events
The MLOps ecosystem continues to grow at breakneck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is the reason why we have started curating a list of upcoming events in the space, which are outlined below. Conferences for 2026 coming soon! In the meantime, in case you missed our talks: |
|
|---|
| | |
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 10,000 GitHub stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Four featured libraries in the GPU acceleration space are outlined below.
- Kompute - Blazing fast, lightweight and mobile phone-enabled GPU compute framework optimized for advanced data processing use cases.
- CuPy - An implementation of NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it.
- JAX - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.
- CuDF - Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
If you know of any open source and open community events that are not listed do give us a heads up so we can add them! |
|
|---|
| | |
As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI Guidelines, Principles, Ethics Frameworks, etc.; however, there are so many resources that it is hard to navigate them. Because of this we started an Open Source initiative that aims to map the ecosystem to make it simpler to navigate. You can find multiple principles in the repo - some examples include the following:
- MLSecOps Top 10 Vulnerabilities - This is an initiative that aims to further the field of machine learning security by identifying the top 10 most common vulnerabilities in the machine learning lifecycle as well as best practices.
- AI & Machine Learning 8 principles for Responsible ML - The Institute for Ethical AI & Machine Learning has put together 8 principles for responsible machine learning that are to be adopted by individuals and delivery teams designing, building and operating machine learning systems.
- An Evaluation of Guidelines - The Ethics of Ethics; A research paper that analyses multiple Ethics principles.
- ACM's Code of Ethics and Professional Conduct - This is the code of ethics that was put together in 1992 by the Association for Computing Machinery and updated in 2018.
If you know of any guidelines that are not in the "Awesome AI Guidelines" list, please do give us a heads up or feel free to add a pull request!
|
|
|---|
| | |
| | | | The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning. | | | | |
|
|
|---|
|
|