Microsoft open-sources SynapseML for developing AI pipelines
Microsoft today announced the release of SynapseML (formerly MMLSpark), an open source library designed to simplify the creation of machine learning pipelines. With SynapseML, developers can build “scalable and intelligent” systems for solving problems across domains including text analytics, translation and speech processing, according to Microsoft.
“Over the past five years, we have worked to improve and stabilize the SynapseML library for production workloads. Developers using Azure Synapse Analytics will be happy to hear that SynapseML is now generally available on this service with corporate support [on Azure Synapse Analytics],” Microsoft software engineer Mark Hamilton wrote in a blog post.
Building machine learning pipelines can be difficult, even for the most seasoned developer. For starters, composing tools from different ecosystems requires considerable code, and many frameworks are not designed with server clusters in mind.
Despite this, data science teams are under increasing pressure to put more machine learning models to use. Even as adoption of AI and analytics continues to grow, an estimated 87% of data science projects never make it into production. According to a recent Algorithmia survey, 22% of companies take between one and three months to deploy a model so that it can generate business value, while 18% take more than three months.
SynapseML aims to meet the challenge by unifying existing machine learning frameworks and Microsoft-developed algorithms in a single API, usable from Python, R, Scala and Java. SynapseML lets developers combine frameworks for use cases that require more than one, such as building search engines, while training and evaluating models on resizable clusters of computers.
As Microsoft explains on the project's website, SynapseML extends Apache Spark, the open source engine for large-scale data processing, in several new directions. “[The tools in SynapseML] enable users to create powerful and highly scalable models that span multiple [machine learning] ecosystems. SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can integrate any web service into their SparkML models and use their Spark clusters for massive networking workflows.”
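The idea of treating a web service as one more pipeline stage can be sketched in plain Python. This is only an illustration of the concept, not SynapseML's or Spark's API: the `Stage` class, `run_pipeline` and `fake_sentiment_service` are hypothetical names, and the service is a local stub so the sketch runs offline, where a real stage would issue an HTTP request.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Stage:
    """One pipeline step: reads one column and writes a new one (hypothetical)."""
    input_col: str
    output_col: str
    fn: Callable[[Any], Any]

    def transform(self, rows: List[Row]) -> List[Row]:
        # Build new rows so the caller's input is never mutated.
        return [{**row, self.output_col: self.fn(row[self.input_col])} for row in rows]


def fake_sentiment_service(text: str) -> str:
    """Local stand-in for a remote web service (assumption, no network call)."""
    return "positive" if "good" in text.lower() else "neutral"


def run_pipeline(stages: List[Stage], rows: List[Row]) -> List[Row]:
    # Apply each stage in order; the output of one feeds the next.
    for stage in stages:
        rows = stage.transform(rows)
    return rows


pipeline = [
    Stage("text", "clean", lambda s: s.strip()),           # ordinary local step
    Stage("clean", "sentiment", fake_sentiment_service),   # "web service" step
]
result = run_pipeline(pipeline, [{"text": "  Good docs "}, {"text": "It runs"}])
```

Because both stages share the same `transform` interface, the service-backed step composes with the local one without special handling, which is the property the quote describes.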
SynapseML also allows developers to use models from different machine learning ecosystems through the Open Neural Network Exchange (ONNX), a framework and runtime co-developed by Microsoft and Facebook. With the integration, developers can run a variety of classical and machine learning models with just a few lines of code.
Beyond that, SynapseML introduces new algorithms for personalized recommendation and contextual bandit reinforcement learning using the Vowpal Wabbit framework, an open source machine learning library originally developed at Yahoo! Research. In addition, the API provides capabilities for “responsible unsupervised AI”, including tools for understanding data set imbalance (for example, whether “sensitive” data set features such as race or gender are over-represented or under-represented) without the need for labeled training data, as well as explainability scorecards that show why models make certain predictions and how to improve training data sets.
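As a rough illustration of what a label-free imbalance check can look like, the plain-Python sketch below compares each group's observed share in a feature column with an even split across the groups that appear. It is not SynapseML's measure; the function name `representation_gaps`, the uniform reference and the sample column are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def representation_gaps(values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Observed share of each group minus its share under an even split.

    Positive means over-represented, negative means under-represented.
    Only the feature column is read; no labels are involved.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Reference: every observed group gets the same share (an assumption).
    uniform_share = 1.0 / len(counts)
    return {group: n / total - uniform_share for group, n in counts.items()}


# Hypothetical sensitive column with three groups of unequal size.
column = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
gaps = representation_gaps(column)
```

Here group `a` comes out over-represented and group `c` under-represented. An even split is only one possible reference; a real audit might compare against known population shares instead.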
When there are no labeled data sets, unsupervised learning, also known as self-supervised learning, can help fill gaps in domain knowledge. For example, Facebook's recently announced SEER, an unsupervised model, was trained on a billion images to achieve state-of-the-art results on a range of computer vision benchmarks. Unfortunately, unsupervised learning does not eliminate the potential for bias or flaws in a system's predictions. Some experts theorize that removing these biases might require specialized training of unsupervised models with additional, smaller data sets curated to “unlearn” the biases.
“Our goal is to free developers from the hassle of worrying about the details of the distributed implementation and allow them to deploy into a variety of databases, clusters and languages without having to modify their code,” Hamilton continued.