A trend we’ve been tracking for several years now is how the data science profession has moved away from fully independent, do-it-all unicorns toward more specialized roles.
Copyright by www.analyticsinsight.net
This is not to say that individuals with deep knowledge in several domains have disappeared. Rather, the need for data science has grown, and teams have grown in headcount. Larger teams and a more active job market leave more room for specialization. It’s not just that there are more cooks in the kitchen; machine learning solutions have also become much more ambitious in scope.
It’s becoming more important to think about the competencies of a team rather than expecting every individual to be an expert at everything related to machine learning. This is very similar to software engineering roles diverging into backend, front-end, and DevOps engineers.
The Three Main Roles in a Machine Learning Team
Machine learning systems tend to have the following three different types of contributors:
- Data Engineer
- Data Scientist
- ML Engineer
Each of them focuses on a different part of the machine learning system. Naturally, the roles overlap, and we can identify a few critical parts of the system where they tend to collaborate the most.
Data is the foundation of machine learning, but data became a hot topic before machine learning had its relatively recent resurgence. Data engineers have been tasked with building data infrastructure for various other applications, such as business intelligence, for years, and it’s rather evident that their competencies would be needed for the adoption of machine learning.
So what does it mean to build data infrastructure? In simple terms, data engineers create systems that ingest, store, transform, and distribute data. The specifics depend entirely on the use case and the type of data involved, for example, whether a data warehouse or a data lake is the right solution.
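To make the four stages concrete, here is a minimal sketch of an ingest, store, transform, and distribute flow. It is only an illustration under stated assumptions: the sample CSV, the table names, the function names, and the use of an in-memory SQLite database as a stand-in for a warehouse are all hypothetical and not taken from any particular data platform.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; a real pipeline would read from an API, queue, or file drop.
RAW_CSV = """user_id,country,purchase_eur
1,FI,19.90
2,DE,5.00
1,FI,30.10
"""


def ingest(raw_text):
    """Parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def store(conn, rows):
    """Persist raw rows in a table (SQLite stands in for a warehouse or lake)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(user_id INTEGER, country TEXT, purchase_eur REAL)"
    )
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        [(int(r["user_id"]), r["country"], float(r["purchase_eur"])) for r in rows],
    )
    conn.commit()


def transform(conn):
    """Aggregate raw purchases into one summary row per user."""
    conn.execute("DROP TABLE IF EXISTS user_spend")
    conn.execute(
        "CREATE TABLE user_spend AS "
        "SELECT user_id, COUNT(*) AS n_purchases, SUM(purchase_eur) AS total_eur "
        "FROM purchases GROUP BY user_id"
    )
    conn.commit()


def distribute(conn):
    """Hand the transformed table to consumers as plain dictionaries."""
    cursor = conn.execute(
        "SELECT user_id, n_purchases, total_eur FROM user_spend ORDER BY user_id"
    )
    columns = [c[0] for c in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


def run_pipeline(raw_text):
    """Run all four stages against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        store(conn, ingest(raw_text))
        transform(conn)
        return distribute(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    for record in run_pipeline(RAW_CSV):
        print(record)
```

In practice each stage would likely be a separate system, and the choice between a warehouse and a lake would change what the store step looks like. The point here is only the separation of stages.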
Data engineers interface with data scientists around issues of data, most commonly its availability. A data scientist needs access to data to experiment and train a model, and the data engineer is there to facilitate that.
Feature stores: the intersection of data engineering and data science
More recently, feature stores have emerged as a solution that sits between data engineering and data science. Feature stores consolidate data for machine learning into a single place and allow data scientists to define data transformations that distill information into highly valuable signals for the ML model (i.e., features). From the feature store, features can be delivered to the training pipeline and production without interruptions.
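The idea can be illustrated with a toy, in-memory feature store. This is a sketch under assumptions, not the API of any real feature store product: the class name, its methods, and the example features (conversion_rate, avg_basket_eur) are all made up for illustration. What it tries to show is that a transformation is defined once and then used for both the training set and a single production lookup.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class FeatureStore:
    """Toy in-memory feature store (hypothetical, not a real library's API)."""

    raw: Dict[str, Row] = field(default_factory=dict)  # entity id -> raw record
    features: Dict[str, Callable[[Row], float]] = field(default_factory=dict)

    def ingest(self, entity_id: str, record: Row) -> None:
        """Store the latest raw record for an entity (e.g., a user)."""
        self.raw[entity_id] = record

    def register(self, name: str, fn: Callable[[Row], float]) -> None:
        """Register a named transformation from a raw record to a feature value."""
        self.features[name] = fn

    def get_vector(self, entity_id: str, names: List[str]) -> List[float]:
        """Serve features for one entity, as a production lookup would."""
        record = self.raw[entity_id]
        return [self.features[n](record) for n in names]

    def training_set(self, names: List[str]) -> Dict[str, List[float]]:
        """Build features for every entity using the same definitions."""
        return {eid: self.get_vector(eid, names) for eid in self.raw}


if __name__ == "__main__":
    fs = FeatureStore()
    fs.ingest("u1", {"sessions": 10, "purchases": 2, "total_eur": 50.0})
    fs.ingest("u2", {"sessions": 4, "purchases": 0, "total_eur": 0.0})

    # Feature definitions written once by the data scientist.
    fs.register(
        "conversion_rate",
        lambda r: r["purchases"] / r["sessions"] if r["sessions"] else 0.0,
    )
    fs.register(
        "avg_basket_eur",
        lambda r: r["total_eur"] / r["purchases"] if r["purchases"] else 0.0,
    )

    names = ["conversion_rate", "avg_basket_eur"]
    print(fs.training_set(names))      # batch view for the training pipeline
    print(fs.get_vector("u1", names))  # single lookup, as in production
```

Because both paths call the same registered functions, the training data and the production inputs cannot drift apart through duplicated transformation code, which is the main design motivation in this sketch.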
Data scientists are tasked with finding data-driven solutions to business problems. For example, they might be looking at user data to find meaningful user segments and building models that can classify those users into segments to differentiate the end-user experience and drive more engagement.
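As a rough illustration of the classification half of that example, the sketch below assigns users to segments with a nearest-centroid rule. It assumes the segments have already been identified and labeled; the segment names, the two features (weekly sessions and monthly spend), and the sample values are invented for the example. A real project would likely use a library model and scale the features first.

```python
from math import dist
from statistics import mean

# Hypothetical labeled users: [weekly_sessions, monthly_spend_eur] per user.
LABELED_USERS = {
    "casual": [[1, 0.0], [2, 5.0], [3, 4.0]],
    "power": [[20, 80.0], [25, 120.0], [30, 100.0]],
}


def fit_centroids(examples):
    """Compute the mean feature vector (centroid) of each segment."""
    return {
        segment: [mean(column) for column in zip(*vectors)]
        for segment, vectors in examples.items()
    }


def classify(centroids, vector):
    """Return the segment whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda segment: dist(centroids[segment], vector))


if __name__ == "__main__":
    centroids = fit_centroids(LABELED_USERS)
    for user in ([4, 10.0], [22, 90.0]):
        print(user, "->", classify(centroids, user))
```

The predicted segment could then drive which experience a user sees. Note that the features here are unscaled, so the spend column dominates the distance, which is one reason this should be read as a sketch rather than a recommended model.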
While the primary purpose of a data scientist is to explore data and build models, cleaning and wrangling data tend to be the most time-consuming parts of their workflow. This is why the feature store is emerging as a significant part of end-to-end ML infrastructure. […]