We now live in a world that’s becoming more data-driven every day. Organizations across a wide range of industries are using artificial intelligence (AI) and machine learning (ML) technologies to tap into complex data sets, unearth valuable insights and drive innovation.
From healthcare and government to the financial sector and beyond, advanced data science models and big data projects are unlocking insights that can deliver everything from novel approaches to preventing and treating disease to highly effective financial fraud detection and more.
But these projects aren’t without their challenges. Organizations looking to embark on data collaboration initiatives must overcome obstacles such as data ownership issues, compliance requirements for a variety of regulations and more. In today’s data-filled world, ensuring privacy and security is paramount, and the lengths to which organizations must go to achieve this can make collaborative data science difficult. The potential consequences of sustaining any kind of privacy or security breach (noncompliance, fines, reputational damage, etc.) can cause organizations to shy away from sharing data sets that could spark the next life-saving medical treatment or momentous public service program.
Solving Big Data Collaboration Problems
Luckily, organizations across many industries are recognizing just how much upside we’re leaving on the table if valuable data sets remain siloed. As such, they’re advocating for new approaches to running algorithms on data from multiple parties, approaches that keep the source data from being compromised by or shared with outside entities.
An early approach that attempted to solve these issues came in the form of a centralized data aggregation model. This involves migrating each collaborator’s data sets to a single aggregation engine in a private, inaccessible execution environment within a processor. The intent is to ensure that each party’s data sets remain private and that only the results of the query can be shared. Unfortunately, this approach comes with numerous challenges, including unmanageable data set sizes and incongruent file formats among participating parties, either of which can make aggregation untenable.
Needless to say, it quickly became evident that this early attempt at solving the world’s major data collaboration problems had to be improved upon. That’s where “Federated Machine Learning” comes in.
What is Federated Machine Learning?
A distributed method first introduced by Google about five years ago, Federated Machine Learning offers tremendous advantages when it comes to privately and securely enabling model training against large pools of data from multiple entities. It takes the opposite approach to the centralized technique: Federated Machine Learning brings aggregation to the data sources, rather than requiring all participating organizations to move their data sets to a centralized compute environment for aggregation. Processing takes place onsite at each individual organization’s location, and only the query results are delivered back to the core compute environment, where the collective model is then updated.[…]
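To make the idea concrete, here is a minimal sketch of one common way to combine the results coming back from each site: federated averaging, in which each site's returned model is weighted by its number of records. It is an illustration under stated assumptions, not a description of any particular product. The function names (local_update, federated_average, train), the tiny linear model and the synthetic data are all hypothetical and chosen only for this example.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Runs at one site: a few gradient steps on that site's private data.
    Only the updated weights and the record count leave the site."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n

def federated_average(updates):
    """Runs in the core environment: average the site models,
    weighting each one by how many records it was trained on."""
    total = sum(n for _, n in updates)
    w = sum(site_weights[0] * n for site_weights, n in updates) / total
    b = sum(site_weights[1] * n for site_weights, n in updates) / total
    return (w, b)

def train(sites, rounds=200):
    """Repeat: send the global model out, collect local results, aggregate."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates)
    return global_weights

def make_site_data(rng, n, low, high):
    """Synthetic stand-in for one organization's records: y = 3x + 1 plus noise."""
    xs = [rng.uniform(low, high) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
# Three hypothetical organizations with different amounts and ranges of data.
sites = [
    make_site_data(rng, 40, 0.0, 1.0),
    make_site_data(rng, 25, 0.5, 1.5),
    make_site_data(rng, 60, -1.0, 0.0),
]
w, b = train(sites)
print(f"learned slope={w:.2f}, intercept={b:.2f} (data generated with 3 and 1)")
```

Note that the raw (x, y) records never appear in federated_average; the core environment only ever sees model weights and record counts. Real deployments typically layer further protections on top of this, which the sketch does not attempt to show.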