Medical and life sciences organisations are embarking on initiatives to unlock complex data sets with the goal of preventing disease, speeding recovery and improving patient outcomes.
Copyright by www.itproportal.com
Financial services institutions are using these systems to bolster the efficacy of fraud detection, and federal governments are applying them to public data sharing in support of R&D and improved public services – and the list goes on.
The sensitive nature of the data used in these projects – including data ownership issues and regulatory requirements such as the General Data Protection Regulation (GDPR), HIPAA, financial data privacy rules, etc. – requires organisations to go to great lengths to keep information private and secure. As a result, data sets that could be tremendously valuable in concert with other initiatives (or organisations) are often locked away and guarded, creating data silos. But as a variety of industries begin to spread their wings with these technologies, we’re seeing a groundswell of demand for innovative, trusted and inclusive solutions to the data collaboration problem. Organisations are asking for a way to execute algorithms on data sets from multiple parties, while ensuring that the source data is neither shared nor compromised, and that only the results are shared with approved parties.
A few years back, attempts were made to address this challenge by moving data to the compute mechanism. This approach involved moving data sets from various parties’ edge nodes to a centralised aggregation engine. The data was then run through the aggregation engine at a central location in a Trusted Execution Environment (TEE) – an isolated, private execution environment within a processor, such as Intel SGX – so that only the output or results of the query could be shared, while the data themselves were kept private.
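The centralised model described above can be sketched roughly as follows. This is a minimal illustration, not a real TEE implementation: the `TrustedAggregator` class, the made-up hospital readings and the `query_mean` method are all hypothetical stand-ins for the enclave machinery (such as an Intel SGX enclave) that a real deployment would use.

```python
# Hypothetical sketch of the "centralised data aggregation" model:
# every party ships its raw records to one central aggregator, which
# pools them privately and releases only aggregate query results.
from statistics import mean

class TrustedAggregator:
    """Stand-in for the central TEE: holds pooled data privately and
    exposes only the output of approved queries."""
    def __init__(self):
        self._pool = []               # private: never returned to callers

    def ingest(self, records):
        # Each party must move its full data set here -- the main
        # burden of this approach.
        self._pool.extend(records)

    def query_mean(self):
        # Only the aggregate output leaves the environment.
        return mean(self._pool)

agg = TrustedAggregator()
agg.ingest([4.2, 5.1, 6.0])           # party A's readings (made-up data)
agg.ingest([5.5, 4.8])                # party B's readings (made-up data)
print(agg.query_mean())               # only the pooled result is shared
```

The point of the sketch is the trust boundary: raw records cross it, which is exactly the property the later approaches below try to avoid.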
This “centralised data aggregation model” led to a new set of challenges. Moving data from one site to another can be a significant burden on an organisation, whether because of the sheer size of a data set or because data privacy and storage regulations simply prohibit it. Additionally, there were many data normalisation challenges that came with this approach. For example, data sets from various healthcare institutions often come in different file formats, with fields of information that don’t match up with other parties’ data. Without a common schema across all participating data sets, aggregation could be incredibly arduous or even impossible. Lastly, “moving data to the compute” required a tremendous amount of upfront commitment and cooperation from IT personnel at each organisation involved.
The overall goal of this early approach was to address the privacy and security problems that were so prevalent in big data collaboration projects. While it provided some benefits, it turned out to be a less than optimal method. However, it led to a new approach called “Federated Machine Learning.”
Federated Machine Learning is a distributed approach that enables model training on large bodies of decentralised data, ensuring secure, multi-party collaboration on big data projects without compromising the data of any parties involved. Google first coined the term in a paper published back in 2016, and since then the model has been the subject of active research and investment by Google, Intel and other industry leaders, as well as academia. […]
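The core idea can be sketched with Federated Averaging (FedAvg), the algorithm introduced in that 2016 line of work: each party trains on its own data locally, and only model updates – never raw records – are sent to a coordinator, which averages them into a shared global model. The two-party linear-model setup below is a simplified illustration, not a production implementation.

```python
# Minimal sketch of Federated Averaging: local training on private data,
# central averaging of model weights only.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One party's local step: plain gradient descent on a linear model.
    The raw data (X, y) never leaves this function."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(updates, sizes):
    """Coordinator averages local models, weighted by each party's data size."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

# Two parties hold private samples of the same underlying relation y = 2x.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(50, 1)), rng.normal(size=(80, 1))
y1, y2 = 2 * X1[:, 0], 2 * X2[:, 0]

w = np.zeros(1)                       # shared global model
for _ in range(20):                   # communication rounds
    u1 = local_update(w, X1, y1)      # each party trains locally...
    u2 = local_update(w, X2, y2)
    w = federated_average([u1, u2], [len(y1), len(y2)])  # ...updates are merged

print(float(w[0]))                    # converges close to the true slope of 2
```

Note the contrast with the centralised model: here only the weight vectors cross the trust boundary, so the size, schema and regulatory problems of moving raw data sets largely disappear.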