No asset is more prized in today’s digital economy than data.
Copyright by www.forbes.com
It has become widespread to the point of cliche to refer to data as “the new oil.” As one recent Economist headline put it, data is “the world’s most valuable resource.”
Data is so highly valued today because of the essential role it plays in powering machine learning and artificial intelligence solutions. Training an AI system to function effectively—from Netflix’s recommendation engine to Google’s self-driving cars—requires massive troves of data.
The result has been an obsession with bigger and bigger data. He with the most data can build the best AI, according to the prevailing wisdom. Incumbents from IBM to General Electric are racing to re-brand themselves as “data companies.” SoftBank’s Vision Fund—the largest and most influential technology investor in the world—makes no secret of the fact that its focus when looking for startups to back is data assets. “Those who rule data will rule the world,” in the words of SoftBank leader Masayoshi Son.
As the business and technology worlds increasingly orient themselves around data as the ultimate kingmaker, too little attention is being paid to an important reality: the future of AI is likely to be far less data-intensive.
At the frontiers of artificial intelligence, various efforts are underway to develop improved forms of AI that do not require massive labeled datasets. These technologies will reshape our understanding of AI and disrupt the business landscape in profound ways. Industry leaders would do well to pay attention.
Today, in order to train deep learning models, practitioners must collect thousands, millions or even billions of data points. They must then attach labels to each data point, an expensive and generally manual process. What if researchers didn’t need to laboriously collect and label data from the real world, but instead could create the exact dataset they needed from scratch?
Leading technology companies—from established competitors like Nvidia to startups like Applied Intuition—are developing methods to fabricate high-fidelity data, completely digitally, at next to no cost. These artificially created datasets can be tailored to researchers’ precise needs and can include billions of alternative scenarios.
“It’s very expensive to go out and vary the lighting in the real world, and you can’t vary the lighting in an outdoor scene,” said Mike Skolones, director of simulation technology at Nvidia. But you can with synthetic data.
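To make the idea concrete, here is a deliberately tiny sketch of synthetic data generation. It is not how Nvidia or Applied Intuition build their simulators; every name in it (`render_scene`, `make_dataset`, the `brightness` parameter) is a hypothetical stand-in chosen for illustration. A toy "renderer" draws a square on a dark background, the lighting is varied at will, and because the program placed the square itself, the label comes for free rather than from manual annotation.

```python
import random


def render_scene(size, box, brightness):
    """Render a grayscale image (rows of 0-255 ints) containing one square.

    `box` is (row, col, side). `brightness` scales every pixel and stands in
    for a lighting change. This is a toy renderer, not a real simulator.
    """
    row0, col0, side = box
    image = []
    for r in range(size):
        line = []
        for c in range(size):
            inside = row0 <= r < row0 + side and col0 <= c < col0 + side
            base = 200 if inside else 40  # object vs. background intensity
            line.append(min(255, round(base * brightness)))
        image.append(line)
    return image


def make_dataset(n, size=16, seed=0):
    """Generate n labeled samples with random object placement and lighting."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    samples = []
    for _ in range(n):
        side = rng.randint(3, 6)
        row0 = rng.randint(0, size - side)
        col0 = rng.randint(0, size - side)
        brightness = rng.uniform(0.2, 1.2)  # from dim to slightly overexposed
        box = (row0, col0, side)
        samples.append({
            "image": render_scene(size, box, brightness),
            "label": box,  # known exactly, because we placed the object
            "brightness": brightness,
        })
    return samples


if __name__ == "__main__":
    for sample in make_dataset(3):
        print(sample["label"], round(sample["brightness"], 2))
```

Calling `make_dataset(1_000_000)` costs only compute time, and changing the range passed to `rng.uniform` changes the lighting conditions covered. Real simulators replace the toy renderer with photorealistic graphics and physics, but the economics are the same: scenarios and labels are produced by the program instead of being collected and annotated by hand.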
As synthetic data approaches real-world data in accuracy, it will democratize AI, undercutting the competitive advantage of proprietary data assets. If a company can quickly generate billions of miles of realistic driving data via simulation, how valuable are the few million miles of real-world driving data that Waymo has spent a decade collecting? In a world in which data can be inexpensively generated on demand, the competitive dynamics across industries will be upended.
As AI gets smarter in the years to come, it is likely to require less data, not more. […]