If you’ve ever called on Siri or Alexa for help, or generated a self-portrait in the style of a Renaissance painter, you have interacted with deep learning, a form of artificial intelligence that extracts patterns from mountains of data to make predictions.
Copyright by www.weforum.org
Though deep learning and AI have become household terms, the breakthroughs in statistics that have fueled this revolution are less known. In a recent paper, Andrew Gelman, a statistics professor at Columbia, and Aki Vehtari, a computer science professor at Finland’s Aalto University, published a list of the most important statistical ideas in the last 50 years.
Below, Gelman and Vehtari break down the list for those who may have snoozed through Statistics 101. Each idea can be viewed as a stand-in for an entire subfield, they say, with a few caveats. Science is incremental, so by singling out these works they do not mean to diminish the importance of similar, related work. They have also chosen to focus on methods in statistics and machine learning rather than on equally important breakthroughs in statistical computing, computer science, and engineering, which supplied the tools and computing power that turned data analysis and visualization into everyday practice. Finally, they have focused on methods while recognizing that developments in theory and methods are often motivated by specific applications.
See something important that’s missing? Tweet it at @columbiascience and Gelman and Vehtari will consider adding it to the list.
The 10 articles and books below all were published in the last 50 years and are listed in chronological order.
1. Hirotugu Akaike (1973). Information Theory and an Extension of the Maximum Likelihood Principle. Proceedings of the Second International Symposium on Information Theory.
This is the paper that introduced the term AIC (originally "An Information Criterion," now known as the Akaike Information Criterion), a tool for evaluating a model's fit based on its estimated predictive accuracy. AIC was instantly recognized as useful, and this paper was one of several published in the mid-1970s that placed statistical inference within a predictive framework. We now recognize predictive validation as a fundamental principle in statistics and machine learning. Akaike was an applied statistician who, in the 1960s, tried to measure the roughness of airport runways; that applied work fed into his later theoretical contributions, in the same way that Benoit Mandelbrot's early papers on taxonomy and Pareto distributions led to his later work on the mathematics of fractals.
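To make the criterion concrete: AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. Among models fit to the same data, the one with the lower AIC is preferred, because the 2k term penalizes extra parameters that do not buy enough likelihood. The sketch below is a minimal illustration, not Akaike's own presentation: it assumes Gaussian errors, uses hypothetical toy data, and the helper names (ols_fit, gaussian_max_loglik, aic, compare) are invented for this example. It compares a constant-mean model with a straight-line fit.

```python
import math


def ols_fit(x, y):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def gaussian_max_loglik(residuals):
    """Maximized Gaussian log-likelihood, plugging in the MLE variance RSS/n."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic(max_loglik, k):
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * k - 2 * max_loglik


def compare(x, y):
    """Return AIC scores for two hypothetical candidate models."""
    # Model 1: constant mean; estimated parameters are mean and variance (k = 2)
    my = sum(y) / len(y)
    ll_const = gaussian_max_loglik([yi - my for yi in y])
    # Model 2: straight line; intercept, slope, and variance (k = 3)
    a, b = ols_fit(x, y)
    ll_line = gaussian_max_loglik([yi - (a + b * xi) for xi, yi in zip(x, y)])
    return {"constant": aic(ll_const, 2), "line": aic(ll_line, 3)}


# Hypothetical toy data: a clear upward trend plus small deviations
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 15.2]
scores = compare(x, y)
print(scores)
print("preferred:", min(scores, key=scores.get))
```

The variance is counted as a parameter in both models, which is why k is 2 and 3 rather than 1 and 2. Here the extra penalty of 2 for the slope is easily outweighed by the gain in log-likelihood, so the line is preferred.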
2. John Tukey (1977). Exploratory Data Analysis.
This book has been hugely influential and is a fun read that can be digested in one sitting. Traditionally, data visualization and exploration were considered low-grade aspects of practical statistics; the glamour was in fitting models, proving theorems, and developing the theoretical properties of statistical procedures under various mathematical assumptions or constraints. Tukey turned this notion on its head. He wrote about statistical tools not for confirming what we already knew (or thought we knew), and not for rejecting hypotheses that we never believed (or never should have believed), but for discovering new and unexpected insights from data. His work motivated advances in network analysis, software, and theoretical perspectives that integrate confirmation, criticism, and discovery.
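As a small illustration of the kind of exploratory tool Tukey championed, the sketch below computes his five-number summary (minimum, lower hinge, median, upper hinge, maximum) and flags values lying beyond fences placed 1.5 hinge-spreads outside the hinges, the rule behind the familiar boxplot. It is a minimal sketch using Python's standard statistics module; the function names and the data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Tukey-style summary: (min, lower hinge, median, upper hinge, max)."""
    xs = sorted(data)
    n = len(xs)
    lower_half = xs[: (n + 1) // 2]  # shares the middle value when n is odd
    upper_half = xs[n // 2:]
    return (
        xs[0],
        statistics.median(lower_half),
        statistics.median(xs),
        statistics.median(upper_half),
        xs[-1],
    )


def flag_outliers(data, k=1.5):
    """Return values outside the fences: hinges -/+ k * (upper hinge - lower hinge)."""
    _, lower_hinge, _, upper_hinge, _ = five_number_summary(data)
    spread = upper_hinge - lower_hinge
    low_fence = lower_hinge - k * spread
    high_fence = upper_hinge + k * spread
    return [v for v in data if v < low_fence or v > high_fence]


# Hypothetical measurements with one suspicious value
data = [12, 15, 14, 10, 13, 16, 11, 48, 14]
print(five_number_summary(data))
print(flag_outliers(data))
```

Hinges here are medians of the lower and upper halves of the sorted data, with the middle value shared when the count is odd; other quartile conventions give slightly different fences. The point, in Tukey's spirit, is not a formal test but a quick prompt to look more closely at the value 48.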