The past decade has seen a burst of new algorithms and applications in machine learning, especially deep learning. Behind this burst lies a wide variety of deep learning tools and frameworks.
They are the scaffolding of the machine learning revolution: the widespread adoption of deep learning frameworks like TensorFlow and PyTorch enabled many ML practitioners to more easily assemble models using well-suited domain-specific languages and a rich collection of building blocks.
Looking back at the evolution of deep learning frameworks, we can clearly see a tightly coupled relationship between deep learning frameworks and deep learning algorithms. This virtuous cycle of interdependency propels the rapid development of deep learning frameworks and tools into the future.
Stone Age (early 2000s)
The concept of neural networks has been around for a while. Before the early 2000s, only a handful of tools could be used to describe and develop neural networks, including MATLAB, OpenNN, and Torch. These tools were either not tailored specifically for neural network model development, or had complex user APIs and lacked GPU support. During this time, ML practitioners had to do a lot of heavy lifting when using these primitive deep learning frameworks.
Bronze Age (~2012)
In 2012, Alex Krizhevsky et al. from the University of Toronto proposed a deep neural network architecture, later known as AlexNet, that achieved state-of-the-art accuracy on the ImageNet dataset and outperformed the second-place contestant by a large margin. This outstanding result sparked excitement in deep neural networks, and since then various deep neural network models have kept setting ever higher accuracy records on the ImageNet dataset.
Around this time, early deep learning frameworks such as Caffe, Chainer, and Theano came into being. Using these frameworks, users could conveniently build complex deep neural network models such as CNNs, RNNs, and LSTMs. In addition, these frameworks supported multi-GPU training, which significantly reduced training time and made it possible to train large models that could not fit into a single GPU's memory. Among them, Caffe and Theano used a declarative programming style, while Chainer adopted an imperative one. These two distinct programming styles also set two different development paths for the deep learning frameworks that were yet to come.
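The difference between the two styles can be illustrated with a toy sketch in plain Python (this is not real Caffe, Theano, or Chainer code; the `Var` class and `evaluate` function are invented for illustration). In the declarative style, the user first builds a symbolic graph of the computation and evaluates it later; in the imperative (define-by-run) style, each expression is computed the moment it runs.

```python
# Declarative style (Theano-like): build a symbolic graph first, run it later.
class Var:
    """A node in a toy expression graph: either a named input or an operation."""
    def __init__(self, name=None, op=None, inputs=()):
        self.name, self.op, self.inputs = name, op, inputs

    def __add__(self, other):
        return Var(op="add", inputs=(self, other))

    def __mul__(self, other):
        return Var(op="mul", inputs=(self, other))

def evaluate(node, feed):
    # Values flow through the graph only when inputs are supplied.
    if node.op is None:
        return feed[node.name]
    a, b = (evaluate(i, feed) for i in node.inputs)
    return a + b if node.op == "add" else a * b

x, w, b = Var("x"), Var("w"), Var("b")
y_graph = x * w + b                # builds a graph; nothing is computed yet
print(evaluate(y_graph, {"x": 2.0, "w": 3.0, "b": 1.0}))  # 7.0

# Imperative style (Chainer-like define-by-run): the same expression runs
# eagerly, so ordinary Python control flow and debugging apply directly.
x_val, w_val, b_val = 2.0, 3.0, 1.0
y_eager = x_val * w_val + b_val    # computed immediately
print(y_eager)                     # 7.0
```

The trade-off this sketch hints at is the one the frameworks faced: a declarative graph can be optimized and compiled as a whole before execution, while an imperative program is easier to inspect and debug line by line.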