Machine learning algorithms for image processing have evolved at a tremendous pace.
They can now help in the reconstruction of objects in ambiguous images, colouring old videos, detecting the depth in moving videos and much more.
One recurring theme in all these machine vision methods is teaching the model to identify patterns in images. The success of these models has a wide range of applications. Training an algorithm to differentiate between apples and oranges can eventually be used in something as grave as a cancer diagnosis or to unlock the hidden links of renaissance art.
The job of any object detector is to distinguish objects of certain target classes from backgrounds in the image with precise localisation and correct categorical label prediction to each object instance. Bounding boxes or pixel masks are predicted to localise these target object instances.
Due to the tremendous successes of deep learning-based image classification, object detection techniques using deep learning have been actively studied in recent years.
In the early stages, before the deep learning era, the pipeline of object detection was divided into three steps:
1. Proposal generation
2. Feature vector extraction
3. Region classification
Commonly, support vector machines (SVM) were used here due to their good performance on small scale training data. In addition, some classification techniques such as bagging, cascade learning and AdaBoost were used in region classification step, leading to further improvements in detection accuracy.
However, from 2008 to 2012, the progress on Pascal VOC based on these traditional methods had become incremental, with minor gains from building complicated ensemble systems. This showed the limitations of traditional detectors.
After the success of applying deep convolutional neural networks for image classification, object detection also achieved remarkable progress based on deep learning techniques.
Compared to traditional hand-crafted feature descriptors, deep neural networks generate hierarchical features and capture different scale information in different layers, and finally produce robust and discriminative features for classification. utilise the power of transfer learning.
The picture above is an Illustration of Major milestone in object detection research based on deep convolutional neural networks since 2012.
Currently, deep learning-based object detection frameworks can be primarily divided into two families:
- two-stage detectors, such as Region-based CNN (R-CNN) and its variants and
- one-stage detectors, such as YOLO and its variants.
Evolution Of Two-Stage Detectors
Two-stage detectors commonly achieve better detection performance and report state-of-the-art results on public benchmarks, while one-stage detectors are significantly more time-efficient and have greater applicability to real-time object.
R-CNN is a pioneering two-stage object detector proposed in 2014, which significantly improved the detection performance
R-CNN faces some critical shortcomings. For instance, the features of each proposal were extracted by deep convolutional networks separately (i.e., the computation was not shared), which led to heavily duplicated computations. Thus, R-CNN was extremely time-consuming for training and testing.[…]
Photo by Antonio Molinari for Unsplash