CNN: Object Detection

HOG Detector: Histogram of Oriented Gradients was proposed in 2005 by N. Dalal and B. Triggs. It inspired many other object detectors.

DPM Detector: The Deformable Part-based Model was proposed in 2008 by P. Felzenszwalb et al. It won the VOC-07, -08, and -09 detection challenges. It takes a divide-and-conquer approach, detecting an object through its parts; DPM-v5 reaches a mean Average Precision (mAP) of 33.7% on VOC07.
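
mAP is the mean, over all object classes, of the per-class Average Precision (AP). As a rough illustration, the sketch below computes the 11-point interpolated AP used for VOC07 from a ranked list of detections. The function name and the toy inputs are hypothetical, and it assumes each detection has already been marked as a true or false positive (typically by an overlap threshold against the ground truth).

```python
def average_precision_11pt(is_true_positive, num_ground_truth):
    """11-point interpolated AP for one class.

    is_true_positive: flags for detections sorted by descending confidence.
    num_ground_truth: number of ground-truth objects of this class.
    """
    precisions, recalls = [], []
    tp = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)

    total = 0.0
    for i in range(11):
        threshold = i / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= threshold.
        candidates = [p for p, r in zip(precisions, recalls) if r >= threshold]
        total += max(candidates, default=0.0)
    return total / 11


# mAP is then the plain mean of the per-class APs (toy data, two classes).
per_class_ap = [
    average_precision_11pt([True, False, True], num_ground_truth=2),
    average_precision_11pt([True, True], num_ground_truth=2),
]
mean_ap = sum(per_class_ap) / len(per_class_ap)
```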

Two-Stage Detectors:

RCNN Detector: was proposed in 2014 by R. Girshick et al. It first extracts a set of object candidate boxes (proposals), rescales each proposal to the same fixed size, runs a CNN model to extract features, and finally runs linear SVM classifiers to predict whether an object is present in each region and, if so, its class. mAP reaches 58.5% on VOC07.
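
The RCNN stages can be sketched in a few lines. This is only a toy under stated assumptions: the image is a 2D list of intensities, the proposals are given rather than computed, toy_features is a hypothetical stand-in for the CNN, and the per-class weights and biases are made-up values rather than trained SVMs. Nearest-neighbor warping is used for simplicity.

```python
def warp_to_fixed_size(image, box, size):
    """Crop box = (x0, y0, x1, y1) from a 2D list image and resize the crop
    to size x size with nearest-neighbor sampling (aspect ratio is not kept)."""
    x0, y0, x1, y1 = box
    crop_w, crop_h = x1 - x0, y1 - y0
    return [
        [image[y0 + (r * crop_h) // size][x0 + (c * crop_w) // size]
         for c in range(size)]
        for r in range(size)
    ]


def toy_features(patch):
    """Hypothetical stand-in for the CNN: mean intensity of each patch row."""
    return [sum(row) / len(row) for row in patch]


def linear_svm_scores(features, class_weights, class_biases):
    """One linear score w.f + b per class; a positive score means 'this class'."""
    return {
        name: sum(w * f for w, f in zip(weights, features)) + class_biases[name]
        for name, weights in class_weights.items()
    }


def detect(image, proposals, class_weights, class_biases, size=4):
    detections = []
    for box in proposals:
        patch = warp_to_fixed_size(image, box, size)
        scores = linear_svm_scores(toy_features(patch), class_weights, class_biases)
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # otherwise the region is treated as background
            detections.append((box, best, scores[best]))
    return detections


# Toy usage: an 8x8 image whose top-left quadrant is bright.
image = [[1 if (x < 4 and y < 4) else 0 for x in range(8)] for y in range(8)]
detections = detect(
    image,
    proposals=[(0, 0, 4, 4), (4, 4, 8, 8)],
    class_weights={"bright": [1, 1, 1, 1]},  # made-up weights
    class_biases={"bright": -2},
)
```

Each proposal is pushed through the feature extractor separately, which is the main reason RCNN is slow.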

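SPPNet Detector: was proposed in 2014 by K. He et al. It uses a Spatial Pyramid Pooling (SPP) layer, which produces a fixed-length representation regardless of the size of the image or region, so the feature map needs to be computed only once per image. This greatly improves the detection speed. mAP reaches 59.2% on VOC07.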
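
A minimal sketch of the pooling idea, assuming a single-channel feature map stored as a list of rows (a real SPP layer pools every channel). The function name, the pyramid levels (1, 2, 4), and the floor/ceil bin edges are illustrative choices, not necessarily the exact settings of SPPNet. The point is that the output length depends only on the levels, not on the input size.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map over an n x n grid for each n in levels
    and concatenate the results into one fixed-length vector."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceil for the end, so bins are never empty.
            top = (i * height) // n
            bottom = -((-(i + 1) * height) // n)
            for j in range(n):
                left = (j * width) // n
                right = -((-(j + 1) * width) // n)
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two feature maps of different sizes give vectors of the same length.
small = [[r * 7 + c for c in range(7)] for r in range(5)]    # 5 x 7
large = [[r * 9 + c for c in range(9)] for r in range(13)]   # 13 x 9
vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
```

With levels (1, 2, 4) each vector has 1 + 4 + 16 = 21 entries per channel.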

Fast RCNN Detector: was proposed in 2015 by R. Girshick. It trains a detector and a bounding box regressor simultaneously within a single network. mAP reaches 70.0% on VOC07, and detection is more than 200 times faster than RCNN.
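
Training the classifier and the box regressor together amounts to minimizing one combined loss. The sketch below assumes boxes in (center x, center y, width, height) form, the usual offset parameterization for the regression targets, and a smooth L1 regression term; the function names and the weight lam are illustrative.

```python
import math


def encode_box(proposal, ground_truth):
    """Regression targets (tx, ty, tw, th) for boxes given as (cx, cy, w, h)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def decode_box(proposal, deltas):
    """Inverse of encode_box: apply predicted offsets to a proposal."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))


def smooth_l1(x):
    """Quadratic near zero, linear for large errors (less sensitive to outliers)."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def multitask_loss(class_probs, true_class, predicted_deltas, target_deltas, lam=1.0):
    """Classification log loss plus box-regression loss; class 0 is background."""
    loss_cls = -math.log(class_probs[true_class])
    loss_loc = sum(smooth_l1(p - t) for p, t in zip(predicted_deltas, target_deltas))
    # Background regions have no ground-truth box, so they skip the box term.
    return loss_cls + (lam * loss_loc if true_class > 0 else 0.0)


proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 60.0, 30.0, 20.0)
targets = encode_box(proposal, ground_truth)
recovered = decode_box(proposal, targets)
```

Because both terms share one network, a single backward pass updates the features for classification and localization at once.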

Faster RCNN Detector: was proposed in 2015 by S. Ren et al. It was the first near-real-time deep learning detector.

Feature Pyramid Networks: were proposed in 2017 by T.-Y. Lin et al.

One-Stage Detectors:

YOLO Detector: You Only Look Once was proposed in 2015 by J. Redmon et al. Its fast version runs at 155 fps with an mAP of 52.7% on VOC07. It applies a single neural network to the full image: the network divides the image into regions and predicts bounding boxes and probabilities for each region simultaneously.
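
To make the region idea concrete, the sketch below assumes a 7 x 7 grid, 2 boxes per cell, and 20 classes (the VOC configuration of the original YOLO), and that the cell containing the center of an object is the one responsible for predicting it. The function names and the sample box are illustrative.

```python
def responsible_cell(box, image_width, image_height, grid_size=7):
    """Return (row, col) of the grid cell containing the center of box,
    where box = (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    # Clamp so a center exactly on the right or bottom edge stays in the grid.
    col = min(int(center_x / image_width * grid_size), grid_size - 1)
    row = min(int(center_y / image_height * grid_size), grid_size - 1)
    return row, col


def output_shape(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Shape of the single prediction tensor produced for one image."""
    # Each box carries x, y, w, h and a confidence; class probabilities are per cell.
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)


cell = responsible_cell((100, 100, 200, 300), image_width=448, image_height=448)
shape = output_shape()
```

All cells are predicted in one forward pass, which is where the speed of the one-stage design comes from.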

SSD Detector: Single Shot MultiBox Detector was proposed in 2015 by W. Liu et al. It uses a multi-reference and multi-resolution detection method. mAP reaches 76.8% on VOC07 and 74.9% on VOC12; a fast version runs at 59 fps.
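
"Multi-reference" means predictions are made relative to a fixed set of reference (default) boxes, and "multi-resolution" means those boxes are placed on feature maps of several sizes. A minimal sketch, assuming scales spaced linearly between 0.2 and 0.9 of the image size and three aspect ratios; the function names are illustrative, and the extra intermediate-scale box SSD adds for aspect ratio 1 is omitted.

```python
import math


def default_box_scales(num_feature_maps, s_min=0.2, s_max=0.9):
    """Default-box scale for each feature map, as a fraction of the image size."""
    m = num_feature_maps
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]


def default_box_shapes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) pairs for one scale; each has area scale ** 2."""
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar)) for ar in aspect_ratios]


def default_boxes(feature_map_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Boxes (cx, cy, w, h) in relative coordinates, tiled over every cell
    of a square feature map."""
    boxes = []
    for i in range(feature_map_size):
        for j in range(feature_map_size):
            cx = (j + 0.5) / feature_map_size
            cy = (i + 0.5) / feature_map_size
            boxes.extend((cx, cy, w, h)
                         for w, h in default_box_shapes(scale, aspect_ratios))
    return boxes


scales = default_box_scales(6)
coarse_boxes = default_boxes(feature_map_size=3, scale=scales[-1])
```

Fine feature maps get the small scales and coarse feature maps get the large ones, so objects of different sizes are handled at different layers.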

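RetinaNet Detector: was proposed in 2017 by T.-Y. Lin et al. It uses a focal loss function, which reshapes the standard cross-entropy loss so that training focuses on hard, misclassified examples. mAP@.5 reaches 59.1% on COCO.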
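
The focal loss is the cross-entropy loss multiplied by a factor (1 - p_t)^gamma, where p_t is the probability the model assigns to the true class, so examples that are already classified well contribute very little. A minimal sketch for a single binary prediction, assuming the commonly used settings gamma = 2 and alpha = 0.25; the function names are illustrative.

```python
import math


def cross_entropy(p, y):
    """Standard binary cross-entropy for one prediction.
    p: predicted probability of the positive class; y: 1 or 0."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction (same arguments as cross_entropy)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of easy examples (p_t close to 1).
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Fraction of the cross-entropy loss that survives for an easy and a hard positive.
easy_ratio = focal_loss(0.9, 1) / cross_entropy(0.9, 1)
hard_ratio = focal_loss(0.1, 1) / cross_entropy(0.1, 1)
```

The easy example keeps a much smaller share of its loss than the hard one, which counters the large number of easy background regions a one-stage detector sees during training.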

Zou, Zh., Shi, Zh., Guo, Y., and Ye, J., “Object Detection in 20 Years: A Survey,” CoRR, http://arxiv.org/abs/1905.05055, 2019.

Simonyan K., Zisserman A., “Very Deep Convolutional Networks for Large-Scale Image Recognition,” Computer Science, arXiv, https://arxiv.org/abs/1409.1556, 2014.

Redmon, J., Divvala, S., Girshick, R., et al., “You Only Look Once: Unified, Real-Time Object Detection,” arXiv, https://arxiv.org/abs/1506.02640, 2015.

Ren, Sh., He, K., Girshick, R., and Sun, J., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” arXiv, https://arxiv.org/abs/1506.01497, 2015.

He, K., Zhang, X., Ren, Sh., and Sun, J., “Deep Residual Learning for Image Recognition,” arXiv, https://arxiv.org/abs/1512.03385, 2015.

Liu, W., Anguelov, D., Erhan, D., et al., “SSD: Single Shot MultiBox Detector,” Computer Vision, Springer International Publishing, pp. 21-37, https://doi.org/10.1007%2F978-3-319-46448-0_2, 2016.