Object Detection and Distance Estimation Using Deep Learning

K Usha Rani; Dosala Srinishma; Ancha Vidisha

doi:10.32628/IJSRST523102113

Authors

K Usha Rani, HOD & Associate Professor, Department of CSE, Bhoj Reddy Engineering College for Women, Hyderabad, India
Dosala Srinishma, Department of CSE, Bhoj Reddy Engineering College for Women, Hyderabad, India
Ancha Vidisha, Department of CSE, Bhoj Reddy Engineering College for Women, Hyderabad, India

Keywords:

Deep learning, IoT, YOLO

Abstract

Object detection is a computer vision technique for locating instances of objects in images and videos. When humans look at images or videos, they recognize and locate objects within moments. The main goal of this project is to replicate that ability using deep neural networks and IoT hardware, namely a Raspberry Pi and a camera. The model could help visually disabled people navigate better and move without collisions. In real-time scenarios, numerous objects appear in a single frame, so identifying different items simultaneously as they are captured requires a strong model. YOLO (You Only Look Once) is a convolutional neural network (CNN) that helps reach that objective. The algorithm applies a single neural network to the full image, divides the image into regions, and predicts bounding boxes and probabilities for each region. The bounding boxes are weighted by the predicted probabilities. The second objective of the model is to calculate the distance of humans from the camera. To achieve this, a Haar classifier is created and used; it also improves human detection alongside the distance calculation. A Haar feature resembles a CNN kernel, except that CNN kernel values are determined by training, whereas Haar feature values are determined manually. Whenever a person is detected by both YOLO and the Haar classifier, a formula that considers the height and width of the human contour is applied to calculate the person's distance from the camera. As objects are identified, they are read out using a text-to-speech engine known as gTTS (Google Text-to-Speech), which stores the spoken text in an MP3 file. The Pygame package loads and plays the MP3 file dynamically as objects are detected. The deep learning model is integrated with the Raspberry Pi using OpenCV.
Though this project is primarily developed to aid visually disabled people, it has various other applications, such as self-driving cars, video surveillance, pedestrian detection, and face detection.
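
The abstract does not state the distance formula itself, so the following is only a minimal sketch of one common approach, the pinhole-camera (similar triangles) relation, and should not be read as the authors' method. It assumes a person's real height is roughly known (1.7 m here, an assumed value), that the focal length in pixels has been calibrated from one reference image taken at a known distance, and it uses only the height of the detected box, whereas the paper's formula also considers width. The names `Box`, `calibrate_focal_length`, `estimate_distance`, and `announcement` are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned detection box in pixels (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int


def calibrate_focal_length(known_distance_m, real_height_m, pixel_height):
    """Focal length in pixels, from one reference image at a known distance."""
    return pixel_height * known_distance_m / real_height_m


def estimate_distance(box, focal_length_px, real_height_m=1.7):
    """Pinhole estimate: distance = real height * focal length / pixel height.

    real_height_m=1.7 is an assumed average person height, not a value
    taken from the paper.
    """
    if box.h <= 0:
        raise ValueError("box height must be positive")
    return real_height_m * focal_length_px / box.h


def announcement(label, distance_m):
    """Text that could be handed to a text-to-speech engine."""
    return f"{label} at {distance_m:.1f} meters"


# Calibration: a 1.7 m person standing 2.0 m away appears 425 px tall.
focal = calibrate_focal_length(2.0, 1.7, 425)

# A later detection whose box is 250 px tall.
person = Box(x=120, y=40, w=100, h=250)
message = announcement("person", estimate_distance(person, focal))
print(message)
```

In a pipeline like the one described, the resulting string would be the text passed to the text-to-speech step; a detection that appears smaller in the frame yields a larger estimated distance.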

