About YOLO

YOLO (You Only Look Once) uses a CNN, Convolutional Neural Network to detect Objects.

The outputs we receive are the class labels and location at the same time.

The Algorithm divides images/frames into different regions. And for every region, it predicts the bounding boxes and probability of the object it is detecting. 

Then the bounding boxes are filtered with non-maximum suppression. 

Generic Implementation

A generic implementation of YOLO is as follows for an Image 

A generic implementation of YOLO is as follows for a Video feed 

Pre-Processing and Post-Processing

Before running the darknet command, It is important to process the image/video slightly, i.e. adjust its colors, adjust the brightness, etc. These parameters might hinder the identification process and produce inaccurate results.

Looking at the adjacent image, its hues and brightness might need to be adjusted for accurate prediction.  For Example, if we need to do predictions in a not-so-well lit environment, it is better to adjust the brightness beforehand. 

Similarly, we can modify the Output to suit our needs. 

The Image and Video Processing Section covers more about it.