Image convolution is template matching done in a streamlined fashion. To do template matching, we need a template image and a full image to match against. We sweep the template across the full image pixel by pixel, checking at each position how similar the template is to the portion of the full image underneath it. After a full sweep, we return the coordinate at which the full image is most similar to the template. Convolution is almost exactly the same: we have a template and a full image, and we sweep the template across. Instead of returning a single position, however, we return a matrix in which each entry records the similarity between the template and the full image at that position. If we think of the template as an image feature, then the matrix is a feature map of the full image, where each value indicates how strongly the feature shows at that location. This is an important building block of the Convolutional Neural Network.
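The sweep-and-score procedure can be sketched directly. Below is a minimal NumPy version (the function name `feature_map` and the toy 4x4 image are illustrative, not from the original text); each entry of the output is the dot product between the template and the patch of the image it currently covers:

```python
import numpy as np

def feature_map(image, template):
    """Slide the template over the image; at each position record the
    dot product between the template and the overlapped patch."""
    H, W = image.shape
    h, w = template.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            patch = image[i:i + h, j:j + w]
            out[i, j] = np.sum(patch * template)  # similarity score
    return out

# Toy example: a 2x2 block of ones hidden in a 4x4 image.
image = np.array([[0, 0, 0, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0]], dtype=float)
template = np.ones((2, 2))
fmap = feature_map(image, template)
# The maximum of fmap (value 4.0 at position (1, 1)) marks where the
# template matches the image perfectly.
```

Template matching would return only the argmax of `fmap`; convolution keeps the whole map. (Strictly, deep-learning libraries compute this cross-correlation and still call it convolution; true convolution would flip the template first.)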
Before convolution, an image had to be flattened into a vector by stacking all of its columns into a single column, with every neuron connected to every element of that vector. The number of parameters this requires is ridiculous and unnecessary. Much of the information in an image is local, so a neuron does not need to be connected to every element of the vector. Instead, we connect neurons to inputs locally by performing convolution. Besides greatly reducing the number of parameters, convolution helps the network focus on local information, an idea similar to the attention mechanism.
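A quick back-of-the-envelope comparison makes the savings concrete. Assuming a 28x28 grayscale input and 100 output units/filters (illustrative numbers, not from the text above):

```python
# Rough parameter counts, assuming a 28x28 grayscale input.
input_pixels = 28 * 28   # flattened image vector
hidden_units = 100

# Fully connected: every hidden unit sees every pixel, plus one bias each.
fc_params = input_pixels * hidden_units + hidden_units    # 78,500

# Convolutional: 100 filters of size 3x3, weights shared across all
# positions, plus one bias per filter.
conv_params = hidden_units * (3 * 3) + hidden_units       # 1,000

print(fc_params, conv_params)
```

Weight sharing is doing much of the work here: a convolutional filter reuses the same 3x3 weights at every position, while the fully connected layer learns a separate weight for every pixel.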
Various techniques are designed to deal with edge cases in Convolution.
Padding pads the original image with zeros along the edges so that the output size works out. It also keeps information at the edges of the image from being washed away too quickly. Stride defines how big each step is during the template sweep. It is usually set to 1, because smaller strides work better in practice.
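The standard output-size formula ties these two knobs together. As a sketch (the helper name is mine), for an n x n input, k x k template, padding p, and stride s, the output side length is (n + 2p - k) / s + 1:

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output side length for an n x n input and k x k kernel."""
    return (n + 2 * padding - k) // stride + 1

print(conv_output_size(32, 3))                       # 30: edges shrink the map
print(conv_output_size(32, 3, padding=1))            # 32: zero padding preserves size
print(conv_output_size(32, 3, padding=1, stride=2))  # 16: larger stride downsamples
```

With stride 1, padding by (k - 1) / 2 (for odd k) keeps the output the same size as the input, which is why "same" padding of 1 is so common with 3x3 templates.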
Convolution measures image similarity by the dot product of two image vectors. Is there a better way to measure such similarity, especially between images rather than generic vectors?
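One common alternative worth noting: the raw dot product grows with overall brightness, so a bright patch can score higher than a well-matched dim one. Normalizing both vectors to unit length (cosine similarity, the idea behind normalized cross-correlation in classical template matching) removes that bias. A small sketch, with illustrative names:

```python
import numpy as np

def normalized_similarity(patch, template):
    """Cosine similarity: dot product of the two patches after scaling
    each to unit length. Invariant to overall brightness scaling."""
    p = patch.ravel().astype(float)
    t = template.ravel().astype(float)
    denom = np.linalg.norm(p) * np.linalg.norm(t)
    if denom == 0:
        return 0.0
    return float(p @ t / denom)

t = np.array([[1.0, 0.0],
              [0.0, 1.0]])
bright = 10 * t  # same pattern, ten times brighter
# Both comparisons give 1.0 (up to floating-point error): brightness
# no longer changes the score, only the pattern does.
print(normalized_similarity(t, t), normalized_similarity(bright, t))
```

The raw dot product would score `bright` ten times higher than `t` against the same template, even though the underlying pattern is identical.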
Smaller image patches make weaker assumptions about the template feature being matched, and thus tend to perform better than larger patches.