A Basic Image Correspondence Pipeline

Definition and use

Image correspondence is the task of finding where certain parts of one image correspond to the same parts in another image. Image correspondence algorithms are typically given two or more images of the same scene or artifact as input. The differences between the images may be due to translation, rotation, scaling, or a combination of the three. One of the primary applications of image correspondence is stereo vision: using two cameras to recover the location of objects in the 3D world.

Example of Correspondence <source>

Finding feature points

The first step in the correspondence pipeline is to find our feature points. A feature is a region or point in the first image that corresponds to a region or point in the second image (like the numbered red signs in the image on the left). Each feature is described in a certain way; this description is known as a feature descriptor. Among the greatest challenges of correspondence is choosing effective feature points and descriptors.

In order for feature points to be effective, they must fulfill two properties: distinctiveness and invariance.

Distinctiveness ensures that the feature is unique compared to all other features within the image. So, when an algorithm finds two similar feature points in separate images, they will in fact correspond to the same point in 3D space. Distinct feature points are often found in areas where intensity changes rapidly, such as at the corner of an object.

Good vs Poor Feature Distinctiveness <source>

Meanwhile, invariance ensures that the feature can withstand transformations between the two images, such as changes in lighting, rotation, or scale. This property ensures that the same feature can still be matched after the images have undergone photometric or geometric transformations. A key strategy for increasing invariance is subsampling.

Effect of Subsampling

Subsampling reduces the image's resolution; combined with the blurring that typically precedes it (to avoid aliasing), it removes fine detail. Since fine detail is the part of an image most sensitive to changes in scale, brightness, contrast, and small rotations, the strategy helps develop invariance to those changes.
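The blur-then-subsample step can be sketched in a few lines of NumPy. This is a minimal illustration, not a production routine: the function name is my own, and a 3×3 box filter stands in for the Gaussian smoothing a real pipeline would use.

```python
import numpy as np

def blur_and_subsample(image, factor=2):
    """Smooth with a 3x3 box filter, then keep every `factor`-th pixel.

    A real pipeline would use a Gaussian kernel; the box filter keeps
    this sketch dependency-free while still suppressing fine detail.
    """
    # Pad so the 3x3 averaging window is defined at the borders.
    padded = np.pad(image.astype(float), 1, mode="edge")
    blurred = np.zeros(image.shape, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            blurred += padded[1 + dy : 1 + dy + image.shape[0],
                              1 + dx : 1 + dx + image.shape[1]]
    blurred /= 9.0
    # Subsample: keep every `factor`-th row and column.
    return blurred[::factor, ::factor]
```

Applying the function repeatedly produces an image pyramid, the structure that scale-aware detectors search over.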

Harris Corner Detector

One way to find effective feature points is to use the Harris operator. To understand this operator, consider a patch over a portion of an image, with intensity denoted I(x, y).

Now, if we were to shift this patch by a small amount (u, v), the intensity would change to I(x+u, y+v). The Harris operator looks for points where this change is large in every shift direction, since such points are likely corners.

Harris Corner Detector

Assuming that both u and v are small values, we can apply the Taylor expansion to arrive at the following formula.

Taylor Series Expansion on Harris Corner Detector
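For readers without the figures, the standard form of this derivation (with an optional window weighting w(x, y)) is the following; this is the textbook statement of the Harris energy, not a formula taken from the original figures.

```latex
E(u, v) = \sum_{x, y} w(x, y)\,\bigl[ I(x + u,\, y + v) - I(x, y) \bigr]^2
```

Applying the first-order Taylor expansion I(x+u, y+v) ≈ I(x, y) + u I_x + v I_y gives

```latex
E(u, v) \approx \sum_{x, y} w(x, y)\,(u I_x + v I_y)^2
       = \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix},
\qquad
M = \sum_{x, y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
```

where I_x and I_y are the image gradients in the x and y directions.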

The goal of Harris Corner detection is to find a patch such that the change in intensity in both the x and y direction is high (this is likely a corner). Therefore, both terms in the equation above should be nonzero and relatively large.

In most images, the Harris corner detector yields distinct feature points: corners of objects serve as reliable markers that map points on the 2D image to the 3D world. Furthermore, by looking at changes in intensity rather than intensity itself, the detector gains a measure of invariance to photometric changes such as shifts in brightness.

Feature Matching

With our feature detection complete, we now have a list of points that are distinct and invariant in both images. All that’s left is to match the points to each other. Among the simplest feature matching algorithms are SSD and NCC.

SSD (Sum of Squared Differences)

It is important to recognize that discriminability can never be achieved with a single pixel; identifying a region in an image requires information from several pixels within a small locality of each other. Therefore, our feature descriptor will be a patch of pixels, each denoted by its intensity value. The SSD algorithm will operate on this patch.

Given a patch of pixels (say, 5×5), SSD scans for the smallest difference between a patch in image 1 (x) and a patch in image 2 (y).

SSD Formula

Pixels are mathematically represented by intensity, and SSD compares those intensities directly. Given two matching features, one imaged under high brightness and one under low, SSD will yield a large difference between the two and therefore fail to match the features correctly.
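SSD matching is simple enough to write directly. A minimal sketch (function names are illustrative): compute the squared differences over a patch, and pick the candidate with the smallest score.

```python
import numpy as np

def ssd(patch_a, patch_b):
    """Sum of squared intensity differences between two equal-sized patches."""
    d = patch_a.astype(float) - patch_b.astype(float)
    return float(np.sum(d * d))

def match_ssd(patch, candidates):
    """Return the index of the candidate patch with the smallest SSD."""
    scores = [ssd(patch, c) for c in candidates]
    return int(np.argmin(scores))
```

The brightness problem is easy to demonstrate: adding a constant offset of 10 to every pixel of a 2×2 patch raises its SSD against the original from 0 to 4 × 10² = 400, even though the two patches depict the same feature.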

NCC (Normalized Cross Correlation)

NCC guards against changes in brightness and contrast between two images by normalizing both descriptors: each value in the descriptor has the mean subtracted and is then divided by the standard deviation.

NCC Formula

Essentially, the descriptor is now a collection of points described relative to each other rather than purely by absolute pixel intensity. For example, different lighting conditions will change the intensity of the desk in figure 1, but they are unlikely to have a large effect on the intensity of the desk relative to the laptop. This small mathematical operation guards heavily against changes in both brightness and contrast between images.
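The normalize-then-correlate step can be sketched as follows. This version averages the product of the normalized patches so the score lands in [-1, 1] (some texts sum instead of averaging); the function name is my own.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two equal-sized patches.

    Each patch is normalized to zero mean and unit standard deviation,
    so constant brightness offsets and contrast scalings cancel out.
    Assumes non-constant patches (std > 0). Returns a score in [-1, 1];
    1 means a perfect match.
    """
    a = patch_a.astype(float).ravel()
    b = patch_b.astype(float).ravel()
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))
```

Unlike SSD, a patch still scores a perfect 1.0 against a copy of itself that has been brightened (constant added) or contrast-stretched (multiplied by a constant), which is exactly the invariance described above.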

This article covered a basic image correspondence pipeline: first finding feature points, then matching their descriptors using SSD or NCC. Today, there are several more advanced ways to achieve correspondence, such as Multi-Scale Oriented Patches (MOPS) and the Scale-Invariant Feature Transform (SIFT), which guard more effectively against photometric transformations and scaling.