Classification of Images

Classification is a computational procedure that sorts images into groups ("classes") according to their similarities. Images can be similar in many ways, but in EM-related image processing we use a very strict measure of similarity based on a pixel-by-pixel comparison: the mean squared difference, which corresponds (up to normalization by the number of pixels) to the squared generalized Euclidean distance.

Images represented by N x N arrays of density values can be thought of as points in an (N x N)-dimensional space, with one coordinate per pixel. Points that are close to each other in that space represent images that are "similar", since the mean squared difference between their pixel values is small.
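As a minimal sketch of this similarity measure, the following Python snippet computes the mean squared difference between small test images. The function name mean_squared_difference and the 2 x 2 example images are illustrative assumptions, not part of any particular EM package.

```python
def mean_squared_difference(image_a, image_b):
    """Mean squared pixel-by-pixel difference between two equally sized images."""
    # Flatten each N x N array into one point with N x N coordinates.
    flat_a = [pixel for row in image_a for pixel in row]
    flat_b = [pixel for row in image_b for pixel in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)


# Hypothetical 2 x 2 "images" (density values chosen only for illustration).
image_1 = [[0.0, 1.0], [1.0, 0.0]]
image_2 = [[0.0, 1.0], [1.0, 1.0]]  # differs from image_1 in one pixel
image_3 = [[1.0, 0.0], [0.0, 1.0]]  # differs from image_1 in all four pixels

msd_12 = mean_squared_difference(image_1, image_2)
msd_13 = mean_squared_difference(image_1, image_3)
print(msd_12, msd_13)
```

Here image_2 lies closer to image_1 than image_3 does, so by this measure it is the more "similar" of the two.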

There are two different approaches to classification: supervised and unsupervised. Both make use of the similarity measure introduced above. Supervised classification sorts a set of images according to their similarity (that is, closeness in our high-dimensional space) to certain predefined images ("references" or "templates"). Unsupervised classification sorts the images according to their intrinsic grouping or clustering within the set.

This is demonstrated schematically in the figure. The same set of images, represented by a set of dots, is classified either by comparing each image with a set of references (represented by fat dots) or by dividing the whole cloud of dots into clusters (indicated by dashed lines).
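The two approaches can be sketched in Python as follows, using two-dimensional points in place of real images. The text does not name a particular clustering algorithm; plain k-means is used here only as one simple example of unsupervised classification, and the function names, the seeding with the first k points, and the sample data are all illustrative assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def classify_supervised(points, references):
    """Assign each point to the class of its closest reference."""
    return [nearest(p, references) for p in points]


def classify_unsupervised(points, k, iterations=20):
    """Plain k-means, seeded with the first k points (an illustrative choice)."""
    centers = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assign every point to its closest current center.
        labels = [nearest(p, centers) for p in points]
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical data: two loose clouds of points, near (0, 0) and near (5, 5).
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.3), (4.8, 5.0)]
references = [(0.0, 0.0), (5.0, 5.0)]

supervised_labels = classify_supervised(points, references)
unsupervised_labels = classify_unsupervised(points, k=2)
print(supervised_labels, unsupervised_labels)
```

In the supervised case the class numbers refer to the references that were supplied; in the unsupervised case they are arbitrary labels for whatever clusters the data happen to form.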

To simplify the analysis, or to increase the signal-to-noise ratio, classification is often carried out in a space of much lower dimensionality than the initial N x N space. This reduction of dimensionality is achieved by multivariate data analysis (also known as multivariate statistical analysis). The two most common reduction techniques used in EM are correspondence analysis and principal component analysis.
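To illustrate the idea of dimensionality reduction, the sketch below finds the first principal component of a few flattened images and replaces each image by a single coordinate along it. It uses power iteration on the covariance matrix in pure Python so that it stays self-contained; real EM packages would use optimized linear algebra routines instead, and the function names and sample data are assumptions made for this example only.

```python
import math


def principal_axis(vectors, iterations=200):
    """Return the mean vector and leading principal axis of the input vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    # Covariance matrix of the pixel values (dim x dim).
    cov = [[sum(row[i] * row[j] for row in centered) / n for j in range(dim)]
           for i in range(dim)]
    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue, assuming the start vector
    # is not orthogonal to that eigenvector.
    axis = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        product = [sum(cov[i][j] * axis[j] for j in range(dim))
                   for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break  # no variance along the current direction
        axis = [x / norm for x in product]
    return mean, axis


def project(vector, mean, axis):
    """Coordinate of a vector along the axis, measured from the mean."""
    return sum((x - m) * a for x, m, a in zip(vector, mean, axis))


# Hypothetical 2 x 2 images flattened to 4 values each. They differ only in
# the first and last pixel, which change together.
images = [
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 1.0, 1.0, 2.0],
    [3.0, 1.0, 1.0, 3.0],
]

mean, axis = principal_axis(images)
coordinates = [project(image, mean, axis) for image in images]
print(axis)
print(coordinates)
```

Because these four images vary along a single direction, one coordinate per image captures all of the variation, and classification could then be carried out in that one-dimensional space instead of the original four-dimensional one.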

For further information, see the classification and clustering tutorial.