## Classification of Images

Classification is a computational procedure that sorts images into
groups ("classes") according to their similarities. Images can be
similar in all kinds of ways, but in EM-related image processing we use
a very strict measure of similarity that is based on a pixel-by-pixel
comparison: the mean squared difference, which is proportional to the
squared Euclidean distance between the images (a.k.a. *generalized
Euclidean distance*).

Images represented by N x N arrays of density values can be thought of
as points in an N²-dimensional space, with one coordinate per pixel.
Points that are close to each other in that space represent images that
are "similar", since the mean squared difference between their pixel
values is small.
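The relation between the pixel-by-pixel measure and the distance between points can be sketched in a few lines of NumPy. This is only an illustration: the function names `mean_squared_difference` and `squared_euclidean_distance` are made up for this example and are not part of any EM package.

```python
import numpy as np

def mean_squared_difference(a, b):
    """Mean squared pixel-by-pixel difference of two equally sized images."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((a - b) ** 2))

def squared_euclidean_distance(a, b):
    """Squared distance between two images viewed as points in N*N dimensions."""
    # Flattening an N x N array gives the coordinates of one point.
    pa = np.asarray(a, dtype=float).ravel()
    pb = np.asarray(b, dtype=float).ravel()
    if pa.shape != pb.shape:
        raise ValueError("images must have the same number of pixels")
    return float(np.sum((pa - pb) ** 2))
```

For two N x N images, `squared_euclidean_distance` equals N*N times `mean_squared_difference`, so ranking images by one measure gives the same order as ranking them by the other.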

There are two different approaches to classification: *supervised*
and *unsupervised*. Both make use of the similarity measure
introduced above. Supervised classification sorts a set of images
according to their similarity (that is, their closeness in our
high-dimensional space) to certain pre-given images ("references" or
"templates"). Unsupervised classification sorts the images according
to their intrinsic grouping or clustering within the set.
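The difference between the two approaches can be made concrete with a minimal NumPy sketch. The supervised case assigns each image to its closest reference. For the unsupervised case, plain k-means is used here purely as a stand-in clustering method; the text does not prescribe a particular algorithm, and the function names and parameters (`n_classes`, `n_iter`, `seed`) are assumptions of this example.

```python
import numpy as np

def classify_supervised(images, references):
    """Return, for each image, the index of the closest reference."""
    x = np.asarray(images, dtype=float).reshape(len(images), -1)
    r = np.asarray(references, dtype=float).reshape(len(references), -1)
    # Squared distances between every image and every reference,
    # shape (n_images, n_references).
    d2 = ((x[:, None, :] - r[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def classify_unsupervised(images, n_classes, n_iter=50, seed=0):
    """Cluster images with plain k-means (an illustrative choice)."""
    x = np.asarray(images, dtype=float).reshape(len(images), -1)
    rng = np.random.default_rng(seed)
    # Start from n_classes distinct images picked at random.
    centers = x[rng.choice(len(x), size=n_classes, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assignment step: the current centers act as temporary references.
        labels = classify_supervised(x, centers)
        new_centers = centers.copy()
        for k in range(n_classes):
            members = x[labels == k]
            if len(members) > 0:  # keep the old center if a class is empty
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Note that the unsupervised routine reuses the supervised assignment step: the only difference is that its "references" are class averages computed from the data rather than images supplied in advance.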

This is demonstrated schematically in the figure. The same set of
images, represented by a set of dots, is classified either by comparing
each image with a set of references (represented by large dots), or by
dividing the whole cloud of dots into clusters (indicated by dashed
lines).

To simplify the analysis, or to increase the signal-to-noise ratio,
classification is often carried out in a space of much lower
dimensionality than the initial N x N space. This reduction of
dimensionality is achieved by multivariate data analysis (also known
as multivariate statistical analysis). The two most common reduction
techniques used in EM are correspondence analysis and principal
component analysis.
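As a rough illustration of the dimensionality reduction, the sketch below performs principal component analysis on a stack of images using NumPy's singular value decomposition. It covers only PCA, not correspondence analysis, and the function name `pca_reduce` and its parameter `n_factors` are hypothetical.

```python
import numpy as np

def pca_reduce(images, n_factors):
    """Project images onto their first n_factors principal axes."""
    # One row per image, one column per pixel.
    x = np.asarray(images, dtype=float).reshape(len(images), -1)
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are orthonormal principal axes, ordered by decreasing
    # singular value, i.e. by decreasing variance along that axis.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_factors]
    # Coordinates of each image in the reduced space.
    coords = centered @ axes.T
    return coords, axes, mean
```

The rows of `coords` can then be classified in place of the original pixel arrays, using the same distance measure in a space with `n_factors` dimensions.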

For further information, see the classification and clustering tutorial.