Computer Vision

  • Bayesian Inference
  • Scale Space
  • Nonlinear & Hilbert Operators
  • Convolution & Correlation
  • Laplacian
  • Correspondence
  • Pattern Recognition
  • Complex Variables
  • Codons
  • Ill posed problems
  • Eigenfaces
  • Filters
  • Paradox of Cognitive Penetrance
  • Face recognition
  • Eye
  • Image Operators
  • Active Contours
  • Motion

    Bayesian Inference (4)

    • p(H|E) = probability that hypothesis H is true given image evidence E. This is what we wish to evaluate, and is called the posterior probability.

    • p(E|H) = likelihood that, if hypothesis H were true, the image would contain evidence E. It is based on our knowledge of the image formation process.

    • p(H) = plausibility of the hypothesis, incorporating previous guesses.

    • These are inverse problems, as we need to figure out the state of the world that would produce the image we have. Computer vision is a form of inverse graphics: graphics seeks to create images from configurations of model worlds, whereas vision must infer the configuration of the world from the graphical information in the image.

    • These can be related using Bayes' theorem:

      $$p(H|E) = \frac{p(E|H)\,p(H)}{p(E)}$$

    • Priors are a way of incorporating assumptions and prior knowledge into solving the problem. p(H)= prior assumptions.

    • The 3d surface reconstructed from reflectance data in an image is the H having the highest p(H|E) given the data E. When using Bayesian inference to solve inverse problems our aim is to maximize p(H|E) to find the most likely hypothesis (surface reconstruction) that explains the image data (reflectance).

    • A reflectance map is a function Φ(i,e,g) that relates intensities in the image to surface orientations of objects. It specifies the fraction of incident light reflected per unit surface area, per unit solid angle, in the direction of the camera; thus it has units of flux/steradian. It is a function of three variables: i is the angle of the illuminant, relative to the surface normal N; e is the angle of a ray of light re-emitted from the surface; and g is the angle between the emitted ray and the illuminant.
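
    The Bayesian update described in this section can be sketched numerically. This is a minimal illustration, not a vision system: the three hypotheses and all of the prior and likelihood values are made-up numbers, and the function names are hypothetical.

```python
def posterior(priors, likelihoods):
    """Return p(H|E) for each hypothesis, given p(H) and p(E|H), via Bayes' theorem."""
    joint = [p_h * p_e_given_h for p_h, p_e_given_h in zip(priors, likelihoods)]
    evidence = sum(joint)  # p(E): the normalising term
    return [j / evidence for j in joint]


def most_probable(priors, likelihoods):
    """Index of the hypothesis that maximises the posterior p(H|E)."""
    post = posterior(priors, likelihoods)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical example: three candidate surface reconstructions.
priors = [0.7, 0.2, 0.1]        # p(H): assumed prior plausibility of each hypothesis
likelihoods = [0.1, 0.6, 0.3]   # p(E|H): assumed fit of the image evidence to each

post = posterior(priors, likelihoods)
best = most_probable(priors, likelihoods)
```

    With these numbers the second hypothesis wins even though its prior is lower than the first, because the evidence fits it much better.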

    Scale Space

    • Pyramidal representations of image structure construct a scale space by blurring an image, subsampling it to give one level of the pyramid, then blurring this and subsampling again and again. Thus the original image is mapped into a hierarchy of images which differ in scale and image size (the number of pixels divides by 4 each time). Coarser and coarser information is revealed as the images get smaller.

    • By concatenating a filtering operation (eg convolution with a Gaussian [1]) and a differentiating operation (eg taking the Laplacian [2]) one constructs an edge detection operator that only finds edges at a certain scale of analysis, determined by the blurring Gaussian. The two operations are commutative [3].

    • The fingerprint theorem of scale space states that as one ascends through scale space the zero crossings of a signal can only decrease in number.

    • In a wavelet transform the wavelets are dilates, translates, and rotates of each other, so such a transform seeks to extract image structure in a way that may be invariant to dilation, translation, and rotation of the original image or pattern.
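
    The blur-and-subsample construction of a pyramid can be sketched as follows. This is an illustration under assumptions: the 5-tap binomial kernel is a stand-in for a Gaussian, reflection padding at the borders is an arbitrary choice, and the random array stands in for a real image.

```python
import numpy as np

# Binomial 5-tap kernel, a common approximation to a Gaussian (an assumption here).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(image):
    """Separable low-pass filter: filter each row, then each column."""
    padded = np.pad(image, 2, mode="reflect")  # pad so the output keeps the input size
    rows = np.apply_along_axis(lambda r: np.convolve(r, KERNEL, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, KERNEL, mode="valid"), 0, rows)


def pyramid(image, levels):
    """Return a list of images; each level is the previous one blurred and subsampled by 2."""
    out = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        out.append(blur(out[-1])[::2, ::2])  # keep every second row and column
    return out


image = np.random.default_rng(0).random((64, 64))  # stand-in for a real image
levels = pyramid(image, 4)
shapes = [level.shape for level in levels]
```

    Each level has a quarter of the pixels of the level before it, matching the factor of 4 noted above.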

    Non linear operators (2), Hilbert

    • Most useful operations in computer vision are non-linear. They may be built from linear operations such as filtering, but responsiveness to a feature such as colour requires the use of non-linearity.

    • Purely linear operations, such as filtering, are of limited use on their own, as they have no translation invariance and their outputs are phase modulated.

    • Linear operators perform preconditioning tasks such as filtering.

    • They are usually implemented by the convolution integral or by multiplication in the 2d Fourier domain. Eg: bandpass filtering to enhance edges, then edge detection.

    • Non linear operators can't in general be inverted, whereas linear operators can. They pass a signal (image) through a decision stage (eg a boolean operation). They are used for pattern detection.

    • The output of a linear operator is just another version of the input signal, rather than a logical result. Non-linear operators are largely beyond theoretical analysis. They lack mathematical clarity but can be configured as a kind of "signal-to-symbol converter".

    • The concept of a "signal-to-symbol converter" is a statement of why vision is hard and can't be accomplished merely by signal processing operations. Instead, AI techniques are needed to create symbolic representations.

    • Motion information may be extracted by energy detectors which can be built by taking the squared modulus of linear spatio temporal filters forming a quadrature pair.

    • The correspondence problem is that of matching up the corresponding points of two images acquired with some displacement disparity. Solving correspondence problems is usually costly because of the combinatorial complexity of the solution space.

    • The extraction of stereoscopic information requires dichoptic integration by solving the Correspondence Problem. This requires Cooperativity Processes, which are profoundly non linear kinds of computation.

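    • A quadrature pair is a pair of filters with a 90 degree phase difference between them. They can be treated as a single complex valued filter, with one member of the pair as the real part and the other as the imaginary part. For example, a windowing function times a cosine function of space, paired with the same window times a sine function of space.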

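    • Linear operators have the properties of superposition and proportionality: for any signals f and g and any constant a,

      $$L(f + g) = L(f) + L(g), \qquad L(a f) = a\,L(f)$$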

    • A Hilbert Pair is a pair of functions having the property that the Fourier Transform of one is the same as the Fourier Transform of the other, except that the phase of all positive frequency components has been increased by π/2, while that of all negative frequency components has been decreased by π/2.

    • A Hilbert Transform can be computed either by the phase shifting operation in the Fourier domain described above, or simply by convolving the original signal (or image) with the hyperbola 1/x (up to a constant factor). A useful function for detecting facial features and detecting motion is the modulus of the complex signal whose real part is the original signal and whose imaginary part is its Hilbert Transform. The squared modulus is the sum of the squares of these real and imaginary parts.
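
    The Fourier-domain route to the Hilbert Transform, and the sum-of-squares energy built from it, can be sketched as below. The sign convention chosen for the phase shift (multiplying positive frequencies by -i and negative ones by +i) is an assumption; conventions differ, and the energy does not depend on which one is used. The test signal is a hypothetical pure cosine.

```python
import numpy as np


def hilbert_transform(signal):
    """Shift the phase of every frequency component by a quarter cycle, via the FFT."""
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(len(signal))
    # Assumed convention: -i for positive frequencies, +i for negative, 0 at zero frequency.
    shifted = spectrum * (-1j * np.sign(freqs))
    return np.fft.ifft(shifted).real


n = 256
t = np.arange(n)
carrier = np.cos(2 * np.pi * 8 * t / n)   # 8 cycles across the window
quadrature = hilbert_transform(carrier)   # the 90-degree shifted partner

# Sum of squares of the signal and its Hilbert Transform (squared modulus).
energy = carrier**2 + quadrature**2
```

    For a pure sinusoid the energy is constant, which is why this quantity is insensitive to the phase of the underlying pattern.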

    Convolution (3), Correlation, Bandpass filtering, Edge detection and Invariant transform

    • Convolution of two functions f(x,y) and g(x,y) is defined as:

      $$h(x,y) = \int_\alpha \int_\beta f(\alpha,\beta)\, g(x-\alpha,\, y-\beta)\, d\alpha\, d\beta$$

      Convolution is the basis of all filtering operations. It is a way of combining two functions (eg an image array and a filter) to compute a third function (the filtered image). The operator generates all possible relative shifts (x,y) between one function and the mirror image of the other. Image processing requires filtering.

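    • Correlation finds relative shifts (x,y) but without using the mirror image reflection: ie it is the convolution integral with each − replaced by a +.

      $$h(x,y) = \int_\alpha \int_\beta f(\alpha,\beta)\, g(x+\alpha,\, y+\beta)\, d\alpha\, d\beta$$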

      It is a way of combining two functions, such as an image array and a pattern one is searching for, to generate a third function which would have a large peak if there were a strong correlation between the two. The location of the matching pattern is revealed by the value of the relative shift that produced the peak. Used for motion detection, pattern recognition.

    • Bandpass filtering extracts information from an image I(x,y) within chosen bands of spatial frequency and orientation, which are determined by the passband of the filter g(x,y) employed. It is easier to compute through multiplication in the Fourier domain (u,v) than by direct convolution in the image domain (x,y). The spectrum of the bandpass-filtered image H(u,v) is the product of the spectrum of the filter G(u,v) and the spectrum of the image I(u,v):

      $$H(u,v) = G(u,v)\, I(u,v)$$

    • Edge detection by derivative zero crossings refers to a strategy for image understanding based on finding the edges and boundary contours of objects, by searching for the zero crossings of the second derivative of the image. This can be performed at multiple scales defined by σ, the space constant of a Gaussian G_σ(x,y), whose Laplacian ∇²G_σ, once convolved with the image, generates zero-crossings wherever edges of a certain scale occur:

      $$\nabla^2 G_\sigma(x,y) * I(x,y)$$

      The Laplacian is the isotropic second derivative:

      $$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$$

      ie there is an edge where the result of Laplacian * Gaussian * Image crosses zero.

    • A blurred Laplacian operator is a standard mathematical operator used to detect edges in images, at a particular scale of analysis. The Laplacian is an isotropic second-derivative and it is applied to a Gaussian low-pass filter of scale σ to build a new scale-specific operator that will detect edges when convolved with an image.

    • An invariant transform is a pattern representation that is the same regardless of the size, orientation and position of the pattern. An example is the log-polar spectrum. The Shift Theorem shows that the power spectrum is independent of pattern translation in the image, the Similarity Theorem shows that a dilation in size only produces a uniform shift along the r axis, and the Rotation Theorem shows that a change in the orientation of the pattern only creates a uniform shift along the θ axis. This principle is used in OCR.
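
    The relationship between convolution, correlation and multiplication in the Fourier domain can be checked numerically. The sketch below assumes discrete, periodic (wrap-around) signals, which is what the discrete Fourier transform implies; the function names are hypothetical and the arrays are random stand-ins for an image and a filter.

```python
import numpy as np


def circular_convolve(f, g):
    """h(x,y) = sum over (a,b) of f(a,b) * g(x-a, y-b), with wrap-around indices."""
    h = np.zeros_like(f, dtype=float)
    for a in range(f.shape[0]):
        for b in range(f.shape[1]):
            # np.roll by (a, b) places g(x-a, y-b) at position (x, y).
            h += f[a, b] * np.roll(g, shift=(a, b), axis=(0, 1))
    return h


def circular_correlate(f, g):
    """Same as convolution but with g(x+a, y+b): no mirror-image reflection."""
    h = np.zeros_like(f, dtype=float)
    for a in range(f.shape[0]):
        for b in range(f.shape[1]):
            h += f[a, b] * np.roll(g, shift=(-a, -b), axis=(0, 1))
    return h


rng = np.random.default_rng(0)
f = rng.random((6, 6))
g = rng.random((6, 6))

# Convolution is a product of spectra; correlation uses the complex conjugate of one.
conv_fourier = np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(g)).real
corr_fourier = np.fft.ifft2(np.conj(np.fft.fft2(f)) * np.fft.fft2(g)).real

conv_direct = circular_convolve(f, g)
corr_direct = circular_correlate(f, g)
```

    The direct versions need a double loop over every shift, whereas the Fourier versions need only transforms and a pointwise product.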

    Correspondence Problem

    • The correspondence problem: to infer the change in position of an object relative to the background, one needs to detect the correspondence of background objects between views.

    • Stereo vision requires that corresponding objects, in two images acquired from different vantage points (but at the same time), be associated together to make measurements of their relative disparity in the image plane and thereby a computation of their depth in space relative to the focal plane.

    • Motion vision requires that corresponding object points, in two images acquired at different moments in time (but from the same vantage point), be associated so that their relative displacement can be measured and a motion vector calculated.

    • The complexity of the computation varies quadratically with the number of features (as every feature in one frame can be associated with every possible feature in the other). So N features (N may be as large as the number of pixels) generate up to N×N individual pairing hypotheses about which feature from Frame 1 goes with which feature from Frame 2. The number of possible global hypotheses is N!, as there are N choices for the first pairing, (N-1) for the second, etc.

    • One way to make the computation more efficient is by stochastic relaxation; large displacement correspondence hypotheses no longer need to be considered once enough evidence has accumulated for a more conservative solution. A more radical approach would be a competitive winner takes all neural net.
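
    The growth of the hypothesis space quoted in this section can be made concrete with a little arithmetic. The feature count of 10 below is an arbitrary illustrative value, and the function names are hypothetical.

```python
import math


def pairing_hypotheses(n_features):
    """Individual pairings: every feature in Frame 1 with every feature in Frame 2."""
    return n_features * n_features


def global_hypotheses(n_features):
    """Complete one-to-one assignments between the two frames: N!."""
    return math.factorial(n_features)


n = 10  # illustrative number of features
pairs = pairing_hypotheses(n)
assignments = global_hypotheses(n)
```

    Even with only 10 features there are 100 pairings but 3,628,800 global assignments, which is why exhaustive search is not practical.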

    Pattern Recognition (4)

    • The central issue in pattern recognition is the relation between within-class variability and between class variability.

    • Ideally within-class variability is small and between-class variability large (so classes are well separated).

    • For example, when encoding faces for identity, we want different faces to generate very different face codes (between class variability should be high) but different images of the same face should generate similar codes across conditions (within class variability should be low).

    • Illumination, perspective and expression often change face codes more than identity.

    Task                                                                    | Within-Class Variability | Between-Class Variability
    Face detection (classes: face / non-face)                               | Bad                      | Good
    Face identification (classes: same/different faces)                     | Bad                      | Good
    Facial expression interpretation (classes: same/different faces)        | Good                     | Bad
    Facial expression interpretation (classes: same/different expressions)  | Bad                      | Good

    • A young baby looks more like other babies than like the adult it grows into.

    • Genotypic features are inherited. Face recognition schemes are bounded by a genetic error rate (the birth rate of identical twins, which undermines between-class variability).

    • Phenotypic features reflect ageing and environment. Changes in facial appearance over time place a limit on face recognition (they increase within-class variability).

    Complex Variables: Fourier Transforms and Wavelets (2)

    • An example of the use of complex variables in computer vision is the Fourier transform, which is the basis for efficient implementations of filtering and pattern classification. The image I(x,y) is represented as a linear combination of complex exponentials:

      $$I(x,y) = \sum_k a_k \exp\big(i(u_k x + v_k y)\big)$$

    • The transform finds a set of coefficients a_k for every spatial frequency and orientation in the 2d Fourier domain spanned by the 2d frequency variables (u_k, v_k). These coefficients may be computed by:

      $$a_k = \int_X \int_Y \exp\big(-i(u_k x + v_k y)\big)\, I(x,y)\, dx\, dy$$

      As they are complex valued, they are usually resolved into polar complex form as amplitude and phase. The squared modulus of such coefficients (the sum of the squares of their real and imaginary parts) gives the power spectrum, which is shift-invariant.

    • A second example is the wavelet transform. This differs from the Fourier transform in that the domain of description remains in the image domain. The spectral analysis is local rather than global, so one extracts structure in the signal (ie image) at a particular scale of analysis in the (x,y) domain. Wavelet expansions are useful for edge extraction and pattern classification.

    • When the real and imaginary parts of wavelet representations are resolved into their complex polar forms as modulus and phase, the modulus is useful for pattern classification whilst the phase is useful for pattern identification.

    • Eg in a wavelet representation of a human face, the modulus detects that it is a face. However the modulus wouldn't be very useful for finding whose face it was.

    • The phase structure would be useful for identifying the face, and facial expressions can be described as phase modulations of an underlying canonical face.

    • Wavelets are good for face coding as facial features (lips, eyes etc.) are well described by just a small number of suitable wavelets. Another advantage over just edges and lines is that the major facial structure is continuous tone and differentiable, and undergoes continuous deformation, which wavelets accommodate well. A disadvantage is that they don't generate codes that are invariant to translation (or to size or orientation). They are also 2d (image based) rather than 3d (volumetric solid based); a 3d description may be more suitable for 3d solids such as heads, which project different 2d images as they rotate in 3d.

    • Volumetric coordinates refers to an object-centred 3d description, eg a human form as a concatenation of generalised cylinders, rather than an appearance based 2d description of image properties.
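
    The shift-invariance of the power spectrum noted in this section can be checked with a discrete Fourier transform. This sketch assumes a circular (wrap-around) translation, for which the invariance is exact; the random array is a stand-in for an image.

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((32, 32))                           # stand-in for a real image
shifted = np.roll(image, shift=(5, -3), axis=(0, 1))   # circular translation

coeffs = np.fft.fft2(image)
coeffs_shifted = np.fft.fft2(shifted)

# Power spectrum: squared modulus of each complex coefficient.
power = np.abs(coeffs) ** 2
power_shifted = np.abs(coeffs_shifted) ** 2

# The translation shows up only in the phase of the coefficients.
same_power = np.allclose(power, power_shifted)
same_phase = np.allclose(np.angle(coeffs), np.angle(coeffs_shifted))
```

    Here same_power is true while same_phase is false: position is carried entirely by the phase, which the power spectrum discards.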

    Codons

    • Shape descriptors such as codons or Fourier boundary descriptors encode the properties of a 2d shape boundary over a domain from 0 to 2π. This means the same shape in different sizes always produces the same code.

    • The description is always relative to the center of the closed curve, and so is invariant to translation in two dimensions.

    • Rotating a 2d shape in the plane amounts merely to a scrolling in angle of the code for the shape in terms of the boundary descriptors; and in the case of codons the lexicon entry for the shape is unaffected by rotations.

    • These sorts of invariances help diminish the unimportant elements of "within class variability" and give a compact description that sharpens the "between class variability".
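
    One common way to obtain these invariances from Fourier boundary descriptors is sketched below. The particular normalisation (dropping the zero-frequency term, keeping only magnitudes, and dividing by the first harmonic) is an assumption chosen for illustration, and the lobed test shape is made up. Taking magnitudes discards the angular scrolling caused by rotation, rather than tracking it.

```python
import numpy as np


def fourier_descriptor(points, n_coeffs=8):
    """Descriptor of a closed boundary given as an (N, 2) array of ordered points."""
    z = points[:, 0] + 1j * points[:, 1]          # boundary as a complex signal
    coeffs = np.fft.fft(z)
    magnitudes = np.abs(coeffs[1:n_coeffs + 1])   # skipping coeffs[0] removes position
    return magnitudes / magnitudes[0]             # dividing by first harmonic removes size


# Hypothetical test shape: a circle with three lobes.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
radius = 1 + 0.3 * np.cos(3 * t)
shape = np.column_stack([radius * np.cos(t), radius * np.sin(t)])

# The same shape, scaled, rotated, translated, and traced from a different start point.
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
moved = 2.5 * shape @ rotation.T + np.array([3.0, -1.0])
moved = np.roll(moved, 10, axis=0)

code_original = fourier_descriptor(shape)
code_moved = fourier_descriptor(moved)
```

    The two codes agree to numerical precision, so the nuisance variation has been removed from the description.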

    Superquadrics

    • Superquadrics are 3d mathematical solids having a low-dimensional parameterization. 3d objects are represented as the unions and intersections of generalized superquadric solids defined by equations of the form:

      $$A x^{\alpha} + B y^{\beta} + C z^{\gamma} = R$$

      Examples include cubes and tomatoes.

      These simple parametric descriptions of solids, when augmented by Boolean relations for conjoining them, allow one to generate object centred volumetric descriptions of the objects in a scene.

    • Their main limitation is that any superquadric is always a convex shape, and to build more complicated shapes you need to conjoin them, which generates a cusp. Thus the range of 3d solids they can represent is limited and the results look puffy, but they are an economical way of representing objects.

    Formally ill posed problems (4)

    • Most of the problems in vision are ill posed, in Hadamard's sense that a well posed problem must have the following properties:

      • its solution exists

      • its solution is unique

      • its solution depends continuously on the data

    • Inferring depth and 3d surface properties from image information is ill posed. An image is a two dimensional projection, but the world we want to make sense of is 3d. Vision is "inverse optics": we need to invert the 3d->2d projection in order to recover world properties; but the 2d->3d inversion of such a projection is mathematically impossible (non unique solution).

    • Inferring object colours in an illuminant-invariant manner is ill posed. The received wavelength mixture is a combination of the illuminant and the spectral reflectance of objects. To know the spectral reflectance of the objects we need to know exactly the properties of the illuminant, which normally we don't (a solution doesn't exist).

    • Inferring structure from motion, shading, texture, shadows and interpreting the mutual occlusions of objects as well as their self occlusions as they rotate in depth are all ill posed. The solutions don't depend continuously on the data, and the solutions may not be unique.

    • In many ways solving computer vision is AI complete. But intractable problems can be made tractable if metaphysical priors such as "objects cannot just disappear; they more likely occlude each other" or "head like objects are usually found on top of body like objects, so integrate both kinds of evidence together" can resolve the violation of one or more of Hadamard's three criteria.

    • Bayesian priors provide one means to do this, since the learning (or specification) of metaphysical principles (truths about the nature of the world) can steer the integration of evidence appropriately, making an intractable problem soluble.

    Convolution, again

    • Convolving an image I(x,y) with the Laplacian of a Gaussian gives a filtered image g(x,y) that emphasizes edges of a certain scale.

    • The zero crossings of g(x,y), isolated points where g(x,y)=0, correspond to edges (at any angle) within the image. Thus this operator acts as an isotropic (non orientation selective) edge detector. Regions of constant pixel value will also give g(x,y)=0.

    • The parameter σ determines the scale of image analysis at which edges are detected. If its value were increased, fewer edges would be detected.

    • In the 2d Fourier domain the operator is a bandpass filter whose centre frequency is determined by σ. Low and high frequencies are attenuated but middle frequencies (determined by σ) are emphasized. All orientations are treated equivalently: the operator is isotropic.

    • Computing the convolution entirely via Fourier methods would be simple, as the computation would just be the multiplication of the Fourier transforms of the two functions being convolved (the image and the Laplacian of a Gaussian filter).

    • An image domain convolution requires a double integral to be evaluated, which is far more costly.

    • However, a Fourier computation requires the Fourier transforms to be computed first, and then the inverse Fourier transform of the result after the multiplication, to recover the desired g(x,y).

    • Fourier methods are favourable for convolution kernels larger than about 5x5.


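    • If the image I(x,y) has 2d Fourier Transform F(u,v), provide an expression for G(u,v), the 2d Fourier Transform of the desired result g(x,y), in terms of only the Fourier plane variables u, v, F(u,v), some constants, and the above parameter σ. By application of the 2d differentiation theorem, the Laplacian becomes multiplication by −(u² + v²), and the Gaussian of scale σ transforms to another Gaussian whose width is inversely proportional to σ, so that up to constant factors:

      $$G(u,v) \propto -(u^2+v^2)\, e^{-k\sigma^2 (u^2+v^2)}\, F(u,v)$$

      where k is a positive constant fixed by how σ is defined in the Gaussian and by the Fourier convention used.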

    • The convolution kernel (-1 +1) is a finite difference approximation to the first derivative. The kernel (-1 +2 -1) is a second finite difference, approximating the second derivative. They would be applied to a row of pixels by discrete convolution (positioning the kernel at each pixel, and combining the 2 or 3 nearby pixels according to the weights given, to create a new pixel in an output image). The first finite difference would detect edges in a polarity sensitive way (its output has opposite signs for edges of opposite polarity), whereas the second finite difference has the advantage of producing a zero-crossing at an edge, regardless of its polarity.
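
    The behaviour of the two finite difference kernels can be seen on a single row of pixels. The pixel values below are made up, and the sign of each response depends on the convolution convention of the library used (here numpy.convolve), so only the pattern of signs matters.

```python
import numpy as np

# Hypothetical row of pixels containing one dark-to-bright step edge.
rising = np.array([10, 10, 10, 10, 50, 50, 50, 50], dtype=float)
falling = rising[::-1]  # the same edge with opposite polarity

# First finite difference: a single peak whose sign follows the edge polarity.
first_rising = np.convolve(rising, [-1, 1], mode="valid")
first_falling = np.convolve(falling, [-1, 1], mode="valid")

# Second finite difference: a positive/negative pair, ie a zero-crossing at the edge.
second_rising = np.convolve(rising, [-1, 2, -1], mode="valid")
second_falling = np.convolve(falling, [-1, 2, -1], mode="valid")


def has_zero_crossing(values):
    """True if two neighbouring outputs have opposite, non-zero signs."""
    signs = np.sign(values)
    return bool(np.any(signs[:-1] * signs[1:] < 0))
```

    The first difference flips sign when the edge polarity flips, while the second difference produces a zero-crossing in both cases.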

    Eigenfaces

    • To generate a set of eigenfaces, a large set of digitized images of human faces, taken under the same lighting conditions, are normalized to line up the eyes and mouths. They are then all resampled at the same pixel resolution. Eigenfaces can be extracted out of the image data by means of a mathematical tool called principal component analysis (PCA). Here are the steps involved in converting an image of a face into eigenfaces:

      1. Prepare a training set. The faces constituting the training set T should already be prepared for processing.

      2. Subtract the mean. The average matrix A has to be calculated and subtracted from each original image in T. The results are stored in variable S.

      3. Calculate the covariance matrix.

      4. Calculate the eigenvectors and eigenvalues of this covariance matrix.

      5. Choose the principal components.

    • There will be a large number of eigenfaces created before step 5, and far fewer are really needed. Select from them those that have the highest eigenvalues. For instance, if we are working with a 100 x 100 image, then this system will create 10,000 eigenvectors. Since most individuals can be identified using a database with a size between 100 and 150, most of the 10,000 can be discarded, and only the most important should remain.

    • A face is represented in terms of factors precomputed from the database population of example faces through Principal Components Analysis.

    • Principal Components Analysis finds the major forms of variation among the database of face images. Using linear algebraic methods (diagonalizing a covariance matrix to find its eigenvectors and their corresponding eigenvalues), a set of eigenfaces is precomputed (these are the eigenvectors). The greatest amount of variance is spanned by the smallest number of basis vectors, the number of which is determined by the accuracy desired. Normally about 20 to 40 eigenfaces are used to form a distinguishing code for a particular face.

    • The strength of eigenfaces is that the representation for a given face is very compact, so searches can be performed quickly. As the basis vectors (eigenfaces) are orthogonal and ordered by importance, they capture the greatest amount of variability in the smallest number of terms.

    • The weaknesses are that the representation is image based, ie 2d, so it doesn't account for pose angle or perspective angle. It has no invariance to changes in facial expression. It is very sensitive to changes in illumination and size.

    • The algorithm is efficient because the Principal Components Analysis of the database is precomputed off-line; any given presenting face then only needs to be projected onto the precomputed eigenfaces. The algorithm can learn as more faces are encountered. However, error rates of up to 50% can result from changes in illumination geometry.
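
    The eigenface steps above can be sketched with NumPy. Two things are assumptions rather than part of the notes: the eigenvectors are obtained from a singular value decomposition of the mean-subtracted data instead of by forming the covariance matrix explicitly (the two give the same eigenvectors, and the SVD avoids building a very large matrix), and random arrays stand in for aligned face images. All function names are hypothetical.

```python
import numpy as np


def train_eigenfaces(faces, n_components):
    """faces: (n_images, n_pixels) array, one flattened, aligned image per row."""
    mean_face = faces.mean(axis=0)
    centred = faces - mean_face                        # subtract the mean
    # Rows of vt are eigenvectors of the covariance matrix, most important first.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    eigenvalues = singular_values**2 / (len(faces) - 1)
    return mean_face, vt[:n_components], eigenvalues[:n_components]


def encode(face, mean_face, eigenfaces):
    """Project a face onto the eigenfaces to get its compact code."""
    return eigenfaces @ (face - mean_face)


def decode(code, mean_face, eigenfaces):
    """Approximate a face as the mean plus a weighted sum of eigenfaces."""
    return mean_face + eigenfaces.T @ code


rng = np.random.default_rng(3)
faces = rng.random((20, 100))          # stand-in: 20 "faces" of 10 x 10 pixels

mean_face, eigenfaces, eigenvalues = train_eigenfaces(faces, n_components=5)
code = encode(faces[0], mean_face, eigenfaces)      # 5 numbers describe the face
approximation = decode(code, mean_face, eigenfaces)
```

    Only the projection in encode has to run for each presenting face; the decomposition is done once, off-line.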

    Filters

    • Two kernels form a quadrature filter pair if they have a 90 degree phase offset - ie their inner product is 0.

    • Kernels are used by convolving them with an image. Positioned over each pixel in the image, the sum of the products of each tap in the filter with each pixel in the image would become the new pixel at the point in a new image: the filtered image.

    • The same result could be obtained by multiplying the Fourier transform of each kernel with the Fourier transform of the image, then taking the inverse Fourier transform.

    • Taking the squared modulus (sum of squares) of the results from convolving a facial image with the two kernels yields peaks of energy at locations corresponding to the eyes and mouth.
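
    A quadrature filter pair and the energy computed from it can be sketched in one dimension. The Gaussian window, its width, the wavelength of 8 samples and the single-impulse test signal are all illustrative assumptions.

```python
import numpy as np

x = np.arange(-16, 17)                          # kernel support, centred on 0
window = np.exp(-x**2 / (2 * 4.0**2))           # assumed Gaussian window
even = window * np.cos(2 * np.pi * x / 8)       # cosine-phase kernel
odd = window * np.sin(2 * np.pi * x / 8)        # sine-phase kernel, 90 degrees apart

# The pair is orthogonal: their inner product is zero.
inner = float(np.dot(even, odd))

# Hypothetical signal with a single feature at position 64.
signal = np.zeros(128)
signal[64] = 1.0

# Energy: sum of the squared responses of the two kernels.
response_even = np.convolve(signal, even, mode="same")
response_odd = np.convolve(signal, odd, mode="same")
energy = response_even**2 + response_odd**2
peak = int(np.argmax(energy))
```

    Each response on its own oscillates around the feature, but their summed squares give a single smooth peak at its location.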

    Paradox of Cognitive Penetrance

    • The Paradox of Cognitive Penetrance refers to the fact that visual tasks that humans are good at (eg face recognition) are solved without our having an understanding of how we do them. In contrast, tasks for which we have an in-depth theoretical understanding, such as arithmetic, are ones we perform poorly.

    • The systematic illusions which occur in the human visual system suggest that fidelity to image properties is not always a goal of biological visual algorithms.

    • The significance of the Paradox of Cognitive Penetrance is that the prospects for reverse engineering human visual faculties may be reduced by the difficulty of gaining insight into how we actually do what we do. Machine visual algorithms are likely to adopt different strategies to biological ones.

    Face Recognition (5)

    • Face recognition is hard as within-class variability is large (illumination, age) but between class variability is small (faces have same basic features).

    • A major shortcoming of most algorithms to date is that they approach face recognition as 2d (eg eigenfaces) rather than building 3d models that achieve pose-invariance and perspective invariance because 3d models could be projected to 2d appearances with any illumination, pose. Also algorithms have focused on facial landmarks rather than on features of high randomness that are variable among different faces. Also data fusion over multiple frames hasn't been developed.

    • Defining a feature such as "how many eyes does this face possess" would not be a good discriminator, since there is little between-class variability in that particular dimension, but choosing some secondary facial structure may be a good source of such discriminating variability.

    • Building a 3d model from a 2d image amounts to "inverse optics". As well as having enormous computational and memory requirements (up to 1GB for each 3d model), this is inherently an ill posed problem.

    Cameras

    • For an aligned stereo pair of cameras separated by distance b, each with focal length f, when a target point projects outside the central axis of the two cameras by amounts α and β:

    • The target depth is d = f b / (α + β)

    • Each camera has 3 spatial coordinates X,Y,Z and 3 Euler rotation angles together with a focal length. The rotation affects the solution to the Correspondence Problem.
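
    The depth formula for an aligned stereo pair can be written as a small function. The numeric values are hypothetical and assume that all four quantities are measured in the same units (eg millimetres); returning infinity for zero disparity is a design choice made for this sketch.

```python
def stereo_depth(focal_length, baseline, alpha, beta):
    """Depth d = f * b / (alpha + beta) for an aligned stereo pair."""
    disparity = alpha + beta
    if disparity == 0:
        # No disparity: the target is effectively infinitely far away.
        return float("inf")
    return focal_length * baseline / disparity


# Hypothetical values, all in millimetres.
depth = stereo_depth(focal_length=50, baseline=100, alpha=2, beta=3)
far_depth = stereo_depth(focal_length=50, baseline=100, alpha=1, beta=1)
```

    Halving the total disparity doubles the estimated depth, so small disparity errors matter most for distant targets.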

    Texture

    • Texture information (especially texture gradients) can be used to infer 3d surface and orientation of objects, contributing to object classification. The inference of shape and orientation assumes the texture is uniform over the surface, for example a wire mesh.

    Eye (3)

    • When visual data leaves the retina down the million fibres of either optic nerve and reaches its first synapse at the thalamus, it is met by a much larger flood of feedback signals coming back down from the visual cortex. This flood contains up to ten times as many fibres. Perhaps vision works by a hypothesis generation and testing process, in which graphic models are constructed in the brain about the external world and the graphics are constrained by the 2d image data coming from the retina. Hence we see not image data but 3d models constructed to be consistent with such data; this is the theory of vision as [inverse] graphics.

    • "Inverse graphics" is one way to define vision. The phrase refers to the fact that in graphics one begins with a 3d model of the world then projects down to a 2d image. The task of vision is the opposite: to infer the properties of 3d objects in a 3d world from 2d images of them.

    • The fact that the cone population which subserves both colour vision and high resolution vision is numerous only near the fovea, yet the world appears uniformly coloured and uniformly resolved, reveals that our internal visual representation is built up from multiple foveated frames over time. What we see is the result of a complex graphical process that is constrained by the retinal image as a distal input.

    • Functional streaming refers to the division of labour within the mammalian brain between anatomically distinct form, colour and motion pathways.

    • The outer plexiform layer of the mammalian retina performs centre-surround comparisons using on-centre, off-surround isotropic receptive field structures. This operation acts as a comparator that can be described as coding edges; it can also be described as a kind of bandpass filtering. The inner plexiform layer performs a similar function in time, creating sensitivity to motion of the image. Both require comparing image data from different parts of the retinal image, and so the neural signal flows are lateral.

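    Image operators

    • Consider three operations on an image I(x,y), where G_σ is a Gaussian of scale σ and * denotes convolution:

      $$\nabla^2\big[G_\sigma(x,y) * I(x,y)\big], \qquad G_\sigma(x,y) * \nabla^2 I(x,y), \qquad \big[\nabla^2 G_\sigma(x,y)\big] * I(x,y)$$

    • By the commutativity of linear operators, all three above operations are equivalent. The effect is isotropic band-pass filtering of the image, extracting only its edge structure at a certain band of spatial frequencies determined by σ, and treating all orientations equally.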

    A scheme for handwriting recognition

    • First the morphology of the individual letters must be extracted using edge detection operations.

    • Second the components of each letter (strokes) would be characterised by their orientation and length, and other features such as intersections which would be extracted from analysis of the skeletonised morphology.

    • A set of features x is then available for each letter. These become the input to a classifier which uses some distance metric to determine which of the 26 letters is closest to the input data.

    • One formalism for achieving this would be a Bayesian classifier which computes a probability P(C_k|x) for each letter class C_k given the measured feature data vector x, in terms of the likelihood P(x|C_k) that the observed feature vector could have been generated by each letter class C_k:

      $$P(C_k|x) = \frac{P(x|C_k)\,P(C_k)}{P(x)}$$

    • The probability of the data, P(x), acts as a normalising term that essentially reflects the significance of the evidence x:

      $$P(x) = \sum_k P(x|C_k)\,P(C_k)$$

    Active Contours (2)

    • The description of shapes and the detection of boundaries can be combined with constraints such as the stiffness of the contour, or the scale of analysis that is being adopted. Evidence for local edge structure is integrated with general constraints on mathematical form, giving a best fit that minimises some energy function or other cost function.

    • So there is a data term and a cost term which are in contention. Simple models are generally more useful than complex ones.

    • When shape description is formulated in these two terms the solution is often obtained by regularisation methods. These are iterative numerical methods for finding a set of model parameters that minimise a functional that is a linear combination of two terms, with some trade off parameter λ for specifying their relative importance:

      $$\arg\min_M \int \Big( \big(M(x) - I(x)\big)^2 + \lambda\, \big(M_{xx}(x)\big)^2 \Big)\, dx$$

      where M is the shape model and I is the image data (reduced here to a single dimension x for simplicity). The first term inside the integral seeks to minimise the squared deviations between the model and the image data. The second term penalises the squared second derivative M_xx of the model, ie it favours a smooth (stiff) solution.
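
    The trade off between a data term and a smoothness term can be sketched in discrete form. This assumes that the smoothness term penalises the squared second difference of the model (a discrete stand-in for its second derivative), and it solves the resulting linear system directly rather than iteratively. The noisy sine data and the value of λ are made up.

```python
import numpy as np


def regularised_fit(data, lam):
    """Minimise sum((M - data)^2) + lam * sum((second difference of M)^2) over M."""
    n = len(data)
    d2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        d2[i, i:i + 3] = [1.0, -2.0, 1.0]     # second finite difference operator
    # Setting the gradient of the cost to zero gives (I + lam * D^T D) M = data.
    return np.linalg.solve(np.eye(n) + lam * d2.T @ d2, data)


def roughness(values):
    """Sum of squared second differences: the quantity the second term penalises."""
    return float(np.sum(np.diff(values, 2) ** 2))


rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
noisy = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(50)  # hypothetical data

smooth = regularised_fit(noisy, lam=10.0)
```

    With λ = 0 the fit reproduces the data exactly; increasing λ gives a stiffer model that follows the data less closely.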

    Motion

    • Estimating both the local temporal and spatial derivatives allows the velocity vector v to be inferred through the following relationship over an image sequence I(x,y,t):

      $$-\frac{\partial I(x,y,t)}{\partial t} = \vec{v} \cdot \nabla I(x,y,t)$$

    • Thus the ratio of the local image time-derivative to the spatial gradient gives an estimate of the local image velocity.

    • An alternative way to exploit measured derivatives for motion estimation is used in "Dynamic zero-crossing models" by finding the edges and contours of objects and then taking the time-derivative of the Laplacian-Gaussian-convolved image I(x,y,t):

      $$-\frac{\partial}{\partial t}\Big[\nabla^2 G_\sigma(x,y) * I(x,y,t)\Big]$$

      in the vicinity of a Laplacian zero-crossing. The amplitude of the result is an estimate of speed, and the sign of this quantity determines the direction of motion relative to the normal to the contour.
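
    The gradient-based estimate of velocity can be sketched in one spatial dimension. The sinusoidal pattern, the true velocity of 0.05 units per frame, and the least-squares pooling over all pixels are illustrative assumptions.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 200, endpoint=False)
true_velocity = 0.05                       # hypothetical shift in x-units per frame

frame0 = np.sin(x)                         # pattern at time t
frame1 = np.sin(x - true_velocity)         # same pattern one frame later, moved right

# Temporal derivative: difference between consecutive frames.
dI_dt = frame1 - frame0
# Spatial derivative, estimated on the average of the two frames.
dI_dx = np.gradient((frame0 + frame1) / 2, x)

# -dI/dt = v * dI/dx at every pixel; combine all pixels by least squares.
estimate = -np.sum(dI_dt * dI_dx) / np.sum(dI_dx**2)
```

    Pooling over many pixels avoids dividing by a spatial gradient that is close to zero at any single location.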

    Filters

    • A 3x3 discrete filter kernel array that approximates the Laplacian operator is:

        -1  -2  -1
        -2  12  -2
        -1  -2  -1

    • The operator is used for edge detection. The sum of all taps in the filter is 0, which means there is no response to areas of uniform brightness, and very little response when there is little image structure.
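
    The zero-sum property of the kernel can be checked directly. The sketch assumes that the bottom-middle tap is -2, so that the taps sum to 0 as stated; the tiny test images are made up, and the filtering helper is a hypothetical name that only computes outputs where the kernel fits entirely inside the image.

```python
import numpy as np

# Assumed kernel: symmetric, with taps summing to zero.
kernel = np.array([[-1, -2, -1],
                   [-2, 12, -2],
                   [-1, -2, -1]], dtype=float)


def filter3x3(image, kernel):
    """Apply a 3x3 kernel at every position where it fits inside the image."""
    rows, cols = image.shape
    out = np.zeros((rows - 2, cols - 2))
    for dr in range(3):
        for dc in range(3):
            # Symmetric kernel, so convolution and correlation give the same result.
            out += kernel[dr, dc] * image[dr:dr + rows - 2, dc:dc + cols - 2]
    return out


uniform = np.full((5, 5), 7.0)                 # area of uniform brightness
step = np.zeros((6, 6))
step[:, 3:] = 1.0                              # vertical edge between columns 2 and 3

flat_response = filter3x3(uniform, kernel)
edge_response = filter3x3(step, kernel)
```

    The uniform area gives no response at all, while the step gives a negative and a positive value side by side, ie a zero-crossing at the edge.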

    MPEG

    • The compression achieved in MPEG coding is both intraframe and interframe (about 50% each). A key feature that is extracted from a video sequence is motion, obviating the need to encode so many intervening frames as if they were independent, by allowing prediction of object trajectories as a mode of compression.

    Uses of Fourier analysis

    • Convolution of an image with some operator, eg edge detection. Convolution is computationally costly if done literally, but efficient if done in the Fourier domain. To do so, multiply the Fourier transform of the image by the Fourier transform of the operator in question, then take the inverse Fourier transform to get the desired result. For kernels larger than 5x5 the Fourier approach is far more efficient.

    • The Fourier perspective on edge detection shows that it is really just a kind of frequency-selective filtering, usually high-pass or bandpass filtering.

    • Texture detection can be accomplished by 2d spectral Fourier analysis.

    • Motion can be detected by exploiting the "Spectral co-planarity theorem" of the 3d spatio-temporal Fourier transform.

    Mathematical similarities between eigenfaces and Fourier transforms

    • Both are linear integral expressions, taking an inner product between an image and some kernel or basis function.

    • The original data can be retrieved by re-multiplying the expansion coefficients by the corresponding basis functions.

    • The orthogonal basis for eigenface computations consists of the principal components that emerge from a Karhunen-Loève Transform on a database of faces. For a Fourier transform, the orthogonal basis is a set of complex exponentials.

    • The eigenface representation does not use a universal and independent expansion basis (like the complex exponentials of the Fourier transform), but rather a data-dependent basis, that must be computed from some training data.

    [1] Convolution with a Gaussian will shift the origin of the function to the position of the peak of the Gaussian, and the function will be smeared out.

    [2] Sum of second derivatives in two perpendicular orientations.

    [3] The order of operations doesn't matter.