Computer Vision

Bayesian inference

  • used to integrate prior knowledge about the world with new incoming data

  • interprets probability as “degree of belief” rather than “frequency of occurrence”

  • informally

    • if

      • H represents a hypothesis about the state of the world (e.g. an object in the image) and

      • D represents available image data

    • then

      • the explanatory conditional probabilities p(H|D) and p(D|H) are related by

        p(H|D) = ( p(D|H) . p(H)) / p(D)

  • this rule can be applied iteratively/recursively for repeatedly updating assessment of a visual hypothesis as more data arrives

    • use the latest posterior as the new prior

  • example priors include

    • matter cannot just disappear but regularly becomes occluded

    • a uniform texture on a complex shape is more likely than a complex texture on a simple shape

    • rotation in 3D is a better explanation for deforming boundaries than boundary changes

  • could have no useful priors

    • may need to solve a pattern recognition problem: given some vector of acquired features from an object, decide whether the vector is consistent with a particular class

    • have to make a decision

      • deciding “same” could be a

        • correct accept (hit)

        • false accept (false alarm)

      • deciding “different” could be a

        • false reject (miss)

        • correct reject

    • there will be a cross over in the probability distribution graphs

    • receiver operating characteristic curves show the trade-offs between false rejects and false accepts as the confidence level is modified

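    • detectability of a signal is measured as

        d' = |μ2 − μ1| / sqrt( (σ1² + σ2²) / 2 )

      • where μ1, μ2 and σ1, σ2 are the means and standard deviations of the two class distributions

      • a larger d' indicates higher decidability of the problem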

    • what if C2 were much more likely than C1? we wouldn't want to assume something is C1 given a smallish x: need Bayesian probability:

      • prior probabilities: define P(C1) and P(C2) as their relative proportions

      • e.g. a versus b: a is 4 times more likely in English than b

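      • can then apply Bayes' rule

        p( Ck | x ) = ( p( x | Ck ) . p(Ck)) / p(x)

      • where

        • p( Ck | x ) is the posterior probability

        • p( x | Ck ) is the class-conditional likelihood

        • p(Ck) is the prior probability

        • p(x) is the unconditional probability of the observation, which normalises the posterior

    • classification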

      • if you want to minimise the overall error you assign x to class Ck if

          P(Ck | x) > P(Cj | x) for all j ≠ k

      • could choose another classification criterion in order to trade off between false positives and false negatives

  • 3D

    • in 3D situations our goal is to maximise p(H | D), i.e. to find the most likely hypothesis (surface reconstructions) that explains the image data (reflectance)
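The recursive update and the prior-weighted classification described above can be sketched in a few lines of Python. This is only an illustration: the helper names `bayes_update` and `classify` are hypothetical, the prior and likelihood values are made-up numbers, and successive observations are assumed to be conditionally independent given the class.

```python
def bayes_update(prior, likelihood):
    """Return the posterior p(C|x) from priors p(C) and likelihoods p(x|C)."""
    unnormalised = {c: likelihood[c] * prior[c] for c in prior}
    evidence = sum(unnormalised.values())  # p(x), the normalising term
    return {c: value / evidence for c, value in unnormalised.items()}


def classify(posterior):
    """Minimum-error rule: pick the class with the largest posterior."""
    return max(posterior, key=posterior.get)


# Illustrative numbers only: C2 is four times more likely a priori than C1,
# but each observation is twice as likely under C1 as under C2.
prior = {"C1": 0.2, "C2": 0.8}
likelihood = {"C1": 0.6, "C2": 0.3}

belief = prior
history = []
for _ in range(3):
    # The latest posterior is used as the new prior.
    belief = bayes_update(belief, likelihood)
    history.append((classify(belief), belief))
```

With these numbers the first posterior still favours C2 (about 2/3) even though the likelihood favours C1; after three such observations the evidence outweighs the prior and the posterior favours C1 (about 2/3).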

 

Scale space

  • this is used for detecting edges at multiple scales

  • it is a plot showing the zero crossings of an image after being convolved with a linear operator

    • one of the axes (dimensions) is the width of the Gaussian linear operator

  • image pyramids

    • similar to a scale space, but

      • does sub-sampling as well as smoothing

      • only has discrete scale levels

    • these are useful for efficiently detecting edges: edges can be found at a coarser level first, which gives a rough idea of where to look for them at the next, more detailed level

  • if edge detection is performed on an image that has been smoothed, it will detect only the edges that are visible at that level of detail

    • given that we can concatenate filtering (smoothing) and filtering (edge detection) functions, we can create functions that detect edges only at a certain scale of analysis

  • zero crossings (edges) can only ever merge and hence become fewer when becoming coarser

    • no new zero crossings can be introduced by using a coarser Gaussian

    • this property is called “image causality” or the “fingerprint theorem”

  • the wavelet transform can be regarded as a kind of scale transform

    • wavelets take projections of image structure onto zero-mean basis functions, all of which are dilates, translates, and rotates of each other

    • the wavelet approximation to an image at a certain scale of analysis uses only wavelets up to a certain size, or dilation

    • the error signal which remains (the difference between the original and this approximation to it) is the input signal for the next wavelet level

    • because of orthogonality, this projection is the same as projecting the original signal onto the new level

    • since wavelets of all different sizes are self-similar, such a transform is a way of analysing the signal at a variety of scales

    • since the wavelets are dilates, translates, and rotates of each other, such a transform seeks to extract image structure in a way that may be invariant to dilation, translation, and rotation of the original image or pattern
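The image pyramid idea (smooth, then sub-sample, giving a small number of discrete scale levels) can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, and the 5-tap binomial kernel is an assumed stand-in for a small Gaussian.

```python
import numpy as np


def smooth(image):
    """Blur with a separable 5-tap binomial kernel (a small Gaussian approximation)."""
    kernel = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0
    padded = np.pad(image, 2, mode="edge")  # replicate borders so the size is kept
    # Separability: filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def gaussian_pyramid(image, levels):
    """Return a list of images, each blurred and sub-sampled by 2 from the last."""
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        blurred = smooth(pyramid[-1])
        pyramid.append(blurred[::2, ::2])  # keep every second row and column
    return pyramid


image = np.arange(64 * 64, dtype=float).reshape(64, 64)
pyramid = gaussian_pyramid(image, levels=4)
shapes = [level.shape for level in pyramid]
```

Each level has a quarter of the pixels of the one before it, which is why a search started at the coarsest level is cheap.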

 

 

Laplacian operator

  • e.g. write the Laplacian operator as a 3×3 array

  • detects edges in all orientations

  • is a filter

  • is insensitive to the brightness of the scene due to zero-sum property
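As an illustration of the points above, the sketch below uses one common 3×3 discretisation of the Laplacian (the 4-neighbour form; an 8-neighbour variant also exists, and the notes do not say which is intended). The helper `filter_valid` is a hypothetical name; because the kernel is symmetric, sliding it without flipping gives the same result as convolution.

```python
import numpy as np

# 4-neighbour Laplacian: the weights sum to zero.
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)


def filter_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * image[dy:dy + out_h, dx:dx + out_w]
    return out


# A vertical step edge between columns 2 and 3.
image = np.zeros((6, 6))
image[:, 3:] = 10.0

response = filter_valid(image, LAPLACIAN)
brighter_response = filter_valid(image + 100.0, LAPLACIAN)
```

The response is zero in the flat regions and changes sign across the step, and adding a constant brightness to the whole image leaves it unchanged because the kernel sums to zero.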

 

Non-linear operators

  • can have advantageous properties such as

    • reduced noise sensitivity

    • greater applicability for extracting features that are more complicated than edges

  • these may be built up from linear operations such as filtering to extract a particular scale of analysis or an orientation of image structure

  • the actual responsiveness to a feature such as colour or motion or texture requires the artful use of non-linearity

  • disadvantages

    • don't have translation invariance

    • output is phase modulated

  • motion information:

    • extracted by “energy detectors”

    • built by taking the squared-modulus of linear spatio-temporal filters forming a quadrature pair

  • Colour information

    • can only be extracted by “discounting the illuminant”

      • this requires non-linear operations such as taking ratios over neighbouring regions of differently coloured surfaces

    • permits the spectral reflectance properties of surfaces themselves (e.g. their pigments) to be inferred, independently of the wavelengths of light illuminating them

  • Stereoscopic information

    • this requires dichoptic integration by solving the Correspondence Problem (matching up the corresponding points of two images acquired with some displacement disparity)

    • this requires Cooperativity Processes, which are profoundly non-linear kinds of computation
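The “energy detector” construction mentioned under motion information (squared modulus of a quadrature pair of linear filters) can be sketched in one spatial dimension. This is an assumption-laden toy: the filters are even and odd Gabor-like functions, the values of `alpha` and `mu0` are arbitrary, and a purely spatial grating stands in for a spatio-temporal signal.

```python
import numpy as np

x = np.arange(-64, 64)
alpha = 12.0                       # envelope width (assumed value)
mu0 = 2 * np.pi * 16 / len(x)      # preferred frequency (assumed value)
envelope = np.exp(-(x / alpha) ** 2)
even_filter = envelope * np.cos(mu0 * x)   # one member of the quadrature pair
odd_filter = envelope * np.sin(mu0 * x)    # the other member, 90 degrees out of phase


def responses(phase):
    """Linear outputs and energy for a grating with the given phase."""
    grating = np.cos(mu0 * x + phase)
    even = np.sum(even_filter * grating)
    odd = np.sum(odd_filter * grating)
    return even, odd, even ** 2 + odd ** 2   # energy = squared modulus


phases = np.linspace(0, 2 * np.pi, 8, endpoint=False)
results = [responses(p) for p in phases]
even_outputs = np.array([r[0] for r in results])
energies = np.array([r[2] for r in results])
```

Each linear output swings between positive and negative as the grating's phase changes, whereas the energy stays essentially constant, which is the property the squaring non-linearity buys.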

 

Quadrature pair: a pair of filters that differ in phase by 90° (π/2), which can be treated as the real-valued and imaginary-valued parts of a single complex-valued filter

 

Hilbert Pair

  • a pair of functions

  • the Fourier Transform of one is equivalent to the Fourier Transform of the other except that

    • the phase of all positive-frequency components has been increased by π/2

    • the phase of all negative-frequency components has been decreased by π/2

 

Hilbert Transform

  • this can be computed either by

    • the phase-shifting operation in the Fourier domain described above or by

    • convolving the original image with the hyperbola 1/(πx)

  • modulus of the Hilbert Transform

    • useful approach for detecting facial features, extracting them, or detecting motion

    • the squared modulus is the sum of the squares of the resulting real and imaginary parts; the modulus is its square root
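The Fourier-domain route to the Hilbert Transform can be sketched with NumPy's FFT. The sketch assumes the common sign convention in which the transform multiplies positive frequencies by −i and negative frequencies by +i (a ∓π/2 phase shift); the function name `hilbert_transform` and the test signal are illustrative choices.

```python
import numpy as np


def hilbert_transform(signal):
    """Phase-shift every frequency component by a quarter cycle via the FFT."""
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(len(signal))
    # -i for positive frequencies, +i for negative ones, 0 for the DC term.
    return np.fft.ifft(-1j * np.sign(freqs) * spectrum).real


t = np.arange(256) / 256
carrier = np.cos(2 * np.pi * 8 * t)      # 8 whole cycles
shifted = hilbert_transform(carrier)     # the other member of the Hilbert pair

# Modulus of the complex signal carrier + i * shifted.
modulus = np.sqrt(carrier ** 2 + shifted ** 2)
```

Under this convention the cosine maps to a sine, and the modulus is flat even though both parts oscillate.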

Gabor Logons

  • these are quadrature pair wavelets

  • the family of filters which achieve the lowest possible conjoint uncertainty/minimal dispersion/variance in both space and Fourier domain

  • a complex exponential multiplied by a Gaussian

  • NB that these wavelets have Fourier transforms with the same functional form but with parameters interchanged

  • these functions are mutually non-orthogonal

  • wavelet basis

    • a family of Gabor functions

    • parametrised so that they are self similar (are dilates and translates of each other)

  • these classes of wavelets can be used as the expansion basis for signals

    • because of self similarity it amounts to analysing a signal at different scales (multi-resolution analysis)

  • optimal in extracting from an image

    • what (orientation and modulation of image structure)

    • where (2D position)

  • bandpass filter

    • when x0 = 0 it passes only frequencies within a band centred on μ0

  • 2D wavelets are defined as

  • unification of the image and the Fourier domains

    • Gabor wavelets generalise these two as being opposite ends of the spectrum

    • varying the α parameter moves between two limiting cases

      • at one extreme the Gaussian term becomes 1 and the expansion reduces to the Fourier basis

      • at the other extreme the Gaussian term becomes a discrete delta function and the expansion reduces to the pixel-by-pixel image basis
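The bandpass behaviour and the two limiting cases can be checked numerically with a 1D Gabor function. The parameterisation below is an assumption, since the notes do not give the formula: a Gaussian envelope exp(−(x − x0)²/α²) multiplied by exp(i μ0 (x − x0)), so that large α widens the envelope. With a different convention the roles of large and small α would swap.

```python
import numpy as np


def gabor(x, x0, mu0, alpha):
    """Complex 1D Gabor function: Gaussian envelope times a complex exponential."""
    return np.exp(-((x - x0) / alpha) ** 2) * np.exp(1j * mu0 * (x - x0))


n = 256
x = np.arange(n) - n // 2
mu0 = 2 * np.pi * 32 / n            # modulation frequency: 32 cycles per window

# Bandpass: the spectrum is concentrated around mu0.
wavelet = gabor(x, x0=0, mu0=mu0, alpha=20.0)
spectrum = np.abs(np.fft.fft(wavelet))
peak_bin = int(np.argmax(spectrum))

# Limiting cases of the envelope width (under the assumed parameterisation).
nearly_fourier = gabor(x, x0=0, mu0=mu0, alpha=1e6)   # envelope is ~1 everywhere
nearly_pixel = gabor(x, x0=0, mu0=mu0, alpha=0.1)     # envelope is ~a delta at x0
```

Here `peak_bin` lands on the bin corresponding to μ0, `nearly_fourier` is indistinguishable from a pure complex exponential, and `nearly_pixel` is non-zero only at the sample x = x0.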

 

Convolution

  • combines two functions f and g to generate a third function h(x, y), whose value at location (x, y) is equal to the integral of the product of f and g after one is flipped and undergoes a relative shift by amount (x, y)

  • is used to combine two functions using each one to blur the other

  • is important for computer vision because

    • it is the basis for filtering (convolving the image with a filter kernel)

  • convolution theorem

    • convolving two functions f and g multiplies their two 2DFTs in the Fourier domain, i.e. if h = f * g then H(u, v) = F(u, v) . G(u, v), where F, G and H are the 2DFTs of f, g and h
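The convolution theorem can be verified numerically on small arrays. The sketch below compares a direct flip-shift-multiply-sum convolution with the product of the two 2D FFTs; it assumes circular (wrap-around) convolution, which is the form the discrete Fourier transform corresponds to, and the random test arrays are arbitrary.

```python
import numpy as np


def circular_convolve2d(f, g):
    """Direct 2D circular convolution: flip g, shift it, multiply, and sum."""
    rows, cols = f.shape
    h = np.zeros((rows, cols))
    for y in range(rows):
        for x in range(cols):
            for j in range(rows):
                for i in range(cols):
                    h[y, x] += f[j, i] * g[(y - j) % rows, (x - i) % cols]
    return h


rng = np.random.default_rng(0)
f = rng.random((8, 8))
g = rng.random((8, 8))

direct = circular_convolve2d(f, g)
# Multiply the two transforms, then transform back.
via_fourier = np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(g)).real
```

The two results agree to floating-point precision, while the Fourier route replaces the four nested loops with three transforms and one element-wise product.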

 

 

Correlation

  • indicates the strength and direction of a linear relationship between two functions (images)

  • i.e. it is a way of combining two functions, such as an image array and a pattern that one is searching for in the image, in order to generate a third function which would have a large peak if there were a strong correlation between the two

  • this is used in computer vision for

    • motion detection

    • texture classification

    • image segmentation

    • pattern matching/recognition
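A minimal 1D sketch of pattern matching by correlation: a short pattern is hidden in a longer signal, and the correlation function peaks at the offset where it sits. The pattern values and its position are made-up illustration data; `np.correlate` with `mode="valid"` evaluates the sum of products at every offset where the pattern fits entirely inside the signal.

```python
import numpy as np

pattern = np.array([1.0, -2.0, 3.0, -1.0, 2.0])

# A signal that is zero everywhere except for one copy of the pattern.
signal = np.zeros(50)
signal[20:25] = pattern

# scores[k] = sum over n of signal[n + k] * pattern[n]
scores = np.correlate(signal, pattern, mode="valid")
best_offset = int(np.argmax(scores))
```

The peak value equals the sum of the squared pattern values and occurs at offset 20, where the pattern was placed.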

 

Filters

  • used to extract edge information

  • variable properties include

    • isotropic (circularly symmetric) or anisotropic (directional)

    • self similar (dilates of each other) or not self similar

    • separable (expressible as products of 1D functions) or not

    • size of support (number of “taps”/pixels in the kernel)

    • preferred non-linear outputs (zero crossings, phasor moduli, energy)

  • bandpass filtering

    • this is filtering an image i(x,y) so that certain frequencies are emphasised and certain others are reduced

    • the function g(x,y) determines the pass band

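    • normally this is done in the Fourier domain rather than the image domain, i.e. by multiplying the 2DFT of the image, I(u, v), by the 2DFT of the filter, G(u, v), and inverse transforming the product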
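Bandpass filtering in the Fourier domain can be sketched in 1D (an image would use the 2D transforms in the same way). The three test frequencies and the 10 to 30 cycle pass band are arbitrary illustration values, and an ideal rectangular band is assumed for simplicity.

```python
import numpy as np

n = 512
t = np.arange(n) / n
low = np.sin(2 * np.pi * 2 * t)       # 2 cycles: below the pass band
middle = np.sin(2 * np.pi * 20 * t)   # 20 cycles: inside the pass band
high = np.sin(2 * np.pi * 60 * t)     # 60 cycles: above the pass band
signal = low + middle + high

spectrum = np.fft.fft(signal)
cycles = np.abs(np.fft.fftfreq(n, d=1 / n))    # frequency of each bin, in cycles
pass_band = (cycles >= 10) & (cycles <= 30)    # the filter's Fourier-domain profile

# Multiply in the Fourier domain, then transform back.
filtered = np.fft.ifft(spectrum * pass_band).real
```

Only the 20-cycle component survives, so `filtered` matches `middle`.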

 

Edge detection by derivative zero crossings

  • this is a technique to find edges in an image by examining the second derivative zero crossings

  • this can be done at different scales (of blurring) defined by the constant σ

    • this is the space constant of a Gaussian

  • [∇² Gσ(x,y)] convolved with I(x,y)

    • where this crosses zero (changes sign) there is an edge
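A 1D sketch of the technique: convolve the signal with the second derivative of a Gaussian of width σ and report where the result changes sign. The function names, the 4σ kernel radius, the mean subtraction that forces a zero-sum kernel, and the `min_strength` threshold (used to ignore sign flips caused by rounding noise in flat regions) are all assumptions of this sketch.

```python
import numpy as np


def second_derivative_of_gaussian(sigma):
    """Sampled second derivative of a Gaussian (unnormalised)."""
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = (x ** 2 - sigma ** 2) / sigma ** 4 * np.exp(-x ** 2 / (2 * sigma ** 2))
    return kernel - kernel.mean()   # force zero sum so flat regions give zero


def zero_crossings(signal, sigma, min_strength=1e-6):
    """Indices i where the filtered signal changes sign between i and i + 1."""
    kernel = second_derivative_of_gaussian(sigma)
    radius = len(kernel) // 2
    padded = np.pad(signal, radius, mode="edge")   # avoid fake edges at the ends
    response = np.convolve(padded, kernel, mode="valid")
    left, right = response[:-1], response[1:]
    strong = np.minimum(np.abs(left), np.abs(right)) > min_strength
    return np.flatnonzero((left * right < 0) & strong)


# A step edge between samples 49 and 50.
step = np.zeros(100)
step[50:] = 1.0
edges = zero_crossings(step, sigma=3.0)
```

For this step the only reported crossing is at index 49, i.e. between samples 49 and 50. Larger values of `sigma` blur more and so respond only to coarser structure.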

 

Correspondence problem

  • this is the problem of identifying corresponding regions (e.g. the same book) in two images

  • stereoscopic vision

    • from spatially displaced cameras

    • stereoscopic disparity

      • the same object will project to different places in the two images

      • this difference in projection between objects in the image is proportional to the distance between those objects

      • it is also proportional to the distance between the cameras

      • these differences in projection/errors indicate the distance between the objects, if the distance between the cameras is known

  • motion vision

    • from temporally displaced cameras (i.e. same place, later time)

    • disparity

      • in a similar way the same object will project into a different place in the image

      • however the error/difference in projection will be proportional to the distance it has travelled between the two frames

        • this gives us the velocity if we know the time difference between the images

  • complexity

    • since we are trying to identify the disparities of position of certain objects in the image, first we must find the objects and then identify them

    • hence with more objects it is more complex to identify and tag them all

    • the number of candidate pairings varies quadratically with the number of objects/features n

      • in theory any object in the first image could be mapped to any object in the second

        • this gives n² possible pairs, and the number of possible complete one-to-one matchings is n!

  • an approach to simplify the problem – multi-scale image pyramids

    • this helps reduce complexity of finding objects in the first place

    • these are useful for efficiently detecting edges: edges can be found at a coarser level first, which gives a rough idea of where to look for them at the next, more detailed level

    • adequate alignments are found for blurred copies of the image pair

    • the process is repeated on less and less blurred versions, each time restricting the search to the areas where matches were found at the coarser levels

  • another approach to simplify the problem – stochastic relaxation

    • this helps reduce the complexity of deciding how to relate (already found) objects in the two images

    • large-deviation hypotheses are no longer considered once sufficient evidence has been amassed to indicate a more conservative solution
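The coarse-to-fine pyramid strategy for the correspondence problem can be sketched on a toy 1D case. Everything here is a simplifying assumption: the two “images” are 1D random signals, the second is a circular shift of the first by a single global disparity, the blur is a 3-tap circular average, and the helper names are hypothetical.

```python
import numpy as np


def blur(signal):
    """Circular 3-tap smoothing."""
    return (np.roll(signal, 1) + 2 * signal + np.roll(signal, -1)) / 4


def best_shift(left, right, candidates):
    """Return the candidate circular shift of `left` that best matches `right`."""
    errors = [np.sum((np.roll(left, s) - right) ** 2) for s in candidates]
    return candidates[int(np.argmin(errors))]


rng = np.random.default_rng(1)
left = rng.random(256)
right = np.roll(left, 12)           # true disparity: 12 samples

# Coarse level: blur, keep every 4th sample, and search all 64 coarse shifts.
coarse_left, coarse_right = blur(left)[::4], blur(right)[::4]
coarse = best_shift(coarse_left, coarse_right, list(range(64)))

# Fine level: search only a small window around the coarse estimate.
window = list(range(4 * coarse - 3, 4 * coarse + 4))
disparity = best_shift(left, right, window)
```

The coarse search finds a shift of 3 coarse samples, and the fine search then recovers the disparity of 12 after testing only 7 candidates, for 71 comparisons in total instead of the 256 an exhaustive full-resolution search would need.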

 

The main purpose of powdering one's face is to specify s and n in this expression: