Computer Vision
Bayesian inference
- used to integrate prior knowledge about the world with new incoming data
- interprets probability as "degree of belief" rather than "frequency of occurrence"
- informally, if H represents a hypothesis about the state of the world (e.g. an object in the image) and D represents the available image data, then the explanatory conditional probabilities p(H|D) and p(D|H) are related by $p(H|D) = \frac{p(D|H)\,p(H)}{p(D)}$
- this rule can be applied iteratively/recursively to keep updating the assessment of a visual hypothesis as more data arrives: use the latest posterior as the new prior
- example priors include
  - matter cannot just disappear, but it regularly becomes occluded
  - a uniform texture on a complex shape is more likely than a complex texture on a simple shape
  - rotation in 3D is a better explanation for deforming boundaries than changes to the boundaries themselves
- there may be no useful priors
- may need to solve pattern recognition: given some vector of features acquired from an object, decide whether the vector is consistent with a particular class
  - deciding "same" could be
    - a correct accept (hit)
    - a false accept (false alarm)
  - deciding "different" could be
    - a false reject (miss)
    - a correct reject
  - the probability distribution graphs of the two classes will cross over
  - receiver operating characteristic (ROC) curves show the trade-off between false rejects and false accepts as the confidence level is varied
  - the detectability of a signal is measured by the decidability index d'
  - what if C2 were much more likely than C1? we would not want to assume something is C1 given a smallish x, so we need Bayesian probability
  - prior probabilities: define P(C1) and P(C2) as the relative proportions of the classes, e.g. a versus b: a is 4 times more likely in English than b
  - then $p(C_k|x) = \frac{p(x|C_k)\,p(C_k)}{p(x)}$, where $p(C_k|x)$ is the posterior probability
  - classification: to minimise the overall error, assign x to class $C_k$ if $P(C_k|x) > P(C_j|x)$ for all $j \neq k$
  - a different classification criterion could be chosen in order to trade off false positives against false negatives
- 3D: in 3D situations the goal is to maximise p(H|D), i.e. to find the most likely hypothesis (surface reconstruction) that explains the image data (reflectance)

Scale space
- used for detecting edges at multiple scales
- a plot showing the zero crossings of an image after it has been convolved with a linear operator
- one of the axes (dimensions) is the width of the Gaussian linear operator
- image pyramids
  - similar to a scale space, but with sub-sampling as well as smoothing
  - only have discrete scale levels
  - useful for detecting edges efficiently: edges can be found at a coarser level, which gives a rough idea of where to look for them at the next, more detailed level
- if edge detection is performed on an image that has been smoothed, it detects only the edges that are visible at that level of detail
- because a smoothing filter and an edge-detection filter can be concatenated, we can create operators that detect edges only at a certain scale of analysis
- zero crossings (edges) can only ever merge, and hence become fewer, as the scale becomes coarser
  - no new zero crossings can be introduced by using a coarser Gaussian
  - this property is called "image causality" or the "fingerprint theorem"
- the wavelet transform can be regarded as a kind of scale transform
  - wavelets take projections of image structure onto zero-mean basis functions, all of which are dilates, translates, and rotates of each other
  - the wavelet approximation to an image at a certain scale of analysis uses only wavelets up to a certain size, or dilation
  - the error signal that remains (the difference between the original and this approximation to it) is the input signal for the next wavelet level
  - because of orthogonality, this projection is the same as projecting the original signal onto the new level
  - since wavelets of all different sizes are self-similar, such a transform is a way of analysing the signal at a variety of scales
  - since the wavelets are dilates, translates, and rotates of each other, such a transform seeks to extract image structure in a way that may be invariant to dilation, translation, and rotation of the original image or pattern

Laplacian operator
- e.g. the Laplacian operator in a 3×3 array
- detects edges in all orientations
- is a filter
- is insensitive to the brightness of the scene, due to its zero-sum property

Non-linear operators
- can have advantageous properties such as
  - reduced noise sensitivity
  - greater applicability for extracting features that are more complicated than edges
- may be built up from linear operations, such as filtering to extract a particular scale of analysis or orientation of image structure
- the actual responsiveness to a feature such as colour, motion, or texture requires the artful use of non-linearity
- disadvantages
  - don't have translation invariance
  - output is phase modulated
- motion information: extracted by "energy detectors", built by taking the squared modulus of linear spatio-temporal filters forming a quadrature pair
- colour information
  - can only be extracted by "discounting the illuminant"
  - this requires non-linear operations such as taking ratios over neighbouring regions of differently coloured surfaces
  - permits the spectral reflectance properties of the surfaces themselves (e.g. their pigments) to be inferred, independently of the wavelengths of light illuminating them
- stereoscopic information
  - requires dichoptic integration by solving the Correspondence Problem (matching up the corresponding points of two images acquired with some displacement disparity)
  - this requires Cooperativity Processes, which are profoundly non-linear kinds of computation

Quadrature pair
- a pair of filters that together form one complex-valued filter: one supplies the real part and the other the imaginary part

Hilbert pair
- a pair of functions where the Fourier transform of one is equivalent to the Fourier transform of the other, except that
  - the phase of all positive-frequency components has been increased by π/2
  - the phase of all negative-frequency components has been decreased by π/2

Hilbert transform
- can be computed either by the phase-shifting operation in the Fourier domain described above, or by convolving the original image with the hyperbola 1/x
- modulus of the Hilbert transform
  - a useful approach for detecting facial features, extracting them, or detecting motion
  - the squared modulus is the sum of the squares of the resulting real and imaginary parts

Gabor logons
- these are quadrature pair wavelets
- the family of filters that achieves the lowest possible conjoint uncertainty (minimal dispersion/variance) in both the space and the Fourier domain
- a complex exponential multiplied by a Gaussian
- NB these wavelets have Fourier transforms with the same functional form, but with the parameters interchanged
- these functions are mutually non-orthogonal
- wavelet basis
  - a family of Gabor functions parametrised so that they are self-similar (dilates and translates of each other)
  - these classes of wavelets can be used as the expansion basis for signals
  - because of self-similarity, this amounts to analysing a signal at different scales (multi-resolution analysis)
  - optimal for extracting from an image
    - what (orientation and modulation of image structure)
    - where (2D position)
- bandpass filter: when $x_0 = 0$ it passes only frequencies within a band around $\mu_0$
- 2D wavelets are defined analogously, as a 2D Gaussian multiplied by a 2D complex exponential
- unification of the image and the Fourier domains: Gabor wavelets generalise these two as opposite ends of a continuum, reached by varying the α parameter

Convolution
- combines two functions f and g to generate a third function h(x, y), whose value at location (x, y) equals the integral of the product of f and g after one of them is flipped and shifted by (x, y): $h(x,y) = \iint f(\alpha,\beta)\,g(x-\alpha,\,y-\beta)\,d\alpha\,d\beta$
- is used to combine two functions, using each one to blur the other
- is important for computer vision because it is the basis for filtering (convolving the image with a filter kernel)
- convolution theorem: convolving two functions f and g multiplies their two 2D Fourier transforms in the Fourier domain, i.e. $H(u,v) = F(u,v)\,G(u,v)$

Correlation
- indicates the strength and direction of a linear relationship between two functions (images)
- i.e. it is a way of combining two functions, such as an image array and a pattern being searched for in the image, to generate a third function that has a large peak wherever the two are strongly correlated
- used in computer vision for
  - motion detection
  - texture classification
  - image segmentation
  - pattern matching/recognition

Filters
- used to extract edge information
- variable properties include
  - isotropic (circularly symmetric) or anisotropic (directional)
  - self-similar (dilates of each other) or not self-similar
  - separable (expressible as products of 1D functions) or not
  - size of support (number of "taps"/pixels in the kernel)
  - preferred non-linear outputs (zero crossings, phasor moduli, energy)
- bandpass filtering
  - filtering an image i(x,y) so that certain frequencies are emphasised and certain others are reduced
  - the function g(x,y) determines the pass band
  - normally this is done in the Fourier domain rather than the image domain, i.e. by multiplying the two Fourier transforms, $I(u,v)\,G(u,v)$, and transforming the product back
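As a minimal sketch of the convolution theorem and of filtering in the Fourier domain, the following compares a direct (circular) convolution with the route through the Fourier domain. It is a 1D simplification of the 2D case in these notes, it uses a naive DFT written with the standard library only, and the signal, the kernel, and all function names are illustrative assumptions rather than anything taken from the notes.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolve(f, g):
    """Direct circular convolution: flip g, shift it by t, multiply with f, and sum."""
    n = len(f)
    return [sum(f[m] * g[(t - m) % n] for m in range(n)) for t in range(n)]

def filter_in_fourier_domain(image_row, kernel):
    """Multiply the two spectra and transform the product back."""
    product = [a * b for a, b in zip(dft(image_row), dft(kernel))]
    return [value.real for value in idft(product)]

# Hypothetical data: one image row containing a bright bar, and a zero-sum
# second-difference kernel padded with zeros to the same length.
image_row = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
kernel = [-1.0, 2.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

spatial = circular_convolve(image_row, kernel)
spectral = filter_in_fourier_domain(image_row, kernel)
```

Both routes give the same output up to rounding error. Because the kernel sums to zero, the output also sums to zero, so it is unaffected by adding a constant brightness to the row.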
Edge detection by derivative zero crossings
- a technique for finding edges in an image by locating the zero crossings of its second derivative
- this can be done at different scales (of blurring), defined by the constant σ, which is the space constant of a Gaussian
- the image I(x,y) is convolved with the Laplacian of the Gaussian, $\nabla^2 G_\sigma(x,y) * I(x,y)$; where the result crosses zero there is an edge

Correspondence problem
- the problem of identifying corresponding regions (e.g. the same book) in two images
- stereoscopic vision: from spatially displaced cameras
  - stereoscopic disparity: the same objects project to different places in the two images
  - this difference in projection between objects is proportional to the distance between those objects
  - it is also proportional to the distance between the cameras
  - these differences in projection indicate the distance between the objects, if the distance between the cameras is known
- motion vision: from temporally displaced cameras (i.e. same place, later time)
  - disparity: in a similar way, the same object projects to a different place in the later image
  - here the difference in projection is proportional to the distance the object has travelled between the two frames
  - this gives its velocity if the time difference between the images is known
- complexity
  - since we are trying to identify the disparities in position of certain objects in the image, we must first find the objects and then identify them
  - hence the more objects there are, the more complex it is to identify and tag them all
  - comparing every object/feature in one image with every candidate in the other grows quadratically with the number of features n
  - in theory any object in the first image could be mapped to any object in the second, so the number of possible complete one-to-one assignments is n!
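The correspondence search and its cost can be illustrated with a small sketch. It assumes a 1D scanline, a purely horizontal shift, and a mean-squared-difference matching score; none of these choices come from the notes, and the pixel rows and function names are hypothetical.

```python
import math

def mean_squared_difference(left, right, disparity):
    """Average squared difference between left[x] and right[x - disparity] over the overlap."""
    overlap = range(disparity, len(left))
    return sum((left[x] - right[x - disparity]) ** 2 for x in overlap) / len(overlap)

def best_disparity(left, right, max_disparity):
    """Try every candidate shift and keep the one with the smallest matching error."""
    return min(range(max_disparity + 1),
               key=lambda d: mean_squared_difference(left, right, d))

# Hypothetical scanlines: the same bright blob appears 2 pixels further left
# in the right-hand camera's row.
left_row = [0, 0, 0, 5, 9, 5, 0, 0, 0, 0]
right_row = [0, 5, 9, 5, 0, 0, 0, 0, 0, 0]

disparity = best_disparity(left_row, right_row, max_disparity=4)

# Cost of matching n features: pairwise comparisons versus complete assignments.
n_features = 5
pairwise_comparisons = n_features ** 2                 # every feature against every candidate
complete_assignments = math.factorial(n_features)      # every one-to-one mapping
```

For motion vision the same search applies to two frames taken at different times, and dividing the recovered shift by the time difference gives a velocity in pixels per unit time.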
- one approach to simplifying the problem: multi-scale image pyramids
  - helps reduce the complexity of finding the objects in the first place
  - pyramids are useful for detecting edges efficiently: edges can be found at a coarser level, which gives a rough idea of where to look for them at the next, more detailed level
  - adequate alignments are first found for blurred copies of the image pair
  - the process is repeated on less and less blurred versions, but each time the search space is restricted further by looking only in the areas where matches were found at the coarser levels
- another approach to simplifying the problem: stochastic relaxation
  - helps reduce the complexity of deciding how to relate (already found) objects in the two images
  - large-deviation hypotheses are no longer considered once sufficient evidence has been amassed to indicate a more conservative solution

The main purpose of powdering one's face is to specify s and n in this expression: [expression missing in source]
- pattern recognition: a decision has to be made
- a larger d' indicates higher decidability of the problem
- bandpass filter: the permitted frequencies lie within a band around μ0
- when the Gaussian becomes very wide, the Gaussian term becomes 1 and the expansion reduces to the Fourier basis
- when the Gaussian becomes very narrow, the Gaussian term becomes a discrete delta function and the expansion reduces to the pixel-by-pixel image basis
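The two limiting cases of the Gaussian term can be checked numerically with a short sketch. It assumes a 1D Gabor wavelet of the form $e^{-(x-x_0)^2/w^2}\,e^{-i\mu_0(x-x_0)}$, where the width parameter `w` is a hypothetical stand-in for the notes' α; the exact parametrisation used in the original formulas is not recoverable here, so treat this form and the sample values as assumptions.

```python
import cmath
import math

def gabor(x, width, mu0, x0=0.0):
    """Complex 1D Gabor wavelet: Gaussian envelope times a complex exponential."""
    envelope = math.exp(-((x - x0) ** 2) / width ** 2)
    carrier = cmath.exp(-1j * mu0 * (x - x0))
    return envelope * carrier

xs = list(range(-8, 9))   # integer sample positions, centred on x0 = 0
mu0 = 0.7                 # hypothetical modulation frequency

# Very wide Gaussian: the envelope is ~1 everywhere, leaving a pure complex
# exponential, i.e. a Fourier basis function.
wide = [gabor(x, 1e6, mu0) for x in xs]

# Very narrow Gaussian: the envelope is ~0 except at x0, i.e. a discrete delta,
# which corresponds to the pixel-by-pixel image basis.
narrow = [gabor(x, 0.1, mu0) for x in xs]

# Intermediate width: the real part is even (cosine phase) and the imaginary
# part is odd (sine phase), so the two parts form a quadrature pair.
mid = [gabor(x, 3.0, mu0) for x in xs]
```

Inspecting `abs(v)` for the values in `wide` and `narrow` shows the flat envelope and the single-sample spike respectively.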