[visionlist] 4 Research openings - London (UK)

Andrea Cavallaro andrea.cavallaro at elec.qmul.ac.uk
Mon Apr 3 15:53:15 GMT 2006

Applications are invited for 3 PhD studentships and 1 Post Doc position
with the Digital Signal Processing and Multimedia Research Group of the
Electronic Engineering Department, Queen Mary, University of London.


Start date: as soon as a suitable candidate is found.


For informal enquiries, contact Dr Andrea Cavallaro (andrea.cavallaro at elec.qmul.ac.uk).



** Perceptually-sensitive video encoding (PhD position, 3 years)


The development of new compression and transmission systems is driven by
the need to reduce the bandwidth and storage requirements of images and
video while increasing their perceived visual quality. Traditional
compression schemes aim to minimize the coding residual in terms of
mean squared error (MSE) or peak signal-to-noise ratio (PSNR). This is
optimal from a purely mathematical point of view, but not from a
perceptual one. Ultimately, perception is the more appropriate and
relevant benchmark. The objective is therefore to define a codec that
maximizes perceived visual quality: one that produces better quality at
the same bit rate as a traditional encoder, or the same visual quality
at a lower bit rate.

The research will centre on standard video coders and will exploit
semantic segmentation, human visual attention models and foveation. The
main objectives of the project are the following: (1) to develop a
perceptually-sensitive video encoding algorithm; (2) to study a model of
visual attention in video and to validate it with subjective experiments
using an eye tracker; (3) to extensively test and evaluate the improved
encoder with standard data sets. 
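As a point of reference, the traditional MSE/PSNR criterion that the project aims to move beyond can be sketched in a few lines; the function name and toy data below are purely illustrative, not part of the project:

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two 8-bit images."""
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy 8-bit "frame" and a noisy reconstruction of it
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64))
noisy = np.clip(frame + rng.normal(0, 5, size=frame.shape), 0, 255)
print(round(psnr(frame, noisy), 1))
```

A perceptually-sensitive encoder would replace (or weight) this purely mathematical distortion measure with one informed by attention and foveation models.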


Additional info:



** Audio-visual face modeling (PhD position, 3 years)


The aim of this project is to develop joint audio-visual representations
of a 'talking head'. Such models have a wide range of possible
applications; we will concentrate on two:

(i) Video-assisted speech enhancement. Visual cues can be used to
de-noise speech signals by identifying components within the signal that
are consistent with facial movement (in particular mouth and jaw shape).
This could be developed further to provide noise-robust features at the
input of an audio-visual automatic speech recognition system.

(ii) Generating a synthetic speech-driven talking head. Identifying the
dependent components of facial expression and speech utterance would
enable an avatar to be animated purely by an input speech signal. The
flip side would of course be automatic lip reading, where the computer
infers, and possibly synthesises, the speech signal purely from the
video images; clearly there are other potential applications here.

The plan of the project is to use machine learning techniques (Bayesian
graphical models, Independent Component Analysis, manifold embedding) to
learn a set of joint audio-visual features. These can then be
categorized as: (1) features with strong audio-visual dependencies; (2)
features with predominantly no audio component; and (3) features with
predominantly no visual component. This distinction would then allow the
student to develop the two primary applications: identifying audio
components that are unrelated to facial expression should allow
background noise sources to be removed, while distinguishing
audio-related from non-audio-related facial movement would provide not
only the ability to animate an avatar from a speech input but also the
freedom to animate the non-speech-related expressive structure
separately. Finally, temporal structure could be incorporated using
Hidden Markov Models or another appropriate Dynamic Bayesian Network.
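As a rough illustration of one of the techniques named above, Independent Component Analysis can unmix linearly mixed sources; the synthetic "audio" and "visual" signals below are stand-ins for real features, and all parameter choices are ours, not the project's:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic stand-ins: a fast "audio" source and a slow "visual"
# source (e.g. a mouth-opening trajectory), linearly mixed into
# four observed joint audio-visual feature channels.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)
audio = np.sin(2 * np.pi * 120 * t)            # fast, speech-like
visual = np.sign(np.sin(2 * np.pi * 3 * t))    # slow articulator motion
sources = np.c_[audio, visual]
mixing = rng.normal(size=(2, 4))
observed = sources @ mixing                    # observed features

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(observed)        # unmixed components
print(recovered.shape)
```

In the project, components recovered this way would then be sorted into the three categories (strong audio-visual dependency, predominantly audio-free, predominantly video-free) described above.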


Additional info:



** Multimedia signal processing (PhD position, 3 years)


One of the goals of dynamic scene analysis and understanding is to find
unusual patterns (events, interactions) in large collections of
audiovisual material. Unusual patterns may be rare events or specific
interactions, which are not necessarily easy to model or to predict. The
aim of this PhD project is to address the scene understanding problem by
exploring unsupervised dimensionality reduction by isometric mapping
combined with machine learning. Isometric mapping aims to find
meaningful low-dimensional structures, representing patterns, events and
interactions, hidden in high-dimensional observations. The features will
be based on both visual and acoustic information: acoustic information
makes it possible to disambiguate events that would appear similar based
on visual information alone. Furthermore, data captured by multiple
sensors will provide additional information, allowing events to be
discovered on a larger scale than a single sensor permits. The isometric
mapping will generate an embedding of the data under analysis that in
turn will enable the discovery of spatio-temporal structures
corresponding to meaningful events. Data clustering will be used to
separate different events and to detect abnormal ones. In addition,
given the nature of the application, privacy issues will be considered
for data collection.
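A minimal sketch of isometric mapping, using the Isomap implementation in scikit-learn on synthetic data standing in for high-dimensional audiovisual observations; the data, neighbourhood size, and dimensions are illustrative assumptions:

```python
import numpy as np
from sklearn.manifold import Isomap

# Synthetic observations lying near a low-dimensional curve (a helix),
# lifted into 20 dimensions, standing in for audiovisual feature
# vectors of an event sequence.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 4 * np.pi, 400))
curve = np.c_[np.cos(t), np.sin(t), t]          # helix in 3-D
high_dim = curve @ rng.normal(size=(3, 20))     # lift to 20-D
high_dim += 0.01 * rng.normal(size=high_dim.shape)

# Isomap recovers a low-dimensional embedding that preserves
# geodesic (along-manifold) distances between observations.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(high_dim)
print(embedding.shape)
```

In the project, clustering would then be applied in such an embedding space to separate events and flag abnormal ones.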


Additional info:



** Multi-modal object tracking in a network of audiovisual sensors
(Post-Doctoral Research Assistant, 2 years)


The aim of this project is to develop a unified scheme for cooperative
multi-modal and multi-sensor tracking. The multi-sensor network will be
composed of stereo microphones coupled with omni-directional and with
pan-tilt-zoom cameras. Sound information will be used to discriminate
ambiguous visual observations and to extend the coverage area of the
sensors beyond the field of view of the cameras. Although
single-modality as well as multi-modality trackers have achieved some
success, a number of important tracking issues remain open before these
algorithms can be adopted in real-world scenarios. Among these, three
inter-related problems will be addressed in this project: the definition
of a generic and flexible feature representation for a target; a
reliable mechanism to update the target model based on incoming
observations; and a robust multi-sensor handover strategy. To evaluate
the tracking scheme, a test corpus and its associated ground-truth data
will be created for use in the project as well as for distribution to
the research community.
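One standard mechanism for updating a target state from incoming observations is the Kalman filter; the constant-velocity sketch below is a generic illustration of such model updating, with all matrices and values chosen by us, and is not the scheme the project will develop:

```python
import numpy as np

# Minimal constant-velocity Kalman filter fusing noisy position
# observations (from any modality) into a (position, velocity) state.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])              # we observe position only
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.5]])                   # observation noise covariance

x = np.zeros(2)                         # initial state estimate
P = np.eye(2)                           # initial state covariance

def kalman_step(x, P, z):
    # Predict the next state and its uncertainty
    x = F @ x
    P = F @ P @ F.T + Q
    # Correct using the observation z
    y = z - H @ x                       # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + (K @ y).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Feed in three noisy position observations of a moving target
for z in [np.array([1.0]), np.array([2.1]), np.array([3.0])]:
    x, P = kalman_step(x, P, z)
print(round(float(x[0]), 2))
```

The open problems listed above (feature representation, model update, sensor handover) go well beyond this linear-Gaussian setting, which is shown only to make the "update the target model based on incoming observations" step concrete.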


Additional info:



** How to apply?


PhD applicants should follow the guidelines that can be found at  


('name of intended supervisor': Dr. Andrea Cavallaro)

Completed application forms should be returned to Theresa Willis by
email (theresa.willis at elec.qmul.ac.uk) 


Post doc applicants can find the application form at

('Job ref': 06062) 

Completed application forms should be returned to Sharon Cording by
email (sharon.cording at elec.qmul.ac.uk)  



