Bob Dony: Publications
Journals
Kambhatla, N., Haykin, S., Dony, R.D., ``Image compression
using KLT, wavelets and an adaptive mixture of principal
components model,'' Journal of VLSI Signal Processing
Systems for Signal, Image, and Video Technology, v 18, n 3,
Apr, 1998, p 287-296
Abstract: In this paper, we present preliminary results
comparing the nature of the errors introduced by the mixture of
principal components (MPC) model with a wavelet transform and
the Karhunen-Loeve transform (KLT) for the lossy compression
of brain magnetic resonance (MR) images. MPC, wavelets and KLT
were applied to image blocks in a block transform coding
scheme. The MPC model partitions the space of image blocks into
a set of disjoint classes and computes a separate KLT for each
class. In our experiments, though both the wavelet transform
and KLT obtained a higher peak signal-to-noise ratio (PSNR) than
MPC, according to radiologists, MPC preserved the texture and
boundaries of gray and white matter better than the wavelet
transform or KLT.
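The PSNR figure used in the comparison above can be computed as in the following minimal sketch. It is not taken from the paper: the function name `psnr` and the default peak value of 255 (8-bit pixels) are assumptions made for illustration.

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher PSNR only means a lower mean squared error; as the abstract notes, it need not agree with a radiologist's judgement of texture and boundary preservation.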
|
Dony, R.D., Haykin, S., ``Compression of SAR images using
KLT, VQ and mixture of principal components,'' IEE
Proceedings: Radar, Sonar and Navigation, v 144, n 3, Jun,
1997, p 113-120
Abstract: Owing to the very high-resolution nature of
synthetic aperture radar (SAR), any use of image compression on
such data must minimize the degree of distortion introduced. Two
common methods for compressing images are linear block transform
coding, such as the Karhunen-Loeve transform, and vector
quantization. However, the implicit assumption of stationarity
for these techniques is far from valid for most images. As a
result, they introduce distortions in regions within an image
that are significantly different from its global statistics. A
new approach to data representation, referred to as the mixture
of principal components (MPC), is proposed which combines
advantages of both transform coding and vector
quantization. Like vector quantization, it partitions the input
space into a number of non-overlapping regions, and each region
is represented by a number of basis vectors in the manner of
transform coding. When applied to the compression of SAR images,
the MPC method introduces less distortion for a given
compression ratio compared with the other two techniques. For
example, at 0.25 bits per pixel (a compression ratio of 64:1),
the degree of distortion is reduced by close to 3 dB. When the
resulting images are compared, the visibility of the distortion
is also reduced when the new method is used.
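The bit rate and compression ratio quoted above are related by a simple division. The sketch below is only arithmetic on those two numbers; the inference that the source data used 16 bits per pixel follows from them and is not stated in the abstract. Both function names are hypothetical.

```python
def compression_ratio(source_bits_per_pixel, coded_bits_per_pixel):
    """Compression ratio as source bit depth divided by coded bit rate."""
    return source_bits_per_pixel / coded_bits_per_pixel

def implied_source_depth(ratio, coded_bits_per_pixel):
    """Source bit depth implied by a quoted ratio and coded bit rate."""
    return ratio * coded_bits_per_pixel
```

For example, `implied_source_depth(64, 0.25)` gives 16.0 bits per pixel.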
|
Dony, R.D., Haykin, S., ``Image segmentation using a mixture
of principal components representation,'' IEE Proceedings:
Vision, Image and Signal Processing, v 144, n 2, Apr, 1997,
p 73-80
Abstract: In previous work, the authors have presented a
new adaptive approach to image compression using a neural
network-based scheme. It is based on a mixture of principal
components model for data representation. The classifier used in
the adaptation is a linear subspace classifier, which the
authors apply to the problem of segmentation. An important
property of this classifier is its insensitivity to the norm of
the input vectors. As a result, regions in an image that differ
only in variations in illumination are classified the same. When
trained on an image, the networks extracted perceptually
important features in an entirely self-organising manner. The
topological ordering of the classes resulted in like classes
being close together in a manner analogous to the ordering of
directionally sensitive columns in the visual cortex. The
classification of similar features is consistent across an image
quite different to the one used in training. In addition, the
segmentation is shown to be independent of variations in
illumination.
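The insensitivity to the input norm can be illustrated with a minimal subspace classifier. This is a sketch under assumptions, not the authors' network: each class is taken to be a matrix with orthonormal rows, and the class is chosen by the largest squared projection. Scaling the input by any non-zero factor scales every class score by the same amount, so the winning class does not change.

```python
import numpy as np

def classify(x, bases):
    """Return the index of the subspace onto which x has the largest projection.

    bases: list of (m, n) arrays whose rows are orthonormal basis vectors
    (hypothetical representation chosen for this example).
    """
    x = np.asarray(x, dtype=float)
    scores = [float(np.sum((W @ x) ** 2)) for W in bases]
    return int(np.argmax(scores))
```

With `bases = [np.array([[1., 0., 0.]]), np.array([[0., 1., 0.]])]`, the vectors `[3, 1, 0]`, `[0.3, 0.1, 0]` and `[150, 50, 0]` all fall in class 0, which mirrors the illumination independence described in the abstract.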
|
Dony, R.D., Haykin, S., ``Neural network approaches to image
compression,'' Proceedings of the IEEE, v 83, n 2, Feb,
1995, p 288-303
Abstract: Neural networks are well suited to the problem
of image compression due to their massively parallel and
distributed architecture. The characteristics of these networks
are analogous to some of the features of a particular visual
system that allows the processing of visual information with
much ease. These networks include differential pulse-code
modulation (DPCM), Hebbian learning algorithms and the
self-organizing feature map (SOFM). The three networks are
explained in terms of their applications.
|
Conferences
Huang, Q., Dony, R.D., ``Neural Network Texture Segmentation
in Equine Leg Ultrasound Images,'' accepted to Proceedings of
the 2004 Canadian Conference on Electrical and Computer
Engineering, May 2-5, 2004, Niagara Falls, Ontario
Abstract: In this paper, we propose a texture
segmentation method based on frequency characteristics in a
hybrid neural network approach using both unsupervised and
supervised neural network classifiers. Our goal is to accurately
and repeatedly segment out tendons from clinical ultrasound (US)
images of horse tendons. The proposed method first extracts
frequency-based texture features through the discrete cosine
transform (DCT). A self-organizing-map (SOM) neural network is
used for unsupervised classification. Following unsupervised
training, a supervised neural network, learning vector
quantization (LVQ), is used to further improve the performance
and accuracy of segmentation. In terms of efficiency, only
rotationally invariant features are adopted. The experimental
results show that improvements can also be achieved by a feature
selection scheme. The experimental images are all captured at
the veterinary hospital. The results compare favourably with gold
standards created by a radiologist.
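The supervised refinement stage can be pictured with a single learning vector quantization update. The abstract does not say which LVQ variant was used, so the sketch below assumes the basic LVQ1 rule; the function name and the learning rate are illustrative only.

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, x, label, lr=0.1):
    """One LVQ1 update (assumed variant): the nearest prototype moves toward x
    if its label matches the sample's label, and away from x otherwise."""
    prototypes = np.array(prototypes, dtype=float)  # work on a copy
    x = np.asarray(x, dtype=float)
    # Winner is the prototype closest to x in Euclidean distance.
    winner = int(np.argmin(np.sum((prototypes - x) ** 2, axis=1)))
    sign = 1.0 if proto_labels[winner] == label else -1.0
    prototypes[winner] += sign * lr * (x - prototypes[winner])
    return prototypes, winner
```

In a pipeline like the one described, the prototypes would plausibly start from the trained SOM nodes and `x` would be a DCT-based feature vector, but that wiring is an assumption here.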
|
Huang, Y., Dony, R.D., ``Speech Modelling By Non-Stationary
Partials with Time Varying Amplitude and Frequency,'' accepted to
Proceedings of the 2004 Canadian Conference on Electrical and
Computer Engineering, May 2-5, 2004, Niagara Falls, Ontario
Abstract: We have developed a new method for speech
decomposition and modeling. The purpose of this approach is to
obtain better performance when modeling speech signals corrupted
by non-stationary noise. In the proposed approach, the signal is
first divided into frames and then each frame is decomposed into
chirp-like partials which are linearly modulated in both
amplitude and frequency. The proposed Complex Ambiguity Function
(CAF) based approach can successfully estimate the parameters of
each partial without assuming the harmonic structure of the
signal. The effectiveness of this approach shows its potential
for use in a speech enhancement system.
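The signal model, a partial whose amplitude and frequency both vary linearly over a frame, can be synthesized as in the sketch below. It covers only the model, not the CAF-based parameter estimation, and the real-valued cosine form, the function name and all parameter names are assumptions for illustration.

```python
import numpy as np

def chirp_partial(n_samples, fs, a0, a1, f0, f1, phi0=0.0):
    """One partial with amplitude a0 + a1*t and instantaneous frequency f0 + f1*t (Hz)."""
    t = np.arange(n_samples) / fs
    amplitude = a0 + a1 * t
    # Phase is 2*pi times the integral of the instantaneous frequency.
    phase = 2.0 * np.pi * (f0 * t + 0.5 * f1 * t ** 2) + phi0
    return amplitude * np.cos(phase)
```

A frame would then be modelled as a sum of several such partials, each with its own parameters.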
|
Rowlandson, R., Dony, R.D., ``Estimation of Epipolar Geometry
from Homography Using Global Optical Flow,'' accepted to
Proceedings of the 2004 Canadian Conference on Electrical and
Computer Engineering, May 2-5, 2004, Niagara Falls, Ontario
Abstract: We consider optical flow as a means for
determining the fundamental matrix for video assuming small
motion between frames. The mapping from one frame to a
subsequent frame can be characterized by an eight parameter
projective transformation or homography. We use optical flow to
find this correspondence. Once found, this mapping can be used
to establish the epipolar geometry, represented by the
fundamental matrix. This is a basic tool in the analysis of
scenes taken with two uncalibrated cameras or subsequent video
frames in our case. We use SVD to restrict the rank to 2. The
use of optical flow in this particular approach has not
previously been well researched. However, we feel that it has a
particular advantage when dealing with video because redundant
information between frames can be more easily exploited. The
results of this method were compared to those of other
techniques for calculating the fundamental matrix and are found
to be quite comparable.
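The rank restriction mentioned above is commonly done by zeroing the smallest singular value, which gives the closest rank-2 matrix in the Frobenius norm. The following is a minimal sketch of that step only; the function name is hypothetical and the estimation of the initial matrix from optical flow is not shown.

```python
import numpy as np

def enforce_rank2(F):
    """Return the closest rank-2 approximation (Frobenius norm) of a 3x3 matrix."""
    U, s, Vt = np.linalg.svd(np.asarray(F, dtype=float))
    s[2] = 0.0  # drop the smallest singular value
    return U @ np.diag(s) @ Vt
```

A valid fundamental matrix must be singular so that all epipolar lines meet at the epipole, which is why an estimate with full rank is projected in this way.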
|
Wang, W., Dony, R.D., ``Evaluation of Image Corner Detectors
for Hardware Implementation,'' accepted to
Proceedings of the 2004 Canadian Conference on Electrical and
Computer Engineering, May 2-5, 2004, Niagara Falls, Ontario
Abstract: In this paper, we analyze a number of corner
detection algorithms and identify the advantages and
disadvantages of each algorithm to evaluate their suitability
for hardware implementation. We implemented three popular corner
detectors, Plessey, Wang-Brady, and SUSAN, in software and
compared them on the basis of their stability, accuracy, speed
and computational requirements. The Plessey algorithm was found
to have good stability and accuracy, but suffered from a large
computational cost. The SUSAN method required the least
computational resources and would therefore be suitable for
implementation on a simple FPGA platform. However, it did not
perform well on real-world images. The Wang-Brady method was
found to have better stability than SUSAN but worse than the
Plessey algorithm while having a lower computational cost than
Plessey and a higher cost than that for SUSAN. Despite the higher
computational requirements, we conclude that the Plessey
algorithm, because of its significantly better performance, is
the most appropriate algorithm for hardware implementation.
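The computational cost attributed to the Plessey detector (also widely known as the Harris detector) comes from the per-pixel gradient products and windowed sums, as the sketch below suggests. It is a simplified illustration, not the implementation evaluated in the paper: it assumes a 3x3 box window instead of a Gaussian, central-difference gradients, and the commonly used constant k = 0.04.

```python
import numpy as np

def box_sum3(a):
    """Sum over each pixel's 3x3 neighbourhood, with zero padding at the border."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def harris_response(img, k=0.04):
    """Corner response det(M) - k * trace(M)^2 of the local structure tensor M."""
    img = np.asarray(img, dtype=float)
    iy, ix = np.gradient(img)   # derivatives along rows and columns
    sxx = box_sum3(ix * ix)     # windowed sums of gradient products
    syy = box_sum3(iy * iy)
    sxy = box_sum3(ix * iy)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2
```

On a white square against a black background, the response is positive at the square's corners, negative along its edges, and zero in flat regions.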
|
Xu, X., Dony, R.D., ``Evaluation of Hierarchical Elastic
Medical Image Registration Method,'' accepted to
Proceedings of the 2004 Canadian Conference on Electrical and
Computer Engineering, May 2-5, 2004, Niagara Falls, Ontario
Abstract: The hierarchical approach to elastic medical
image registration based on mutual information (MI),
in which images are progressively subdivided, locally
registered, and elastically interpolated using a thin-plate
spline, has been investigated. The technique has been shown to
be efficient and robust with small local transformations.
However, problems do exist with this technique. First, MI is a
statistical property of the two images, so the reduction in the
number of samples due to the partitioning of the images into
smaller sub-images reduces the statistical quality of the joint
intensity histogram. Also, the partitioning scheme may lose some
important information such as edges which lie exactly on the
partition. The statistical problem of MI is resolved by
resampling combined with global MI, and an overlapping scheme
is implemented in which the image is subdivided into subimages
that overlap their neighbours. This helps to
overcome the edge problems. Experiments show that these two
methods improve the registration results to a limited extent
in terms of PSNR, and that the visual result is much better,
especially for the overlapping window scheme.
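The dependence of MI on the number of samples can be seen from a direct histogram-based estimate. The sketch below is a generic estimator, not the authors' code; the function name and the choice of 16 bins per axis are assumptions.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Mutual information in bits of two equally sized images, from their joint histogram."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    pxy = joint / joint.sum()              # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal of a
    py = pxy.sum(axis=0, keepdims=True)    # marginal of b
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```

For two independent random images the estimate is close to zero on the full image but rises sharply on a small sub-block, because too few samples fill the joint histogram. This is the statistical problem the abstract describes.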
|
Dony, R.D., Mateer, J.W., Robinson, J.A., ``Automated Reverse
Storyboarding,'' Proceedings of the 1st European Conference
on Visual Media Production (CVMP), pp 193-202, March 15-16,
2004, London, UK
Abstract: Storyboarding is a standard method for visual
summarization of shots in film and video preproduction. Reverse
storyboarding is the generation of similar visualizations from
existing footage. We identify the key attributes of
preproduction storyboards then develop computational techniques
that extract corresponding features from video, render them
appropriately, then composite them into a single storyboard
image. The result succinctly represents background composition,
foreground object appearance and motion, and camera motion. For
tracking shots, we show that the visual representation conveys
all the essential elements
|
Dony, R.D., ``A comparison of Hebbian learning methods
for image compression using the mixture of principal components
network,'' Proceedings of SPIE, v 3307, Applications of
Artificial Neural Networks in Image Processing III, 1998, pp
64-75, Jan 26-27 1998, San Jose, CA, USA
Abstract: A number of novel adaptive image compression
methods have been developed using a new approach to data
representation, a mixture of principal components (MPC). MPC,
together with principal component analysis (PCA) and vector
quantization (VQ), form a spectrum of representations. The MPC
network partitions the space into a number of regions or
subspaces. Within each subspace the data are represented by the
M principal components of the subspace. While Hebbian learning
has been effectively used to extract principal components for
the MPC, its stability is still a concern in practice. As a
result, computationally more expensive methods such as batch
eigendecomposition have produced more consistent results. This
paper compares the performance of a number of Hebbian-based
training schemes for the MPC network. These include training the
entire network, network growing techniques, and a new
tree-structured method. In the new tree-structured approach,
each level in the tree, M, corresponds to an M-dimensional
representation. A node and all its M - 1 parents represent a
single M-dimensional subspace or class. The evaluation shows
that the use of the tree-structured approach improves training and
results in reduced squared error.
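Hebbian extraction of a principal component can be sketched with Oja's rule, a standard Hebbian update with built-in normalization. The abstract does not name the specific rules compared, so Oja's rule is an assumption here, as are the function name, learning rate and epoch count. The sketch finds only the first component of a single class.

```python
import numpy as np

def oja_first_component(data, lr=0.001, epochs=20, seed=0):
    """Estimate the first principal direction of zero-mean data with Oja's rule."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=data.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in data:
            y = w @ x                  # output of the linear unit
            w += lr * y * (x - y * w)  # Hebbian term with a decay that bounds |w|
    return w / np.linalg.norm(w)
```

The result depends on the learning rate and on the order of the samples, which gives a sense of the stability concern raised in the abstract; a batch eigendecomposition of the covariance matrix has no such tuning parameter.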
|
Dony, R.D., Haykin, S., ``Optimally integrated adaptive
learning,'' Proceedings - ICASSP, IEEE International
Conference on Acoustics, Speech and Signal Processing 1993,
pp I-609-I-611, Apr 27-30 1993, Minneapolis, MN, USA
Abstract: A new self-organized learning algorithm is
proposed that is well suited for the problem of image
compression. The network consists of a number of modules
corresponding to different classes of input data. Each module
consists of an orthonormal linear transformation whose weights
are calculated during an initial training period. As the network
is trained, each input signal x is classified according to a
competitive learning scheme based on the maximum norm of the
signal's projection under the class transformation. The
classification is optimal in the sense that it minimizes the
square error. The class transformation weights are updated
according to a Hebbian learning rule which converges to the
optimal KLT for each class. The performance of the resulting
adaptive network is shown to be superior to that of the optimal
non-adaptive linear transformation.
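The link between the maximum projection norm and the minimum squared error can be made explicit with a short derivation. The notation is assumed for illustration: W_i denotes the transformation of class i, with orthonormal rows, and \hat{x}_i is the reconstruction of x from class i.

```latex
\begin{aligned}
\hat{x}_i &= W_i^{\top} W_i x, \qquad W_i W_i^{\top} = I \\
\lVert x - \hat{x}_i \rVert^2
  &= x^{\top}x - 2\,x^{\top}W_i^{\top}W_i x + x^{\top}W_i^{\top}W_i W_i^{\top}W_i x \\
  &= \lVert x \rVert^2 - \lVert W_i x \rVert^2 \\
\arg\min_i \lVert x - \hat{x}_i \rVert^2 &= \arg\max_i \lVert W_i x \rVert^2
\end{aligned}
```

Since the norm of x is the same for every class, choosing the class with the largest projection norm is the same as choosing the class with the smallest reconstruction error.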
|
Technical Reports
Thesis
Dony, Robert D., Adaptive Transform Coding of Images
Using a Mixture of Principal Components, Doctoral Thesis,
McMaster University, Hamilton, ON, Canada, July 1995
Abstract: The optimal linear block transform for coding
images is well known to be the Karhunen-Loeve transformation
(KLT). However, the assumption of stationarity in the
optimality condition is far from valid for images. Images are
composed of regions whose local statistics may vary widely
across an image. A new approach to data representation, a
mixture of principal components (MPC), is developed in this
thesis. It combines advantages of both principal components
analysis and vector quantization and is therefore well suited to
the problem of compressing images. The author proposes a number
of new transform coding methods which optimally adapt to such
local differences based on neural network methods using the MPC
representation. The new networks are modular, consisting of a
number of modules corresponding to different classes of the
input data. Each module consists of a linear transformation,
whose bases are calculated during an initial training period.
The appropriate class for a given input vector is determined by
an optimal classifier. The performance of the resulting
adaptive networks is shown to be superior to that of the optimal
nonadaptive linear transformation, both in terms of
rate-distortion and computational complexity. When applied to
the problem of compressing digital chest radiographs,
compression ratios of between 30:1 and 40:1 are possible without
any significant loss in image quality. In addition, the quality
of the images was consistently judged to be as good as or
better than that of the KLT at equivalent compression ratios.
The new networks can also be used as segmentors with the
resulting segmentation being independent of variations in
illumination. In addition, the organization of the resulting
class representations is analogous to the arrangement of the
directionally sensitive columns in the visual cortex.
|
Non-Technical
Presentations
- Neuroprocessing: Applications in Image Compression and
Audio Processing, IEEE Kitchener-Waterloo Section, August
4, 2004
Slides
|