Bob Dony: Publications


Journals

Kambhatla, N., Haykin, S., Dony, R.D., ``Image compression using KLT, wavelets and an adaptive mixture of principal components model,'' Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, v 18, n 3, Apr, 1998, p 287-296

Abstract: In this paper, we present preliminary results comparing the nature of the errors introduced by the mixture of principal components (MPC) model with those of a wavelet transform and the Karhunen-Loeve transform (KLT) for the lossy compression of brain magnetic resonance (MR) images. MPC, wavelets and KLT were applied to image blocks in a block transform coding scheme. The MPC model partitions the space of image blocks into a set of disjoint classes and computes a separate KLT for each class. In our experiments, though both the wavelet transform and KLT obtained a higher peak signal to noise ratio (PSNR) than MPC, according to radiologists, MPC preserved the texture and boundaries of gray and white matter better than the wavelet transform or KLT.

Reprint | BibTeX Entry
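
The abstract above ranks the three methods by PSNR. As a reminder of what that figure measures, here is a minimal sketch of the standard PSNR calculation. The function name, the flat pixel-list input and the default peak of 255 (8-bit data) are assumptions for illustration only; the paper's own evaluation code and the bit depth of its MR images are not given here.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `peak` is the largest possible pixel value (assumed 255 here; use the
    value matching the actual bit depth of the images).
    """
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the original and the decompressed pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)


# Every pixel off by one grey level gives an MSE of 1.
print(psnr([10, 20, 30, 40], [11, 21, 31, 41]))
```

Because PSNR averages squared error over all pixels, it says nothing about where the error falls, which is consistent with the abstract's observation that the method with lower PSNR was still preferred by radiologists.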


Dony, R.D., Haykin, S., ``Compression of SAR images using KLT, VQ and mixture of principal components,'' IEE Proceedings: Radar, Sonar and Navigation, v 144, n 3, Jun, 1997, p 113-120

Abstract: Owing to the very high-resolution nature of synthetic aperture radar (SAR), any use of image compression on such data must minimize the degree of distortion introduced. Two common methods for compressing images are linear block transform coding, such as the Karhunen-Loeve transform, and vector quantization. However, the implicit assumption of stationarity for these techniques is far from valid for most images. As a result, they introduce distortions in regions within an image that are significantly different from its global statistics. A new approach to data representation, referred to as the mixture of principal components (MPC), is proposed which combines advantages of both transform coding and vector quantization. Like vector quantization, it partitions the input space into a number of non-overlapping regions, and each region is represented by a number of basis vectors in the manner of transform coding. When applied to the compression of SAR images, the MPC method introduces less distortion for a given compression ratio compared with the other two techniques. For example, at 0.25 bits per pixel (a compression ratio of 64:1), the degree of distortion is reduced by close to 3 dB. When the resulting images are compared, the visibility of the distortion is also reduced when the new method is used.

Scanned Reprint | HQ Preprint | BibTeX Entry


Dony, R.D., Haykin, S., ``Image segmentation using a mixture of principal components representation,'' IEE Proceedings: Vision, Image and Signal Processing, v 144, n 2, Apr, 1997, p 73-80

Abstract: In previous work, the authors have presented a new adaptive approach to image compression using a neural network-based scheme. It is based on a mixture of principal components model for data representation. The classifier used in the adaptation is a linear subspace classifier, which the authors apply to the problem of segmentation. An important property of this classifier is its insensitivity to the norm of the input vectors. As a result, regions in an image that differ only in variations in illumination are classified the same. When trained on an image, the networks extracted perceptually important features in an entirely self-organising manner. The topological ordering of the classes resulted in like classes being close together in a manner analogous to the ordering of directionally sensitive columns in the visual cortex. The classification of similar features is consistent across an image quite different to the one used in training. In addition, the segmentation is shown to be independent of variations in illumination.

Scanned Reprint | HQ Preprint | BibTeX Entry


Dony, R.D., Haykin, S., ``Neural network approaches to image compression,'' Proceedings of the IEEE, v 83, n 2, Feb, 1995, p 288-303

Abstract: Neural networks are well suited to the problem of image compression due to their massively parallel and distributed architecture. The characteristics of these networks are analogous to some of the features of a particular visual system that allows the processing of visual information with much ease. These networks include differential pulse-code modulation (DPCM), Hebbian learning algorithms and the self-organizing feature map (SOFM). The three networks are explained in terms of their applications.

Scanned Reprint | HQ Preprint | BibTeX Entry


Conferences

Huang, Q., Dony, R.D., ``Neural Network Texture Segmentation in Equine Leg Ultrasound Images,'' accepted to Proceedings of the 2004 Canadian Conference on Electrical and Computer Engineering, May 2-5, 2004, Niagara Falls, Ontario

Abstract: In this paper, we propose a texture segmentation method based on frequency characteristics in a hybrid neural network approach using both unsupervised and supervised neural network classifiers. Our goal is to accurately and repeatedly segment out tendons from clinical ultrasound (US) images of horse tendons. The proposed method first extracts frequency-based texture features through the discrete cosine transform (DCT). A self-organizing-map (SOM) neural network is used for unsupervised classification. Following unsupervised training, a supervised neural network, learning vector quantization (LVQ), is used to further improve the performance and accuracy of segmentation. In terms of efficiency, only rotationally invariant features are adopted. The experimental results show that improvements can also be achieved by a feature selection scheme. The experimental images are all captured at the veterinary hospital. The results compare favourably with gold standards created by a radiologist.

Preprint | BibTeX Entry


Huang, Y., Dony, R.D., ``Speech Modelling By Non-Stationary Partials with Time Varying Amplitude and Frequency,'' accepted to Proceedings of the 2004 Canadian Conference on Electrical and Computer Engineering, May 2-5, 2004, Niagara Falls, Ontario

Abstract: We have developed a new method for speech decomposition and modeling. The purpose of this approach is to obtain better performance in modeling speech signals corrupted by non-stationary noise. In the proposed approach, the signal is first divided into frames and then each frame is decomposed into chirp-like partials which are linearly modulated in both amplitude and frequency. The proposed Complex Ambiguity Function (CAF) based approach can successfully estimate the parameters of each partial without assuming the harmonic structure of the signal. The effectiveness of this approach shows its potential for use in a speech enhancement system.

Preprint | BibTeX Entry
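
The signal model in the abstract above, a partial whose amplitude and frequency both vary linearly over a frame, can be written out as a short synthesis sketch. This only illustrates the model. It is not the CAF-based estimator, which the abstract does not specify. The function name, the real-valued cosine form and all parameter names are assumptions made for illustration.

```python
import math


def chirp_partial(n_samples, sample_rate, a0, a_slope, f0, f_slope, phase=0.0):
    """Synthesize one frame of a partial with linear amplitude and frequency modulation.

    Amplitude at time t:  a0 + a_slope * t
    Frequency at time t:  f0 + f_slope * t   (Hz)
    """
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        amplitude = a0 + a_slope * t
        # Integrating the instantaneous frequency f0 + f_slope*t over time
        # gives the quadratic phase term of a linear chirp.
        theta = 2.0 * math.pi * (f0 * t + 0.5 * f_slope * t * t) + phase
        samples.append(amplitude * math.cos(theta))
    return samples


# A 20 ms frame at 8 kHz: amplitude decaying from 1.0, frequency rising from 200 Hz.
frame = chirp_partial(160, 8000.0, a0=1.0, a_slope=-10.0, f0=200.0, f_slope=5000.0)
print(len(frame))
```

Under this model a frame is approximated by a sum of such partials, each described by only a few parameters, with no requirement that their frequencies be harmonically related.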


Rowlandson, R., Dony, R.D., ``Estimation of Epipolar Geometry from Homography Using Global Optical Flow,'' accepted to Proceedings of the 2004 Canadian Conference on Electrical and Computer Engineering, May 2-5, 2004, Niagara Falls, Ontario

Abstract: We consider optical flow as a means for determining the fundamental matrix for video assuming small motion between frames. The mapping from one frame to a subsequent frame can be characterized by an eight parameter projective transformation or homography. We use optical flow to find this correspondence. Once found, this mapping can be used to establish the epipolar geometry, represented by the fundamental matrix. This is a basic tool in the analysis of scenes taken with two uncalibrated cameras or subsequent video frames in our case. We use SVD to restrict the rank to 2. The use of optical flow in this particular approach has not previously been well researched. However, we feel that it has a particular advantage when dealing with video because redundant information between frames can be more easily exploited. The results of this method were compared to those of other techniques for calculating the fundamental matrix and are found to be quite comparable.

Preprint | BibTeX Entry


Wang, W., Dony, R.D., ``Evaluation of Image Corner Detectors for Hardware Implementation,'' accepted to Proceedings of the 2004 Canadian Conference on Electrical and Computer Engineering, May 2-5, 2004, Niagara Falls, Ontario

Abstract: In this paper, we analyze a number of corner detection algorithms and identify the advantages and disadvantages of each algorithm to evaluate their suitability for hardware implementation. We implemented three popular corner detectors, Plessy, Wang-Brady, and SUSAN, in software and compared them on the basis of their stability, accuracy, speed and computational requirements. The Plessy algorithm was found to have good stability and accuracy, but suffered from a large computational cost. The SUSAN method required the least computational resources and would therefore be suitable for implementation on a simple FPGA platform. However, it did not perform well on real world images. The Wang-Brady method was found to have better stability than SUSAN but worse than the Plessy algorithm while having a lower computational cost than Plessy and a higher cost than that for SUSAN. Despite the higher computational requirements, we conclude that the Plessy algorithm, because of its significantly better performance, is the most appropriate algorithm for hardware implementation.

Preprint | BibTeX Entry


Xu, X., Dony, R.D., ``Evaluation of Hierarchical Elastic Medical Image Registration Method,'' accepted to Proceedings of the 2004 Canadian Conference on Electrical and Computer Engineering, May 2-5, 2004, Niagara Falls, Ontario

Abstract: The hierarchical approach to elastic medical image registration based on mutual information (MI), in which images are progressively subdivided, locally registered, and elastically interpolated using a thin-plate spline, has been investigated. The technique has been shown to be efficient and robust with small local transformations. However, problems do exist with this technique. First, MI is a statistical property of the two images, so the reduction in the number of samples due to the partitioning of the images into smaller sub-images reduces the statistical quality of the joint intensity histogram. Also, the partitioning scheme may lose some important information, such as edges which lie exactly on the partition. The statistical problem of MI is addressed by resampling combined with global MI, and an overlapping scheme is implemented in which the image is subdivided into sub-images that overlap with their neighbours. This helps to overcome the edge problems. Experiments show that these two methods can improve the registration results to a limited extent (in terms of PSNR), and that the visual result is much better, especially for the overlapping window scheme.

Preprint | BibTeX Entry
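
The abstract above relies on mutual information computed from a joint intensity histogram. Below is a minimal sketch of that computation for two aligned pixel sequences. The function name, the uniform binning, the default of 8 bins and the 0-255 intensity range are all assumptions for illustration. The paper's resampling and overlapping-window schemes are not reproduced.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, max_value=256):
    """Mutual information (in bits) of two equal-length intensity sequences.

    Intensities in [0, max_value) are quantized into `bins` uniform bins and
    MI is estimated from the resulting joint histogram.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    width = max_value / bins
    qa = [min(int(v / width), bins - 1) for v in a]
    qb = [min(int(v / width), bins - 1) for v in b]
    n = len(qa)
    joint = Counter(zip(qa, qb))   # joint histogram counts
    count_a = Counter(qa)          # marginal histogram of image a
    count_b = Counter(qb)          # marginal histogram of image b
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written in terms of counts.
        mi += (c / n) * math.log2(c * n / (count_a[i] * count_b[j]))
    return mi


# Perfectly aligned two-level images share one bit of information;
# statistically unrelated ones share none.
print(mutual_information([0, 0, 255, 255], [0, 0, 255, 255]))
print(mutual_information([0, 0, 255, 255], [0, 255, 0, 255]))
```

With a fixed number of bins, a smaller sub-image leaves fewer samples per histogram cell, so the estimate becomes noisier. This is the statistical problem the abstract describes when images are partitioned.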


Dony, R.D., Mateer, J.W., Robinson, J.A., ``Automated Reverse Storyboarding,'' Proceedings of the 1st European Conference on Visual Media Production (CVMP), pp 193-202, March 15-16, 2004, London, UK

Abstract: Storyboarding is a standard method for visual summarization of shots in film and video preproduction. Reverse storyboarding is the generation of similar visualizations from existing footage. We identify the key attributes of preproduction storyboards then develop computational techniques that extract corresponding features from video, render them appropriately, then composite them into a single storyboard image. The result succinctly represents background composition, foreground object appearance and motion, and camera motion. For tracking shots, we show that the visual representation conveys all the essential elements

Reprint | BibTeX Entry


Dony, R.D., ``A comparison of Hebbian learning methods for image compression using the mixture of principal components network,'' Proceedings of SPIE, v 3307, Applications of Artificial Neural Networks in Image Processing III, 1998, pp 64-75, Jan 26-27 1998, San Jose, CA, USA

Abstract: A number of novel adaptive image compression methods have been developed using a new approach to data representation, a mixture of principal components (MPC). MPC, together with principal component analysis (PCA) and vector quantization (VQ), form a spectrum of representations. The MPC network partitions the space into a number of regions or subspaces. Within each subspace the data are represented by the M principal components of the subspace. While Hebbian learning has been effectively used to extract principal components for the MPC, its stability is still a concern in practice. As a result, computationally more expensive methods such as batch eigendecomposition have produced more consistent results. This paper compares the performance of a number of Hebbian-based training schemes for the MPC network. These include training the entire network, network growing techniques, and a new tree-structured method. In the new tree-structured approach, each level in the tree, M, corresponds to an M-dimensional representation. A node and all its M - 1 parents represent a single M-dimensional subspace or class. The evaluation shows that the use of the tree-structured approach improves training and results in reduced squared error.

Preprint | BibTeX Entry


Dony, R.D., Haykin, S., ``Optimally integrated adaptive learning,'' Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing 1993, pp I-609-I-611, Apr 27-30 1993, Minneapolis, MN, USA

Abstract: A new self-organized learning algorithm is proposed that is well suited for the problem of image compression. The network consists of a number of modules corresponding to different classes of input data. Each module consists of an orthonormal linear transformation whose weights are calculated during an initial training period. As the network is trained, each input signal x is classified according to a competitive learning scheme based on the maximum norm of the signal's projection under the class transformation. The classification is optimal in the sense that it minimizes the square error. The class transformation weights are updated according to a Hebbian learning rule which converges to the optimal KLT for each class. The performance of the resulting adaptive network is shown to be superior to that of the optimal non-adaptive linear transformation.

Scanned Reprint | HQ Preprint | BibTeX Entry
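
The classification rule in the abstract above assigns each input to the class whose transformation yields the projection of maximum norm. A minimal sketch of that rule follows, assuming each class is given as a list of orthonormal basis vectors. The function names and the toy two-class example are assumptions for illustration, and the Hebbian training of the bases is not shown.

```python
def project(basis, x):
    """Coefficients of x on each basis vector (basis assumed orthonormal)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in basis]


def classify(class_bases, x):
    """Index of the class whose subspace captures the most energy of x."""
    energies = [sum(c * c for c in project(basis, x)) for basis in class_bases]
    return max(range(len(class_bases)), key=energies.__getitem__)


def reconstruct(basis, x):
    """Approximate x from its coefficients in the chosen class subspace."""
    coeffs = project(basis, x)
    return [sum(c * w[i] for c, w in zip(coeffs, basis)) for i in range(len(x))]


# Toy example: two one-dimensional class subspaces in a 2-D input space.
bases = [
    [[1.0, 0.0]],  # class 0: horizontal direction
    [[0.0, 1.0]],  # class 1: vertical direction
]
x = [3.0, 1.0]
k = classify(bases, x)
print(k, reconstruct(bases[k], x))
```

For an orthonormal basis, the squared norm of the input equals the squared norm of its projection plus the squared reconstruction error, so choosing the largest projection norm is the same as choosing the smallest squared error. Scaling the input scales every class's projection equally and leaves the chosen class unchanged.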


Technical Reports


Thesis

Dony, Robert D., Adaptive Transform Coding of Images Using a Mixture of Principal Components, Doctoral Thesis, McMaster University, Hamilton, ON, Canada, July 1995

Abstract: The optimal linear block transform for coding images is well known to be the Karhunen-Loeve transformation (KLT). However, the assumption of stationarity in the optimality condition is far from valid for images. Images are composed of regions whose local statistics may vary widely across an image. A new approach to data representation, a mixture of principal components (MPC), is developed in this thesis. It combines advantages of both principal components analysis and vector quantization and is therefore well suited to the problem of compressing images. The author proposes a number of new transform coding methods which optimally adapt to such local differences based on neural network methods using the MPC representation. The new networks are modular, consisting of a number of modules corresponding to different classes of the input data. Each module consists of a linear transformation, whose bases are calculated during an initial training period. The appropriate class for a given input vector is determined by an optimal classifier. The performance of the resulting adaptive networks is shown to be superior to that of the optimal nonadaptive linear transformation, both in terms of rate-distortion and computational complexity. When applied to the problem of compressing digital chest radiographs, compression ratios of between 30:1 and 40:1 are possible without any significant loss in image quality. In addition, the quality of the images was consistently judged to be as good as or better than the KLT at equivalent compression ratios.

The new networks can also be used as segmentors with the resulting segmentation being independent of variations in illumination. In addition, the organization of the resulting class representations is analogous to the arrangement of the directionally sensitive columns in the visual cortex.

Thesis | BibTeX Entry


Non-Technical


Presentations

  • Neuroprocessing: Applications in Image Compression and Audio Processing, IEEE Kitchener-Waterloo Section, August 4, 2004 Slides