William Clocksin

Quantification of Genetic Information for Chromosome images

The SAFE Network is a new Europe-wide research project with 54 partners in 12 countries, funded at 14 million euro by the European Commission between 2004 and 2009. The aim of the project is to develop new methods for non-invasive prenatal detection of genetic disease. I lead the image processing and cytometry research on this project. We are developing new methods for the automation of tests involving fetal cell markers in the maternal circulation. These methods include automated image analysis and pattern classification, and they make use of the latest developments in robust statistics.

My involvement with the SAFE Network followed from research on computer analysis of Fluorescence in-situ hybridisation (FISH) images with Maj Hulten, Boaz Lerner, Chris Bishop, Neil Lawrence and Tobias Heimann.
Fluorescence in-situ hybridisation (FISH) allows the detection of specific nucleic acid sequences in intact cells and chromosomes. It enables selective staining of certain sequences in interphase nuclei and therefore the detection, analysis and quantification of specific numerical and structural chromosomal abnormalities within these nuclei. For example, trisomy (triplication) of chromosomes 13 and 21 is associated with Patau and Down syndromes, respectively. FISH is a widespread and diversely applied technology, employed in clinical diagnosis and monitoring of disease, karyotype analysis, gene mapping, DNA replication and recombination, gene transcription and expression, and the study of chromatin organization and structure.
This project focused on a particular application: prenatal genetic screening. FISH is commonly used to tag individual chromosomes, or portions thereof, with a fluorophore. In interphase cells, a dot appears for each labeled sequence. The images used for this project result from the fluorescence of dyes of three different colours: one for the cell nucleus and two for DNA hybridisation dots (for example, associated with chromosomes 13 and 21). To estimate the distribution of chromosomes per cell, it is necessary to inspect a large number of cells, particularly when there is a suspicion of constitutional mosaicism, where the frequency of abnormal cells may be low. Dot counting, the enumeration of signals (also called dots or spots) within the nuclei, is considered one of the most important applications of FISH, yet there has been little progress in automating this task. Automated analysis of FISH imagery could relieve much of the labour and tedium of this screening task.
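As a rough illustration of what dot counting produces, the sketch below tallies per-nucleus dot counts for one probe colour and reports the fraction of nuclei that deviate from the expected two copies. The function names and the sample counts are hypothetical and are not taken from the project's software.

```python
from collections import Counter


def dot_count_distribution(counts_per_nucleus):
    """Return {dot_count: fraction of nuclei} for one probe colour."""
    if not counts_per_nucleus:
        raise ValueError("need at least one nucleus")
    tally = Counter(counts_per_nucleus)
    total = len(counts_per_nucleus)
    return {count: tally[count] / total for count in sorted(tally)}


def abnormal_fraction(counts_per_nucleus, expected=2):
    """Fraction of nuclei whose dot count differs from the expected copy number."""
    if not counts_per_nucleus:
        raise ValueError("need at least one nucleus")
    abnormal = sum(1 for count in counts_per_nucleus if count != expected)
    return abnormal / len(counts_per_nucleus)


# Hypothetical dot counts for a chromosome-21 probe in ten nuclei.
chr21_counts = [2, 2, 3, 2, 2, 2, 3, 2, 2, 2]
distribution = dot_count_distribution(chr21_counts)
fraction = abnormal_fraction(chr21_counts)
```

With so few nuclei the estimate is very coarse, which is why a large number of cells must be inspected when the abnormal fraction may be low.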
The principal goal of this project was to develop algorithms for classifying previously unseen FISH images as normal (exhibiting two of each labeled chromosome within a nucleus) or abnormal (exhibiting nuclei containing a number other than two of such chromosomes). I believe the image processing aspects of FISH image analysis are straightforward, and do not present serious challenges to computer scientists skilled in the field of computer vision. We therefore concentrated on the decision and classification (pattern recognition) problem. However, two issues became apparent. First, the cells that make up a FISH preparation are fixed within a three-dimensional medium, and informative FISH images therefore must be acquired at a particular focal plane. The focal plane must be set, manually or automatically, before dots can be viewed clearly. Recent research has considered algorithms for on-line automatic focusing of the image, but a number of shortcomings have been noted. Because on-line autofocusing was not possible with our apparatus, we decided to base dot counting on a larger population of images randomly sampled across the slide at a fixed focal plane. This method is motivated by the assumption that nuclei are approximately uniformly distributed in the sample, so that translations at a fixed focal plane provide a sample statistically equivalent to projections through different focal planes. This method overcomes most of the shortcomings of auto-focusing, but it relies on acquiring a sufficient number of analysable images and requires more intensive analysis. Because it must deal with many unfocused nuclei and signals, the system needs to discriminate well between focused and unfocused signals. Therefore, the system we developed is based on extracting well-discriminating characteristics of focused and unfocused signals and on a highly accurate classifier, trained using large numbers of examples of the two classes.
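To make the idea of a trainable focused/unfocused discriminator concrete, here is a minimal sketch that uses a nearest-centroid rule. This is a stand-in chosen for brevity, not the classifier developed in the project, and the two features (normalised peak intensity and normalised area) and all training values are assumptions made purely for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def train_nearest_centroid(examples):
    """examples: list of (feature_vector, label). Returns {label: centroid}."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest (Euclidean) to the features."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical training set: (normalised peak intensity, normalised area).
training = [
    ((0.9, 0.2), "focused"),
    ((0.8, 0.3), "focused"),
    ((0.3, 0.7), "unfocused"),
    ((0.4, 0.8), "unfocused"),
]
model = train_nearest_centroid(training)
label_sharp = predict(model, (0.75, 0.3))
label_blurred = predict(model, (0.3, 0.9))
```

A realistic system would need far more training examples and better features than this toy set; the sketch only shows the train-then-predict shape of the approach.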
The second issue concerned the high degree of noise in the images, caused by artifacts such as overlapping signals, background fluorescence and contaminants. Instead of taking a bottom-up approach in which artifacts and unfocused images were subjected to intensive image processing to discern their true nature, the approach of this project was to use the same trainable classifier as described above to discriminate between real signals and artifacts.
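The decision step that follows such a classifier might look like the sketch below: each candidate signal in a nucleus already carries a label from the classifier, only signals labelled as real and in focus are counted, and the nucleus is called normal when every probe shows exactly two. The label strings, probe names and the choice to simply discard rejected candidates are assumptions for illustration, not details taken from the project.

```python
from collections import Counter

# Hypothetical label a classifier assigns to a real, in-focus signal.
REAL = "focused_signal"


def count_real_signals(candidates):
    """candidates: list of (probe, label) pairs for one nucleus.

    Returns a Counter of accepted signals per probe; anything not labelled
    REAL (unfocused signals, artifacts) is ignored.
    """
    return Counter(probe for probe, label in candidates if label == REAL)


def classify_nucleus(candidates, probes=("chr13", "chr21"), expected=2):
    """Call a nucleus normal only if every probe shows the expected count."""
    counts = count_real_signals(candidates)
    if all(counts[probe] == expected for probe in probes):
        return "normal"
    return "abnormal"


# Hypothetical nuclei with classifier-labelled candidate signals.
nucleus_a = [
    ("chr13", REAL), ("chr13", REAL), ("chr13", "artifact"),
    ("chr21", REAL), ("chr21", REAL), ("chr21", "unfocused_signal"),
]
nucleus_b = [
    ("chr13", REAL), ("chr13", REAL),
    ("chr21", REAL), ("chr21", REAL), ("chr21", REAL),
]
result_a = classify_nucleus(nucleus_a)
result_b = classify_nucleus(nucleus_b)
```

Here the artifact and the unfocused signal in the first nucleus are ignored, so it is called normal, while the third accepted chromosome-21 signal makes the second nucleus abnormal.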
Automatic analysis of FISH images has the potential to provide genetic screening at lower cost and higher accuracy than current manual methods. Automation of FISH image analysis and spot counting has now progressed to the point where it is possible for the private sector to develop and market software for clinical applications such as prenatal screening for Down's syndrome and other numerical chromosome abnormalities.