Contig selection in physical mapping
In physical mapping, one orders a set of genetic landmarks or a library of cloned DNA fragments according to their position in the genome. Our approach divides the problem into smaller and easier subproblems by partitioning the probe set into independent parts (probe contigs). For this purpose we introduce a new distance function between probes, the averaged rank distance (ARD), which is derived from bootstrap resampling of the raw data. The ARD measures the pairwise distances of probes within a contig and smooths the distances of probes across different contigs. It shows distinct jumps at contig borders, which makes it appropriate for contig selection by clustering. We have designed a physical mapping algorithm that makes use of these observations and seems to be particularly well suited to the delineation of reliable contigs. We evaluated our method on data sets from two physical mapping projects. On data from the recently sequenced bacterium Xylella fastidiosa, the probe contig set produced by the new method was evaluated against the probe order derived from the sequence information; our approach yielded an essentially correct contig set. On these data we also compared our method to an approach that uses the number of supporting clones to determine contigs, and our map is much more accurate. In comparison to a physical map of Pasteurella haemolytica that was computed using simulated annealing, the newly computed map is considerably cleaner. The results of our method have already proven helpful for the design of experiments aimed at further improving the quality of a map.
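
To illustrate how a rank-based distance can expose contig borders, the following minimal sketch averages absolute rank differences over several probe orders and cuts a reference order wherever neighbouring probes are far apart. It is not the implementation used in this work. It assumes that each bootstrap replicate has already been turned into a probe order by some ordering heuristic, and that the ARD of a probe pair is the mean absolute difference of their ranks over the replicates. The function names, the toy orders, and the threshold of 1.5 are all hypothetical.

```python
from itertools import combinations


def averaged_rank_distance(orders):
    """Mean absolute rank difference of every probe pair over all orders.

    Assumption: each element of `orders` is a probe order obtained from one
    bootstrap replicate of the raw data (the ordering step is not shown).
    """
    probes = sorted(orders[0])
    totals = {pair: 0 for pair in combinations(probes, 2)}
    for order in orders:
        rank = {probe: pos for pos, probe in enumerate(order)}
        for a, b in list(totals):
            totals[(a, b)] += abs(rank[a] - rank[b])
    return {pair: total / len(orders) for pair, total in totals.items()}


def split_contigs(reference_order, ard, threshold):
    """Start a new contig wherever adjacent probes exceed the threshold."""
    contigs = [[reference_order[0]]]
    for prev, cur in zip(reference_order, reference_order[1:]):
        key = (prev, cur) if prev < cur else (cur, prev)
        if ard[key] > threshold:
            contigs.append([cur])  # jump in distance: contig border
        else:
            contigs[-1].append(cur)
    return contigs


# Hypothetical toy input: probes A-C and D-F form two contigs whose
# relative placement and orientation vary between replicates.
orders = [
    ["A", "B", "C", "D", "E", "F"],
    ["C", "B", "A", "D", "E", "F"],
    ["D", "E", "F", "A", "B", "C"],
    ["F", "E", "D", "C", "B", "A"],
]
ard = averaged_rank_distance(orders)
contigs = split_contigs(orders[0], ard, threshold=1.5)
print(ard[("C", "D")])  # 2.5, versus 1.0 for neighbours within a contig
print(contigs)          # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

In this toy input the within-contig neighbours keep a distance of 1.0 in every order, whereas the pair spanning the border averages 2.5, so a simple threshold separates the two contigs. Real data would need a threshold or clustering criterion chosen from the observed distance distribution.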