Interpreting the Working Draft Human Genome Sequences (Dr. Bruce Ling)

While the international project to identify all human genes has completed the initial stage of its work, the draft has been released with the following caveat: "The sequence is unfinished and does not necessarily represent the correct sequence ... The sequence may be contaminated with foreign sequence from E.coli, yeast, vector, phage etc. Order of segments is not known." DoubleTwist addresses these issues by providing a Web-based computational environment that greatly facilitates the interpretation of "working draft" Human Genome Project (HGP) data. DoubleTwist's proprietary masking database and algorithm have been developed to remove various kinds of contamination from the HGP data. Furthermore, BAC framework sequences, which are available in the public repositories as artificially "stitched" together "contigs", have been "exploded" so that analyses against individual fragments are now possible. Once a similarity hit is obtained against a BAC fragment, DoubleTwist.com performs an extensive analysis of the entire "contig" associated with that hit, thus providing maximum contextual information about the query sequence. DoubleTwist has also developed computational tools to infer the ordering of BAC contig fragments within chromosomes, which provides a context and framework for the analysis results. Using a sophisticated Java-based data mining tool, the user can visualize sequence similarities to known cDNAs, proteins, and protein motifs; inferred gene structures; and an alignment of the user's query sequence against the genomic contig. To take advantage of the large volume of HGP sequence data that is updated daily, subscribers to http://www.DoubleTwist.com are provided with a mechanism to monitor their gene profiles of interest across the human genome whenever a relevant genomic contig is released or replaced by a newer version.
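
As a rough illustration of what "exploding" a stitched contig could look like, the Python sketch below splits a sequence into fragments wherever it finds a run of N characters. This is not DoubleTwist's method, which the abstract does not describe. The function name explode_contig, the Fragment record, and the min_gap threshold are all hypothetical, and the sketch assumes that the stitched fragments are separated by runs of at least min_gap N's.

```python
import re
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """One piece of a stitched contig (hypothetical record type)."""
    index: int      # position of the fragment within the stitched record
    start: int      # 0-based start offset in the stitched sequence, inclusive
    end: int        # end offset in the stitched sequence, exclusive
    sequence: str   # the fragment's bases


def explode_contig(stitched: str, min_gap: int = 10) -> List[Fragment]:
    """Split a stitched sequence at runs of N (assumed gap spacers).

    Assumption: runs of at least `min_gap` N/n characters mark fragment
    boundaries. Shorter runs are treated as ambiguous bases and kept.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    fragments: List[Fragment] = []
    pos = 0
    for match in gap.finditer(stitched):
        if match.start() > pos:  # skip empty pieces (e.g. a leading gap)
            fragments.append(
                Fragment(len(fragments), pos, match.start(),
                         stitched[pos:match.start()])
            )
        pos = match.end()
    if pos < len(stitched):  # trailing fragment after the last gap
        fragments.append(
            Fragment(len(fragments), pos, len(stitched), stitched[pos:])
        )
    return fragments


if __name__ == "__main__":
    record = "ACGT" + "N" * 10 + "GGCC" + "N" * 12 + "TTAA"
    for frag in explode_contig(record):
        print(frag.index, frag.start, frag.end, frag.sequence)
```

Keeping each fragment's offsets in the stitched record means that a similarity hit against one fragment can be mapped back to the whole "contig" for the kind of contextual analysis described above.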