Interpreting the Working Draft Human Genome Sequences (Dr. Bruce Ling)
While the international Human Genome Project (HGP), the effort to identify all human genes, has completed
the initial stage of its work, the draft has been presented with the following caveat: "The sequence is unfinished and does not necessarily represent the
correct sequence ... The sequence may be contaminated with foreign sequence from
E.coli, yeast, vector, phage etc. Order of segments is not known." DoubleTwist
addresses these issues by providing a Web-based computational environment that greatly facilitates the interpretation of "working draft" HGP data. DoubleTwist's
proprietary masking database and algorithm were developed to remove various kinds of contamination from the HGP data. Furthermore, BAC framework sequences, which are available in
the public repositories as artificially "stitched"-together contigs,
have been "exploded" so that analyses against individual fragments are now
possible. Once a similarity hit is obtained against a BAC fragment, DoubleTwist.com
performs extensive analysis of the entire "contig" associated with the similarity hit,
thus providing maximum contextual information about the query sequence. DoubleTwist
has also developed computational tools to infer BAC contig fragment ordering in
chromosomes to provide a context and framework for analysis results. Using a sophisticated Java-based data mining tool, the user can visualize sequence
similarities to known cDNAs, proteins, and protein motifs; inferred gene structures; and an alignment of the user's query sequence against the genomic
contig. To take advantage of the large volume of HGP sequence data updated daily, subscribers to http://www.DoubleTwist.com are provided with a mechanism to monitor
their gene profiles of interest across the human genome whenever a relevant genomic contig is released or updated to a newer version.
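
The "exploding" of stitched BAC contigs described above can be illustrated with a minimal sketch. It is not DoubleTwist's implementation, which the abstract does not describe. It assumes that the fragments of a draft record are joined by runs of the placeholder base N, and that a run of at least 50 N's marks a fragment boundary; the function name explode_contig and the min_gap threshold are hypothetical choices made for this example.

```python
import re


def explode_contig(sequence, min_gap=50):
    """Split a stitched draft sequence into its fragments.

    Assumption: fragments are separated by runs of at least `min_gap`
    N characters. Returns a list of (start, end, fragment) tuples, with
    0-based, end-exclusive coordinates in the stitched sequence.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    fragments = []
    start = 0
    for match in gap.finditer(sequence):
        # Keep the bases that precede this gap, if there are any.
        if match.start() > start:
            fragments.append((start, match.start(), sequence[start:match.start()]))
        start = match.end()
    # Keep whatever follows the last gap.
    if start < len(sequence):
        fragments.append((start, len(sequence), sequence[start:]))
    return fragments


if __name__ == "__main__":
    stitched = "ACGT" * 5 + "N" * 100 + "GGCC" * 3
    for begin, end, fragment in explode_contig(stitched):
        print(begin, end, fragment)
```

Keeping the original coordinates with each fragment lets a similarity hit against one fragment be mapped back to the full contig, which is the kind of contextual analysis the abstract describes. Short runs of N inside a fragment, which usually mark ambiguous bases rather than gaps, stay part of that fragment.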