September 18, 2002
張傳雄
Abstract
The ever-growing number of completely
sequenced prokaryotic genomes facilitates cross-species comparisons
by genomic annotation algorithms. This paper introduces a new
probabilistic framework for comparative genomic analysis and
demonstrates its utility in the context of improving the accuracy of
prokaryotic gene start site detection. Our framework employs a
product hidden Markov model (PROD-HMM) with a state architecture designed to
model the species-specific trinucleotide frequency patterns in
sequences immediately upstream and downstream of a translation start
site and to detect the contrasting non-synonymous (amino acid
changing) and synonymous (silent) substitution rates that
differentiate prokaryotic coding from intergenic regions. Depending
on the intricacy of the features modeled by the hidden state
architecture, intergenic, regulatory, promoter and coding regions
can be delimited by this method. The new system is evaluated using a
preliminary set of orthologous Pyrococcus gene pairs, for
which it demonstrates an improved accuracy of detection. Its
robustness is confirmed by analysis with cross-validation of an
experimentally verified set of Escherichia coli K-12 and Salmonella
typhimurium LT2 orthologs. The novel architecture has a number
of attractive features that distinguish it from previous comparative
models such as pair-HMMs.
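
To illustrate the substitution-rate signal mentioned above, the following is a minimal sketch, not the PROD-HMM itself. It simply counts codon positions at which two aligned orthologous sequences differ and labels each difference as synonymous or non-synonymous. The function name count_codon_changes, the use of the standard genetic code, and the requirement of an ungapped, in-frame alignment are all assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Assumes both sequences are aligned without gaps, start in frame,
    and contain only the letters A, C, G and T (hypothetical input
    format chosen for this sketch).
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")

    synonymous = 0
    nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no substitution signal
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # silent change: same amino acid
        else:
            nonsynonymous += 1  # amino acid changing
    return synonymous, nonsynonymous


# CTT -> CTC keeps leucine (synonymous); GAA -> GAT changes Glu to Asp.
print(count_codon_changes("ATGCTTGAA", "ATGCTCGAT"))
```

In a genuine coding region one would expect silent changes to be over-represented relative to amino acid changing ones, whereas an intergenic region read in an arbitrary frame should show no such bias. A per-codon count like this ignores multiple substitutions within a codon and is only meant to make the contrast concrete.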
References:
1. Baytaluk MV, Gelfand MS, Mironov AA. (2002) Exact mapping of prokaryotic gene starts. Briefings in Bioinformatics 3(2):181-194.
2. Suzek BE, Ermolaeva MD, Schreiber M, Salzberg SL. (2001) A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics 17(12):1123-1130.
3. Besemer J, Lomsadze A, Borodovsky M. (2001) GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Research 29(12):2607-2618.
4. Yada T, Totoki Y, Takagi T, Nakai K. (2001) A novel bacterial gene-finding system with improved accuracy in locating start codons. DNA Research 8(3):97-106.