Putative ALternative Splicing Database
Classification of Alternative Splicing in PALS db
There are three major types of mRNA-EST alignment. The relation of an aligned EST sequence to its reference sequence (mRNA) can be classified by the location of the unaligned region. Type I AS sites were defined as those in which the EST sequence lacked part of the reference sequence; in other words, the unaligned region was on the reference sequence. If the unaligned region was on the EST sequence, we defined the site as type II. In this case, an intron sequence present in the EST sequence was absent from the reference sequence. In type III, both the reference and the EST sequences contained fragments that were not homologous with each other. Although type III might result from a real alternative splicing event, it could also arise from aligning a low-quality EST sequence with the reference sequence. Other types of alignment were also possible; however, they were likely to be artifacts, so they are not listed here. To ensure data quality, only type I and type II splicing sites are included in the current version of PALS db.
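The classification above can be illustrated with a minimal sketch. It assumes each alignment has already been reduced to gap-free blocks with half-open coordinates on both sequences; the `Block` layout and the function name `classify_junction` are hypothetical and are not part of PALS db.

```python
from typing import NamedTuple, Optional


class Block(NamedTuple):
    """One gap-free aligned segment (hypothetical layout, half-open coordinates)."""
    ref_start: int
    ref_end: int
    est_start: int
    est_end: int


def classify_junction(left: Block, right: Block) -> Optional[str]:
    """Classify the junction between two consecutive aligned blocks.

    Returns "I" if only the reference has unaligned bases between the blocks,
    "II" if only the EST does, "III" if both do, and None otherwise
    (contiguous or overlapping blocks).
    """
    ref_gap = right.ref_start - left.ref_end   # unaligned bases on the reference
    est_gap = right.est_start - left.est_end   # unaligned bases on the EST
    if ref_gap > 0 and est_gap == 0:
        return "I"
    if est_gap > 0 and ref_gap == 0:
        return "II"
    if ref_gap > 0 and est_gap > 0:
        return "III"
    return None


# The EST skips 50 reference bases between the two blocks: type I.
print(classify_junction(Block(0, 100, 0, 100), Block(150, 250, 100, 200)))
```

Under this reading, only junctions classified as "I" or "II" would be carried forward, matching the restriction stated above.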
Cluster AS Candidates into AS Site Pairs (ASSP) to Reduce Redundancy
Several alternative-splicing candidates might originate from a single transcript. For example, the 8th and 9th aligned sequences are both evidence of a type II AS transcript at nt 1215 of the reference sequence, with an additional 18-base fragment. Clustering the candidates is therefore necessary to reduce the redundancy among putative AS sites. The strategy is to group candidates whose alignments have identical cutting positions on the reference sequence into an alternative splicing site pair (ASSP).
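The grouping strategy can be sketched as follows. The `Candidate` record, its field names, and the EST identifiers are hypothetical; including the AS type in the grouping key is an assumption of this sketch, since the text only specifies identical cutting positions on the reference sequence.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """One AS candidate from a single EST alignment (hypothetical record)."""
    est_id: str
    as_type: str     # "I" or "II"
    ref_left: int    # cutting position on the reference, left side
    ref_right: int   # cutting position on the reference, right side


def cluster_into_assp(
    candidates: Iterable[Candidate],
) -> Dict[Tuple[str, int, int], List[str]]:
    """Group candidates that share identical cutting positions on the reference."""
    groups: Dict[Tuple[str, int, int], List[str]] = defaultdict(list)
    for c in candidates:
        # Assumption: the AS type is part of the key, so a type I and a
        # type II candidate never fall into the same ASSP.
        groups[(c.as_type, c.ref_left, c.ref_right)].append(c.est_id)
    return dict(groups)


candidates = [
    Candidate("est_8", "II", 1215, 1215),   # hypothetical IDs for the two ESTs
    Candidate("est_9", "II", 1215, 1215),   # supporting the site at nt 1215
    Candidate("est_3", "I", 300, 450),
]
assp = cluster_into_assp(candidates)
print(len(assp))  # 2 site pairs from 3 candidates
```

The number of ESTs in each group gives a simple count of the evidence supporting that site pair.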
False positives in PALS db might come from several sources, such as paralogous genes, pseudogenes, and repeats that could not be removed by commonly used masking programs. Strict criteria for the percent identity of alignments might help to exclude some of these biases (Brett et al., 2000); however, it was almost impossible to get rid of them completely. Moreover, applying such tough criteria to filter out low-quality AS candidates could also eliminate real information. The loss may be particularly severe when the AS site is located near the 3′ end of an EST. To preserve information, we have employed a two-step filter, marking EST sources and adopting moderately high criteria, to manage the issues of paralogous genes and repeats, respectively.
At the stage of computing statistics, we used 95% identity in a 50-bp fragment on both ends of separated alignments as the criterion for predicting splicing sites. This criterion is less strict, but more realistic, than that described by Brett et al. (2000). For Hs.168812, there were more than 20 similar repeats in the reference sequence AK024194. EST BE327138 could be aligned to 24 separate regions of the reference sequence using NCBI bl2seq. However, the SIM4 output suggested that several AS events might be possible in BE327138. Our criterion is stringent enough to filter out the interference from these repeats. Although there might be real AS events in repeats, we think that further manual adjustment of alignments, in addition to large-scale automation, is required to confirm a putative AS event in sequences containing repeats.
(Generated using NCBI BLAST 2 Sequences with all default parameters left unmodified.)
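The 95% identity criterion over 50-bp fragments described above can be sketched as follows. This is an illustration only, not the PALS db implementation. It assumes the fragments are the 50 aligned bases immediately flanking the split on each side, that each flank is supplied as a pair of equal-length, gap-free aligned strings, and that flanks shorter than 50 bp are rejected; all function and parameter names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return matches / len(a)


def passes_flank_filter(
    left_ref: str,
    left_est: str,
    right_ref: str,
    right_est: str,
    window: int = 50,
    min_identity: float = 0.95,
) -> bool:
    """Check both flanks of a split alignment against the identity threshold.

    left_* are the aligned bases upstream of the split, right_* those
    downstream. Assumption: a flank shorter than `window` fails the filter.
    """
    if min(len(left_ref), len(left_est), len(right_ref), len(right_est)) < window:
        return False
    left_ok = identity(left_ref[-window:], left_est[-window:]) >= min_identity
    right_ok = identity(right_ref[:window], right_est[:window]) >= min_identity
    return left_ok and right_ok


ref_flank = "ACGT" * 15                # 60 bp of toy sequence
est_two_mm = ref_flank[:-2] + "NN"     # 2 mismatches in the last 50 bp (96%)
est_three_mm = ref_flank[:-3] + "NNN"  # 3 mismatches in the last 50 bp (94%)

print(passes_flank_filter(ref_flank, est_two_mm, ref_flank, ref_flank))    # True
print(passes_flank_filter(ref_flank, est_three_mm, ref_flank, ref_flank))  # False
```

With a 50-bp window, the 95% threshold tolerates at most two mismatches per flank, which is why the second toy EST is rejected.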
Brett, D., Hanke, J., Lehmann, G., Haase, S., Delbruck, S., Krueger, S., Reich, J., Bork, P. (2000) EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 474, 83-86.
Last updated on 2003-11-01 by Fu, Gloria Chiung-Ling