PALS db
Jan 24 2005 PALS db Release 7
50,075 human putative alternative splicing site pairs,
27,905 mouse putative alternative splicing site pairs,
3,642 worm putative alternative splicing site pairs
This document describes the format and content of release 7 of the PUTATIVE ALTERNATIVE SPLICING SITE DATABASE (PALS db).
If you have any questions or comments about PALS db or this document, please contact the Bioinformatics Research Center of Yang-Ming University (YMBC) by e-mail at binfo@ym.edu.tw or by mail at:
Bioinformatics Research Center, National Yang-Ming University,
No. 155, Sec. 2, Li-Noun St, Taipei, Taiwan 11221, R.O.C.
Phone: +886-2-2826-7128
Fax: +886-2-2826-4843
==========================================================================
TABLE OF CONTENTS
==========================================================================
1. INTRODUCTION
1.1 Release 7
1.2 Important Changes in Release 7
1.3 Audience
1.4 Rationale
1.5 Data Sources
1.6 Criteria in Determining Reliable AS sites
2. RESULTS
2.1 Human
2.2 Mouse
2.3 C. elegans
2.4 Format of text output
2.5 Abbreviations
3. PALSDB ADMINISTRATION
3.1 Citing PALS db
3.2 Other Methods of Accessing PALS db data
3.3 Known Bugs
3.4 Deposition of Experimental Data and Comments
3.5 Credits and Acknowledgments
3.6 Disclaimer
==========================================================================
1. INTRODUCTION
1.1 Release 7
The Bioinformatics Research Center at National Yang-Ming University is responsible for producing and distributing the Putative Alternative Splicing Site Database (PALS db). Using all available human and mouse mRNA sequences as reference sequences, we attempted to collect all putative alternative splicing information hidden in biological sequence databases.
1.2 Important Changes in Release 7
- Fewer unique human genes are included in this release (26,324 unique genes in release 7, compared with 33,111 in release 6).
- Fewer unique mouse genes are included in this release (18,614 unique genes in release 7, compared with 18,942 in release 6).
- C. elegans genes are included for the first time in release 7 (14,393 unique genes).
1.3 Audience
PALS db is NOT designed to collect precise splicing site information for developing new gene prediction programs. It is designed for biologists who are interested in discovering biological phenomena and solving biological problems. Determining an AS site experimentally is time-consuming; however, by looking up putative AS information, biologists can design simple experiments to confirm the existence of these AS sites and to perform functional assays on them.
Although the splicing junctions in PALS db are not as precise as those required for designing new gene prediction programs, biologists may find the database useful because it provides a large amount of putative AS information. Researchers in wet labs can use these putative sites as hints for forming hypotheses. For example,
To obtain reliable statistics about AS in human genes, we collected AS-related statistics using strict criteria. However, to give biologists more opportunities to find novel phenomena, we preserved all the putative information. We also created a user-friendly interface that lets biologists judge the validity of the putative information using their own expertise. We hope PALS db can serve as a tool to energize information-driven biomedical research.
1.4 Rationale
In constructing PALS db, we tried to find all available AS information. Because half of the human genomic sequence was still in draft form, we chose mRNA sequences as references for collecting AS information from UniGene and dbEST. If a sequence carries AS information, in the form of either a deleted or an inserted fragment, its alignment to the reference sequence is split into separate segments. According to the relations between these alignment segments, there are 3 major types of alternative splicing transcripts. In the schematic representation of these types, '+' marks regions that can be aligned, and '=' marks regions that have no corresponding fragment in the other sequence. Forward and backward slashes indicate the alignable regions between the reference sequence and the putative AS-containing sequence. The identities of the two alignment segments are defined as ID1 and ID2, and their lengths as Len1 and Len2, respectively.
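The split-alignment idea above can be illustrated with a small Python sketch. It is not the PALS db pipeline: the names AlignmentBlock and describe_split, the 1-based inclusive coordinates, measuring Len1/Len2 on the reference, and the tolerance parameter are all assumptions made for illustration. The sketch only distinguishes a deleted fragment from an inserted fragment between two alignment segments; it does not reproduce the database's three transcript types.

```python
from dataclasses import dataclass


@dataclass
class AlignmentBlock:
    """One aligned ('+') segment between a reference mRNA and a candidate sequence.

    Hypothetical structure; coordinates are assumed 1-based and inclusive.
    """
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int
    identity: float  # percent identity of this segment (used as ID1 or ID2)

    @property
    def length(self) -> int:
        # Assumption: segment length is measured on the reference sequence
        return self.ref_end - self.ref_start + 1


def describe_split(block1: AlignmentBlock, block2: AlignmentBlock, tolerance: int = 0) -> dict:
    """Summarise a split alignment as ID1/ID2/Len1/Len2 plus a rough fragment label.

    `tolerance` is a hypothetical allowance for small gaps caused by alignment slop.
    """
    ref_gap = block2.ref_start - block1.ref_end - 1        # reference bases between the segments
    query_gap = block2.query_start - block1.query_end - 1  # candidate bases between the segments
    if ref_gap > tolerance and query_gap <= tolerance:
        label, fragment = "deleted fragment", ref_gap       # reference '=' region missing from candidate
    elif query_gap > tolerance and ref_gap <= tolerance:
        label, fragment = "inserted fragment", query_gap    # candidate '=' region absent from reference
    else:
        label, fragment = "other", max(ref_gap, query_gap, 0)  # both or neither have unaligned regions
    return {
        "ID1": block1.identity, "ID2": block2.identity,
        "Len1": block1.length, "Len2": block2.length,
        "type": label, "fragment_length": fragment,
    }


# Example: a candidate whose second segment resumes 120 reference bases later
result = describe_split(AlignmentBlock(1, 200, 1, 200, 99.0),
                        AlignmentBlock(321, 600, 201, 480, 98.5))
print(result)  # type 'deleted fragment', fragment_length 120, Len1 200, Len2 280
```

A real pipeline would also need to handle strand, more than two segments, and partial overlaps; the sketch ignores those cases to keep the ID/Len bookkeeping visible.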
1.5 Data Sources
- Human UniGene Build #176, Mouse UniGene Build #141, and dbEST released on Nov. 24, 2004
- Homologous human and mouse genes from NCBI LocusLink (Nov. 25, 2004)
- Similar human and mouse genes from NCBI HomoloGene (Nov. 25, 2004)
- 22,326 literature aliases collected by the Human Genome Organisation (HUGO) (Nov. 25, 2004)
- EST library information from the Cancer Genome Anatomy Project (CGAP) (Nov. 25, 2004)
1.6 Criteria in Determining Reliable AS sites
To preserve as much information as possible, we employed a two-step filter to handle paralogous genes and repeats: EST sources are marked to address paralogous genes, and moderately high criteria are adopted to address repeats.
2. RESULTS
2.1 Putative alternative splicing in human
- 374,923 sequences (mRNA and EST sequences) contain putative alternative splicing information
- 50,075 alternative splicing site pairs
2.2 Putative alternative splicing in mouse
- 118,640 sequences (mRNA and EST sequences) contain putative alternative splicing information
- 27,905 alternative splicing site pairs
2.3 Putative alternative splicing in C. elegans
- 6,977 sequences (mRNA and EST sequences) contain putative alternative splicing information
- 3,642 alternative splicing site pairs
2.4 Format of the "TEXT" output
The standard output
2.5 Abbreviations
The following abbreviations are the naming conventions used in PALS db. They may appear in the web interface, the TEXT output, and the web-based HELP system.
| AS | alternative splicing |
| ASSPs | non-redundant AS information for a gene |
| Gb_id | the GenBank accession number of a DNA sequence (RefSeq, mRNA, or EST sequence) |
| Ug_id | the UniGene cluster ID of a unique gene |
| Gene | the gene symbol approved by HUGO |
| UniGene member | the number of sequences clustered in a UniGene Cluster |
| AS lists (pic) | clustered (non redundant) putative AS site pairs displayed in graphics |
| Text_Info | clustered putative AS site pairs in detail, including the number of cds and ESTs, the cloning library characteristics of the ESTs, the AS types, the length of an AS fragment, etc. |
| All seq info | all candidate sequences containing putative AS displayed in graphics |
| Descriptions | the gene name approved by HUGO |
| Cytoband | the cytogenetic location of a gene |
3. PALSDB ADMINISTRATION
3.1 Citing PALS db
If you have used PALS db in your research, we would appreciate it if you would include a reference to PALS db in all publications related to that research.
When citing data in PALS db, please give the database release number and the accession numbers of the sequences containing putative AS information. If necessary, we will try to retrieve the information from the past PALS db release used in your publication.
The following publication, which describes PALS db, should be cited: Huang, Y-H, Chen, Y-T, Lai, J-J, Yang, S-T., and Yang, U-C. (2002) PALS db: Putative alternative splicing database. Nucleic Acids Res. 30, 186-190.
3.2 Other Methods of Accessing PALS db data
We are contacting institutes about establishing mirror sites outside the Bioinformatics Research Center, National Yang-Ming University, Taipei, Taiwan. Institutes interested in becoming mirror sites should contact binfo@ym.edu.tw.
We plan to issue flat-file distributions for installation into SRS in the future.
3.3 Known Bugs
When clustering candidate sequences containing AS information into AS site pairs, we mistakenly combined type two AS candidates that have the same POS1 but different variant lengths.
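To make this bug concrete, the following hypothetical Python sketch groups candidates into site pairs by a clustering key. The ASCandidate fields, the integer AS type numbering, the placeholder accessions (EST_A and so on), and the assumption that clustering is keyed on exact values are illustrative only and do not describe the actual PALS db code. The point is that a key of (type, POS1) merges candidates that a key including the variant length keeps apart.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ASCandidate:
    """A candidate sequence carrying putative AS information (hypothetical fields)."""
    gb_id: str           # accession of the candidate sequence (placeholder values below)
    as_type: int         # AS type; integer numbering assumed for this sketch
    pos1: int            # position of the AS site on the reference mRNA
    variant_length: int  # length of the deleted or inserted fragment


def cluster_site_pairs(candidates, key_fields=("as_type", "pos1", "variant_length")):
    """Group candidates into putative AS site pairs keyed on the given fields.

    Passing key_fields=("as_type", "pos1") reproduces the over-merging described
    above: candidates with different variant lengths collapse into one pair.
    """
    clusters = defaultdict(list)
    for cand in candidates:
        key = tuple(getattr(cand, field) for field in key_fields)
        clusters[key].append(cand.gb_id)
    return dict(clusters)


cands = [
    ASCandidate("EST_A", 2, 1500, 90),
    ASCandidate("EST_B", 2, 1500, 90),
    ASCandidate("EST_C", 2, 1500, 144),
]
print(len(cluster_site_pairs(cands)))                                  # 2 site pairs
print(len(cluster_site_pairs(cands, key_fields=("as_type", "pos1"))))  # 1 (over-merged)
```

Exact-value keys are the simplest choice for illustration; a production clustering step would likely tolerate small coordinate differences, which this sketch does not attempt.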
3.4 Deposition of Experimental Data and Comments
We welcome the deposition of any experimental data that provide further evidence for the putative AS site pairs collected in PALS db. Please contact binfo@ym.edu.tw.
Comments should be sent directly to 39103016@ym.edu.tw.
3.5 Credits and Acknowledgments
Credits
Acknowledgement
3.6 Disclaimer
The Bioinformatics Research Center of National Yang-Ming University makes no representation about the suitability or accuracy of PALS db and the web interface for any purpose and makes no warranties, either express or implied, including merchantability and fitness for a particular purpose, or that the use of this software or data will not infringe any third-party patents, copyrights, trademarks, or other rights.
This web interface and data are provided to enhance knowledge and encourage progress in the scientific community and are to be used only for research and educational purposes. Any reproduction or use for commercial purposes is prohibited without the prior express written permission of the Bioinformatics Research Center of National Yang-Ming University.
For additional information about PALS db releases, please contact YMBC by e-mail at binfo@ym.edu.tw, by phone at +886-2-2826-7128, or by mail at:
Bioinformatics Research Center, National Yang-Ming University,
No. 155, Sec. 2, Li-Noun St, Taipei, Taiwan 11221, R.O.C.
FAX: +886-2-2826-4843