PALS db
Aug 15, 2001 PALS db Release 2
25,577 human putative alternative splicing site pairs,
9,214 mouse putative alternative splicing site pairs
This document describes the format and content of Release 2 of the Putative Alternative Splicing Site Database (PALS db).
If you have any questions or comments about PALS db or this document, please contact the Bioinformatics Research Center of Yang-Ming University (YMBC) by email at binfo@ym.edu.tw or by mail at:
Bioinformatics Research Center, National Yang-Ming University,
No. 155, Sec. 2, Li-Noun St, Taipei, Taiwan 11221, R.O.C.
Phone: +886-2-2826-7128
Fax: +886-2-2826-4843
==========================================================================
TABLE OF CONTENTS
==========================================================================
1. INTRODUCTION
1.1 Release 2
1.2 Important Changes in Release 2
1.3 Audience
1.4 Rationale
1.5 Data Sources
1.6 Criteria in Determining Reliable AS sites
2. RESULTS
2.1 Human
2.2 Mouse
2.3 Format of text output
2.4 Abbreviations
3. PALSDB ADMINISTRATION
3.1 Citing PALS db
3.2 Other Methods of Accessing PALS db data
3.3 Known Bugs
3.4 Deposition of Experimental Data and Comments
3.5 Credits and Acknowledgments
3.6 Disclaimer
==========================================================================
1. INTRODUCTION
1.1 Release 2
The Bioinformatics Research Center at the National Yang-Ming University is responsible for producing and distributing the putative alternative splicing site database (PALS db). Using all available human and mouse mRNA sequences as reference sequences, we tried to collect all available putative alternative splicing information hidden in biological sequence databases.
1.2 Important Changes in Release 2
Mouse genes (16,615 unique genes) are incorporated in this release.
More unique human genes are included in this release: 19,936 in Release 2, up from 17,595 in Release 1.
The web interface was upgraded to version 0.9.5.
1.3 Audience
PALS db is NOT designed to collect precise splicing site information for designing new gene prediction programs. This database is designed for biologists who are interested in discovering biological phenomena and solving biological problems. Determining an AS site experimentally is time-consuming; however, by looking up putative AS information, biologists can design simple experiments to confirm that these AS sites exist and to perform functional assays on them.
Though the splicing junctions in PALS db are not as precise as required for designing new gene-prediction programs, biologists may find the database useful because it provides a large amount of putative AS information. Researchers in wet labs can use those putative sites as hints for forming hypotheses, for example in the case of FOS.
To obtain reliable statistics about AS in human genes, we collected AS-related statistics using strict criteria. However, to give biologists more chances to find novel phenomena, we preserved all the putative information. We also created a user-friendly interface so that biologists can judge the validity of the putative information using their own expertise. We hope PALS db can be a tool to energize information-driven biomedical research.
1.4 Rationale
In constructing PALS db, we tried to find all available AS information. As half of the
human genomic sequence was still in draft form, we chose mRNA sequences as references for
collecting AS information from UniGene and dbEST. If a sequence contains AS information,
in the form of either a deleted or an inserted fragment, its alignment to the reference is
split into separate alignments. According to the relations between these alignments, there
are 3 major types of alternative splicing transcripts. '+' marks a region that can be
aligned. '=' marks a region that has no corresponding fragment in the other sequence.
Forward and backward slashes indicate the alignable regions between the reference sequence
and the putative AS-containing sequence. The identities of the two ends of the alignment
are defined as ID1 and ID2, respectively. The lengths of the two ends of the alignment are
defined as Len1 and Len2, respectively.
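As a rough illustration of how the four quantities above might be carried in a program, the Python sketch below stores ID1, ID2, Len1, and Len2 for one candidate and checks them against caller-supplied cutoffs. The class name, the function name, the use of percent identity, and the cutoff values in the usage note are all assumptions made for illustration; this document does not state the thresholds PALS db actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlankingAlignments:
    """The two alignable regions on either side of a putative AS event.

    Hypothetical container; field names follow the ID1/ID2/Len1/Len2
    definitions in the text, units are assumed.
    """
    id1: float  # identity of the first alignment (ID1), assumed to be a percentage
    id2: float  # identity of the second alignment (ID2), assumed to be a percentage
    len1: int   # length of the first alignment (Len1), assumed to be in bases
    len2: int   # length of the second alignment (Len2), assumed to be in bases


def passes_thresholds(a, min_identity, min_length):
    """Return True when BOTH flanking alignments meet the caller's cutoffs."""
    return (min(a.id1, a.id2) >= min_identity
            and min(a.len1, a.len2) >= min_length)
```

For example, with hypothetical cutoffs, `passes_thresholds(FlankingAlignments(98.0, 95.5, 120, 40), min_identity=95.0, min_length=30)` returns True, because the weaker of the two ends still meets both cutoffs.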
1.5 Data Sources
Human UniGene Build #138, Mouse UniGene Build #93, and dbEST released on Aug 12, 2001
Homologous human and mouse genes from NCBI LocusLink (Aug 12, 2001)
Similar human and mouse genes from NCBI HomoloGene (Aug 03, 2001)
9,386 literature aliases collected by the Human Genome Organisation (HUGO) (Sept 10, 2001)
EST library information from the Cancer Genome Anatomy Project (CGAP) (Aug 12, 2001)
1.6 Criteria in Determining Reliable AS sites
To preserve information, we employ a two-step filter: marking EST sources to manage
the issue of paralogous genes, and adopting moderately high criteria to manage the
issue of repeats.
2. RESULTS
2.1 Putative alternative splicing in human
79,609 sequences (mRNA and EST sequences) contained putative alternative splicing information
25,577 alternative splicing site pairs
2.2 Putative alternative splicing in mouse
23,768 sequences (mRNA and EST sequences) contained putative alternative splicing information
9,214 alternative splicing site pairs
2.3 Format of the "TEXT" output
The standard output
2.4 Abbreviations
The following abbreviations are the naming conventions used in PALS db. They may appear in the web interface, the TEXT output, and the web-based HELP system.
| AS | alternative splicing |
| ASSPs | non-redundant AS information for a gene |
| Gb_id | the GenBank accession number of a DNA sequence (a RefSeq, mRNA, or EST sequence) |
| Ug_id | the UniGene cluster ID of a unique gene |
| Gene | the gene symbol approved by HUGO |
| UniGene member | the number of sequences clustered in a UniGene Cluster |
| AS lists (pic) | clustered (non-redundant) putative AS site pairs displayed as graphics |
| Text_Info | clustered putative AS site pairs in detail, including the number of CDS and EST sequences, the cloning library characteristics of the ESTs, the AS types, the length of an AS fragment, etc. |
| All seq info | all candidate sequences containing putative AS displayed in graphics |
| Descriptions | the gene name approved by HUGO |
| Cytoband | the cytogenetic location of a gene |
3. PALSDB ADMINISTRATION
3.1 Citing PALS db
If you have used PALS db in your research, we would appreciate it if you would include a reference to PALS db in all publications related to that research.
When citing data in PALS db, it is appropriate to give the database release number and the accession numbers of the sequences containing putative AS information. If necessary, we will try to retrieve the information from the past PALS db release used in your publications.
The following publication, which describes PALS db, should be cited: Huang, Y-H., Chen, Y-T., Lai, J-J., Yang, S-T., and Yang, U-C. (2002) PALS db: Putative alternative splicing database. Nucleic Acids Res. 30, 186-190.
3.2 Other Methods of Accessing PALS db data
We are contacting institutes about creating mirror sites outside the Bioinformatics Research Center, National Yang-Ming University, Taipei, Taiwan. The first mirror site will soon be available at the National Center for High-performance Computing (NCHC), Hsinchu, Taiwan. To host a mirror site, please contact binfo@ym.edu.tw.
In the future, we plan to issue flat-file distributions for installation in the SRS system.
3.3 Known Bugs
In clustering candidate sequences containing AS information
into AS site pairs, we mistakenly combined type two AS candidates that have the same POS1
but different variant lengths.
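One way to picture this bug: if candidates are grouped by AS type and POS1 alone, type two candidates with different variant lengths collapse into a single site pair. The Python sketch below contrasts that grouping with one that also includes the variant length in the key. The record layout, function name, accessions, and coordinates are hypothetical, and this is not the PALS db clustering code.

```python
from collections import defaultdict


def cluster_candidates(candidates, include_length=True):
    """Group candidate records into putative AS site pairs.

    Each candidate is an (accession, as_type, pos1, variant_length) tuple;
    this layout is an assumption made for the sketch.
    """
    clusters = defaultdict(list)
    for accession, as_type, pos1, variant_length in candidates:
        if include_length:
            # Candidates must also agree on variant length to share a cluster.
            key = (as_type, pos1, variant_length)
        else:
            # Grouping that ignores variant length, as in the bug described above.
            key = (as_type, pos1)
        clusters[key].append(accession)
    return dict(clusters)


# Hypothetical accessions and coordinates, for illustration only.
candidates = [
    ("EST_A", 2, 350, 96),
    ("EST_B", 2, 350, 96),
    ("EST_C", 2, 350, 141),
]
```

With `include_length=False` the three example records fall into one cluster; with the default key, EST_C is kept apart from EST_A and EST_B because its variant length differs.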
3.4 Deposition of Experimental Data and Comments
We welcome the deposition of any experimental data that provide further evidence for the putative AS site pairs collected in PALS db. Please contact binfo@ym.edu.tw.
Any comments should be sent directly to binfo@ym.edu.tw.
3.5 Credits and Acknowledgments
Credits
Acknowledgments
3.6 Disclaimer
The Bioinformatics Research Center of National Yang-Ming University makes no representation about the suitability or accuracy of PALS db and the web interface for any purpose and makes no warranties, either express or implied, including merchantability and fitness for a particular purpose, or that the use of this software or data will not infringe any third-party patents, copyrights, trademarks, or other rights.
This web interface and data are provided to enhance knowledge and encourage progress in the scientific community and are to be used only for research and educational purposes. Any reproduction or use for commercial purposes is prohibited without the prior express written permission of the Bioinformatics Research Center of National Yang-Ming University.
For additional information about PALS db releases, please contact YMBC by e-mail at binfo@ym.edu.tw, by phone at +886-2-2826-7128, or by mail at:
Bioinformatics Research Center, National Yang-Ming University,
No. 155, Sec. 2, Li-Noun St, Taipei, Taiwan 11221, R.O.C.
FAX: +886-2-2826-4843