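GCG Program: Map

Applications


Map: Finding Restriction Enzyme Cut Sites

Find all restriction enzymes that recognize six or more nucleotides, together with their cut sites

The same map program can be used to find restriction enzymes and their cut sites, and additional parameters can be supplied to narrow the search. For example, -sixbase restricts the search to enzymes whose recognition site is six or more nucleotides long. To list the available parameters with a brief description of each, run map with -check appended, as shown below: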

 [ sun670 ] /userdata/manager/project/e %map -check
Map displays both strands of a DNA sequence with restriction sites shown
above the sequence and possible protein translations shown below.

Minimal Syntax: % map [-INfile=]gamma.seq -Default

Prompted Parameters:

-BEGin=2101 -END=2600       range of sequence in which to look for sites
-ENZymes=*[,...]            chooses the enzymes used in the search
-MENu=s                     translation frames s=six, t=three, o=open
[-OUTfile=]gamma.map        output file name

Local Data Files:

-DATa=enzyme.dat          restriction enzyme names and recognition sites
-DATa=proenzyme.dat       peptidases and peptide cleavage reagents
-TRANSlate=translate.txt  the genetic code

Optional Parameters:

-WIDth=100      sets display width to something other than 60 bp-line
-PAGe[=64]      adds form-feeds to keep clusters on a single page
-OPEn[=20]      translates only in open reading frames [minimum ORF length]
 Press q to quit or space for more:
-SIXbase        only finds enzymes with 6 or more bases in recognition site
-ONCe           shows only enzymes that cut once
-MINCuts=2      shows only enzymes that cut at least 2 times
-MAXCuts=2      shows only enzymes that cut no more than 2 times
-EXCLude=n1,n2  doesn't show enzymes that cut between bases n1 and n2
-ALL            finds "overlapping-set" matches
-PERFect        finds only perfect symbol matches between site and sequence
-CIRcular       treats the sequence as circular
-LINear         treats the sequence as linear (default)
-APPend         appends enzyme and genetic code data files to output
-THReeletter    uses three-letter amino acid codes for the translation
-SILent         finds translationally silent potential restriction sites
-MISmatch=1     finds restriction sites with one or fewer mismatches
-NOSEQline      suppresses the sequence display
-NOSCALeline    suppresses the scale line
-NOCOMPline     suppresses the complement sequence display

 Add what to the command line ?  -six -cir
Note: enter any parameters you want to set at this prompt.
(Circular) (Six Base) MAP of what sequence ?  pLT.seq

                 Begin (* 1 *) ?1
               End (*  3715 *) ?3715

Select the enzymes:  Type nothing or "*" to get all enzymes. Type "?"
for help on which enzymes are available and how to select them.

                                      Enzyme(* * *):

What protein translations do you want:

     a) frame 1   b) frame 2   c) frame 3
     d) frame 4   e) frame 5   f) frame 6
     t)hree forward frames     s)ix frames   o)pen frames only

     n)o protein translation   q)uit

 Please select (capitalize for 3-letter) (* t *):  n

 What should I call the output file (* pLT.map *) ?  pLT.map

Part of the resulting pLT.map file is shown below. Here the EcoRI and BamHI sites are used to confirm that the newly spliced sequence (pLT.seq) is correct.

                  B                                 P              
                  s                           *     f              
                  p                           *     l              
                 B1BB A                       E     1              
              B Aa2ss a     NS   X     AX    Ac     1       B      
              m pn8ae t     sp   b     vh    po     0       s      
              g aI6HR I     ph   a     ao    oR     8       m      
              I IIIII I     II   I     II    II     I       I      
                 //          /          /     /                    
   gggcgaattgggcccgacgtcgcatgctcctctagactcgaggaattctacgaatgctat    
 1 ---------+---------+---------+---------+---------+---------+ 60 ........... 
   cccgcttaacccgggctgcagcgtacgaggagatctgagctccttaagatgcttacgata  

                                   B                                     
                    *             Bs                                     
                    *             sp  A                                  
        BBP         BB BB        Bi1  f   B                              
        ssiB   AKS  as ssM       aH2S lM  s        N             S       
        arna   vpm  mt apm       nK8a Il  t        s             f       
        WFAn   ana  HY WEe       IA6c Iu  X        i             c       
        IIII   III  II III       IIII II  I        I             I       
         //     /    /  //        ///  /                                 
        accggtacccggggatccggagagctcccaacgcgttggatgcatagcttgagtattcta     
... 781 ---------+---------+---------+---------+---------+---------+ 840 ..............
        tggccatgggcccctaggcctctcgagggttgcgcaacctacgtatcgaactcataagat
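
The search that Map performs here can be illustrated with a minimal Python sketch. It is not GCG's implementation: the `ENZYMES` dictionary is a hypothetical stand-in for enzyme.dat, the function names are made up for illustration, and only exact matches on the top strand are considered (enough for palindromic sites such as those of EcoRI and BamHI). The two fragments are the rows starting at positions 1 and 781 in the output above.

```python
# Hypothetical stand-in for enzyme.dat: enzyme name -> recognition site.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "AluI": "AGCT"}


def six_base_cutters(enzymes):
    """Keep enzymes whose recognition site has 6 or more bases (the idea behind -sixbase)."""
    return {name: site for name, site in enzymes.items() if len(site) >= 6}


def find_sites(seq, site, circular=False, first_pos=1):
    """Return 1-based start positions of exact matches of `site` in `seq`.

    `first_pos` is the position of seq[0] in the full sequence. With
    circular=True, matches that wrap around the end are also reported.
    """
    seq, site = seq.upper(), site.upper()
    n = len(seq)
    # For a circular sequence, append just enough bases to cover wrap-around matches.
    text = seq + seq[:len(site) - 1] if circular else seq
    hits = []
    idx = text.find(site)
    while idx != -1 and idx < n:
        hits.append(first_pos + idx)
        idx = text.find(site, idx + 1)
    return hits


# Rows copied from the pLT.map excerpt (positions 1-60 and 781-840).
frag_1 = "gggcgaattgggcccgacgtcgcatgctcctctagactcgaggaattctacgaatgctat"
frag_781 = "accggtacccggggatccggagagctcccaacgcgttggatgcatagcttgagtattcta"

print(sorted(six_base_cutters(ENZYMES)))
print("EcoRI:", find_sites(frag_1, ENZYMES["EcoRI"]))
print("BamHI:", find_sites(frag_781, ENZYMES["BamHI"], first_pos=781))
```

The sketch reports where each recognition site begins (43 for EcoRI and 793 for BamHI in these rows), not the cut point within the site that Map marks with the enzyme name.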



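Map: Finding Open Reading Frames

The main purpose of the Map program is to mark restriction enzyme cut sites on a DNA sequence. As a convenience for experimental design it also offers translation, and it can even find open reading frames longer than a given length automatically. Example 5-4 uses this feature to find the exact start and end positions of an open reading frame. Because the open reading frame found in Exercise 5-3 is about 300 amino acids long, appending "-open=200" to the command line displays only open reading frames longer than 200 amino acids. To keep the program from listing enzyme cut sites, a single space is deliberately typed at the Enzyme (* * *): prompt, which skips that question and goes straight to the translation options.

Example: Find the positions of the start codon (AUG) and the stop codon of the most likely open reading frame in tfiiia.dna.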

%map -open=200
Map maps a DNA sequence and displays both strands of the mapped sequence
with restriction enzyme cut points above the sequence and protein
translations below. Map can also create a peptide map of an amino acid
sequence.

(Linear) MAP of what sequence ? tfiiia.dna
Begin (* 1 *) ?
End (* 1518 *) ?
Select the enzymes: Type nothing or "*" to get all enzymes. Type "?"
for help on which enzymes are available and how to select them.
Enzyme(* * *):
Enzyme:

What protein translations do you want:
a) frame 1 b) frame 2 c) frame 3
d) frame 4 e) frame 5 f) frame 6
t)hree forward frames s)ix frames o)pen frames only
n)o protein translation q)uit

Please select (capitalize for 3-letter) (* t *): o

What should I call the output file (* tfiiia.map *) ?

Mapping .

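The figure below shows part of the Map output. Because the most likely open reading frame is the third forward frame (frame c), the other frames can be ignored. Note that the open reading frame is reported as starting at the first codon even though that codon is not AUG; the program is designed this way to cope with incomplete sequences. For TFIIIA, what we want is the first AUG in the sequence, which begins at base pair 42. The stop codon is indeed found near base pair 1080, at position 1074.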

3 Using the Map program to find the exact position of an open reading frame

(Linear) MAP of: tfiiia.dna check: 6524 from: 1 to: 1518

LOCUS XELTFIIIA 1518 bp mRNA VRT 20-MAR-1986
DEFINITION X.laevis 5S RNA gene transcription factor (TFIIIA) mRNA,

-----

MinOpen: 200

February 3, 1997 07:31 ..
GAATTCCGGAAGCCGAGGGCTGTTCAGTTGCTGAAGGAGAGATGGGAGAGAAGGCGCTGC
1 ---------+---------+---------+---------+---------+---------+ 60
CTTAAGGCCTTCGGCTCCCGACAAGTCAACGACTTCCTCTCTACCCTCTCTTCCGCGACG
a -
b -
c I P E A E G C S V A E G E M G E K A L P -
1 ---------+---------+---------+---------+---------+---------+ 60
d F E P L R P S N L Q Q L L S P L S P A A -
e -
f I G S A S P Q E T A S P S I P S F A S -

-----

CCTCTGGCACTGAAACAAATGGCTCATTGGTTCTAGATAAATTAACTATACAATAATATA
1021 ---------+---------+---------+---------+---------+---------+ 1080
GGAGACCGTGACTTTGTTTACCGAGTAACCAAGATCTATTTAATTGATATGTTATTATAT
a -
b -
c S G T E T N G S L V L D K L T I Q * -
1021 ---------+---------+---------+---------+---------+---------+ 1080
d -
e -
f -
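
The two positions read off the output above (first AUG at 42, stop codon at 1074, both in frame c) can be checked with a short Python sketch. This is an illustration only, not how Map itself works; the function names are hypothetical, and the two fragments are the rows starting at positions 1 and 1021 in the figure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_in_frame(seq, frame, first_pos=1):
    """Yield (position, codon) pairs for forward frame 1, 2 or 3.

    Positions are 1-based in the full sequence; `first_pos` is the
    position of seq[0], so a fragment from the middle keeps its frame.
    """
    seq = seq.upper()
    # Index of the first base in this fragment that starts a codon of `frame`.
    start = (frame - first_pos) % 3
    for i in range(start, len(seq) - 2, 3):
        yield first_pos + i, seq[i:i + 3]


def first_codon(seq, frame, targets, first_pos=1):
    """Return the position of the first codon in `targets`, or None if absent."""
    for pos, codon in codons_in_frame(seq, frame, first_pos):
        if codon in targets:
            return pos
    return None


# Rows copied from the Map output (positions 1-60 and 1021-1080).
row_1 = "GAATTCCGGAAGCCGAGGGCTGTTCAGTTGCTGAAGGAGAGATGGGAGAGAAGGCGCTGC"
row_1021 = "CCTCTGGCACTGAAACAAATGGCTCATTGGTTCTAGATAAATTAACTATACAATAATATA"

start_pos = first_codon(row_1, 3, {"ATG"})
stop_pos = first_codon(row_1021, 3, STOP_CODONS, first_pos=1021)
print("first ATG in frame c:", start_pos)
print("first stop in frame c:", stop_pos)
```

On these two rows the sketch gives 42 and 1074. Note that it only inspects the two rows shown; confirming that no stop codon interrupts frame c in between would require the full tfiiia.dna sequence.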



Map: Finding Transcription Factor Binding Sites (Analysis of Gene Upstream Sequences)

Summary

  1. Purpose: Search for known nucleic acid sites in a given sequence
  2. Application: Look for transcription factor binding sites in the upstream sequences
  3. Algorithm: Pattern search
  4. Data file required: tfsites.dat (694 Kbytes)
  5. Tricks in using this program

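Once the start of the mRNA has been determined, the properties of the gene's upstream sequence can be analyzed; typically one wants to know which potential transcription factor binding sites it contains. This kind of analysis closely resembles a search for restriction enzyme cut sites: by replacing the enzyme recognition sites with transcription factor binding sites, the programs used to find cut sites (Map, MapSort, MapPlot, and so on) can find every transcription factor binding site in a sequence. The FindPatterns program, in turn, can find which sequences contain a particular transcription factor binding site, in the same way it searches a database for motif patterns. Dr. Ghosh collected many transcription factor binding sites from the literature and built the TFD (Transcription Factor Database). To find the binding sites in a given sequence, use TFD as a local data file when running the programs above.

Example 2: Estimate the positions of the TATA box and the CAAT box in the upstream sequence of the TFIIIA gene, relative to the mRNA start site.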

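  1. Retrieve the TFD data file:
     % fetch tfsites.dat enzyme.dat
  2. Use the Map program to search for transcription factor binding sites:
     % map ov:xltf3a5
  3. In Figure 1-3, nucleotide 426 is position "1" and nucleotide 425 is position "-1", so the TATA box appears at about -30 and the CAAT box at about -100.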

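The key trick in this exercise is to rename the retrieved tfsites.dat to enzyme.dat, because here the Map program accepts only the default enzyme.dat and does not accept "-dat=tfsites.dat".

3 Analysis of transcription factor binding sites in the upstream sequence of the TFIIIA gene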

(Linear) MAP of: gb_ov:xltf3a5 check: 238 from: 1 to: 515
LOCUS XLTF3A5 515 bp DNA VRT 09-APR-1991
DEFINITION Xenopus TFIIIA gene 5' region.
ACCESSION X15785
-----
Using Enzyme data from: enzyme.dat FileCheck: 2714
-----
MaxCuts: 1
.....
DHFR-undefined-site-1
CTF/CBP-hs|
hsp70.5|
CCAAT_site_4|
NFI.2 ||
GR-intron-site-3 CAAT_site(1) ||
GR-intron-site-4 | IE1.2 | ||
| | | | ||
AGCTGCAAGGGACACAGGAAAGGGCTGATTGCCAATCCTTTCAGACATCGCAAAACTTCC
301 ---------+---------+---------+---------+---------+---------+ 360
TCGACGTTCCCTGTGTCCTTTCCCGACTAACGGTTAGGAAAGTCTGTAGCGTTTTGAAGG
GR-intron-site-2
TATA-box.2 |
his3-Tr-TATA |
Ad2MLP_US.3 |
(TFIID/TBF)-RS |
GAL1-TATA| |
TATA-box-CS|| |
GR-MT-IIA ||| |
D3 c-mos_DS1 | TATA|| |
| | | ||| |
CGATGCATGTGCGATAATGGTTTGTCCTAGAGCTATATAAACAGGCACACATGGCGGCTA
361 ---------+---------+---------+---------+---------+---------+ 420
GCTACGTACACGCTATTACCAAACAGGATCTCGATATATTTGTCCGTGTGTACCGCCGAT
PEA3_RS
Ets-1_CS|
TCF-2-alpha_CS|
CP2-gamma-FBG || PTF1-beta-consensus
| || |
CAGTGGCTTCTACAAGTTCAGAGGAAGCCGAGGGCAGCTTAGTTACTGAAGGAGAGATGG
421 -----*---+---------+---------+---------+---------+---------+ 480
GTCACCGAAGATGTTCAAGTCTCCTTCGGCTCCCGTCGAATCAATGACTTCCTCTCTACC
B1
|
GAGAGAAGGCGCTGCCGGTGGTGTATAAGCGGTAC
481 ---------+---------+---------+----- 515
CTCTCTTCCGCGACGGCCACCACATATTCGCCATG
Enzymes that do cut and were not excluded:
-----
Enzymes that do not cut:
-----
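
The conversion from sequence coordinates to positions relative to the mRNA start can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the motifs "CCAAT" and "TATAAA" are simplified consensus strings chosen for the example, not entries taken from tfsites.dat, and the function names are hypothetical. The start site (nucleotide 426 is "1", nucleotide 425 is "-1") comes from the example above, and the two fragments are the rows starting at positions 301 and 361 in the output.

```python
def relative_position(abs_pos, start_site=426):
    """Convert a 1-based sequence position to one relative to the mRNA start.

    The start site itself is +1 and the base before it is -1; there is no 0.
    """
    if abs_pos >= start_site:
        return abs_pos - start_site + 1
    return abs_pos - start_site


def first_match(fragment, motif, first_pos):
    """Return the 1-based position of the first exact match of `motif`, or None."""
    idx = fragment.upper().find(motif.upper())
    return None if idx == -1 else first_pos + idx


# Rows copied from the Map output (positions 301-360 and 361-420).
row_301 = "AGCTGCAAGGGACACAGGAAAGGGCTGATTGCCAATCCTTTCAGACATCGCAAAACTTCC"
row_361 = "CGATGCATGTGCGATAATGGTTTGTCCTAGAGCTATATAAACAGGCACACATGGCGGCTA"

caat_pos = first_match(row_301, "CCAAT", 301)   # assumed CAAT-box motif
tata_pos = first_match(row_361, "TATAAA", 361)  # assumed TATA-box motif

print("CAAT box:", caat_pos, relative_position(caat_pos))
print("TATA box:", tata_pos, relative_position(tata_pos))
```

With these assumed motifs the sketch places the CAAT box at 332 (-94) and the TATA box at 396 (-30), in line with the estimates of about -100 and about -30 given in the example.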

 


Last updated on 11/23/01