Finding all restriction enzymes that recognize six or more nucleotides, and their cut sites
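The map program is used again to find restriction enzymes and their cut sites, this time with additional parameters to narrow the search. For example, -sixbase restricts the search to enzymes whose recognition site is six or more nucleotides long. To list the available parameters with a brief description of each, run map with -check, as shown below: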
[ sun670 ] /userdata/manager/project/e %map -check

Map displays both strands of a DNA sequence with restriction sites shown
above the sequence and possible protein translations shown below.

Minimal Syntax: % map [-INfile=]gamma.seq -Default

Prompted Parameters:

  -BEGin=2101 -END=2600   range of sequence in which to look for sites
  -ENZymes=*[,...]        chooses the enzymes used in the search
  -MENu=s                 translation frames s=six, t=three, o=open
  [-OUTfile=]gamma.map    output file name

Local Data Files:

  -DATa=enzyme.dat          restriction enzyme names and recognition sites
  -DATa=proenzyme.dat       peptidases and peptide cleavage reagents
  -TRANSlate=translate.txt  the genetic code

Optional Parameters:

  -WIDth=100       sets display width to something other than 60 bp-line
  -PAGe[=64]       adds form-feeds to keep clusters on a single page
  -OPEn[=20]       translates only in open reading frames [minimum ORF length]

Press q to quit or space for more:

  -SIXbase         only finds enzymes with 6 or more bases in recognition site
  -ONCe            shows only enzymes that cut once
  -MINCuts=2       shows only enzymes that cut at least 2 times
  -MAXCuts=2       shows only enzymes that cut no more than 2 times
  -EXCLude=n1,n2   doesn't show enzymes that cut between bases n1 and n2
  -ALL             finds "overlapping-set" matches
  -PERFect         finds only perfect symbol matches between site and sequence
  -CIRcular        treats the sequence as circular
  -LINear          treats the sequence as linear (default)
  -APPend          appends enzyme and genetic code data files to output
  -THReeletter     uses three-letter amino acid codes for the translation
  -SILent          finds translationally silent potential restriction sites
  -MISmatch=1      finds restriction sites with one or fewer mismatches
  -NOSEQline       suppresses the sequence display
  -NOSCALeline     suppresses the scale line
  -NOCOMPline      suppresses the complement sequence display

Add what to the command line ? -six -cir
  (Note: enter the parameters you want to set here.)

(Circular) (Six Base) MAP of what sequence ? pLT.seq

  Begin (* 1 *) ? 1
  End (* 3715 *) ? 3715

Select the enzymes: Type nothing or "*" to get all enzymes. Type "?"
for help on which enzymes are available and how to select them.

Enzyme(* * *):

What protein translations do you want:

  a) frame 1  b) frame 2  c) frame 3  d) frame 4  e) frame 5  f) frame 6
  t)hree forward frames  s)ix frames  o)pen frames only
  n)o protein translation  q)uit

Please select (capitalize for 3-letter) (* t *): n

What should I call the output file (* pLT.map *) ? pLT.map
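As a rough illustration of what the -SIXbase option selects, the following Python sketch filters a small enzyme table by recognition-site length. The table is a hand-written toy example, not the enzyme.dat format, and the function name is hypothetical; this is not how Map itself is implemented.

```python
# Toy enzyme table for illustration only; this is NOT the enzyme.dat format.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "HaeIII": "GGCC",
    "TaqI": "TCGA",
}


def at_least_n_bases(enzymes, min_len=6):
    """Keep only enzymes whose recognition site has min_len or more bases."""
    return {name: site for name, site in enzymes.items() if len(site) >= min_len}


six_plus = at_least_n_bases(ENZYMES)
for name in sorted(six_plus):
    print(name, six_plus[name])
```

With this toy table, the four-base cutters HaeIII and TaqI are dropped and the six- and eight-base cutters remain.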
The following shows part of the contents of pLT.map. Here EcoRI and BamHI are used to confirm that the newly spliced sequence (pLT.seq) is correct.
[Enzyme names are printed vertically above each cut site in the Map output; their column alignment is not preserved here.]

    gggcgaattgggcccgacgtcgcatgctcctctagactcgaggaattctacgaatgctat
  1 ---------+---------+---------+---------+---------+---------+ 60
    cccgcttaacccgggctgcagcgtacgaggagatctgagctccttaagatgcttacgata

    ...........

    accggtacccggggatccggagagctcccaacgcgttggatgcatagcttgagtattcta
781 ---------+---------+---------+---------+---------+---------+ 840
    tggccatgggcccctaggcctctcgagggttgcgcaacctacgtatcgaactcataagat

    ..............
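The EcoRI and BamHI check can also be sketched outside of Map. The Python snippet below searches the two 60-base top-strand excerpts of pLT.seq shown above for the EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences. The function name is hypothetical, and the reported positions are where the recognition site starts, not the cut point that Map marks.

```python
import re

SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

# The two 60-base excerpts of pLT.seq shown above (top strand only),
# keyed by the position of their first base.
EXCERPTS = {
    1: "gggcgaattgggcccgacgtcgcatgctcctctagactcgaggaattctacgaatgctat",
    781: "accggtacccggggatccggagagctcccaacgcgttggatgcatagcttgagtattcta",
}


def site_positions(seq, site, first_base=1):
    """Return 1-based positions where `site` starts in `seq` (overlaps allowed)."""
    pattern = "(?=" + re.escape(site) + ")"  # lookahead so overlapping hits count
    return [first_base + m.start() for m in re.finditer(pattern, seq, re.IGNORECASE)]


hits = {
    name: [
        pos
        for first_base, seq in EXCERPTS.items()
        for pos in site_positions(seq, site, first_base)
    ]
    for name, site in SITES.items()
}
print(hits)
```

Both recognition sequences are palindromic, so searching the top strand alone is sufficient for this check.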
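The purpose of the Map program is to mark restriction enzyme cut sites on a DNA sequence. To make experiment design easier, it also provides translation, and it can automatically find open reading frames longer than a given length. Example 5-4 uses this feature to find the exact start and end positions of an open reading frame. Because the open reading frame found in Exercise 5-3 is about 300 amino acids long, appending "-open=200" to the command line displays only open reading frames longer than 200 amino acids. To keep the program from listing enzyme cut sites, a single space is deliberately typed at the "Enzyme (* * *):" prompt, which skips that question and goes straight to the translation options.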
Example: Find the positions of the start codon (AUG) and the stop codon of the most probable open reading frame in tfiiia.dna.
%map -open=200

Map maps a DNA sequence and displays both strands of the mapped
sequence with restriction enzyme cut points above the sequence and
protein translations below. Map can also create a peptide map of an
amino acid sequence.

(Linear) MAP of what sequence ? tfiiia.dna

  Begin (* 1 *) ?
  End (* 1518 *) ?

Select the enzymes: Type nothing or "*" to get all enzymes. Type "?"
for help on which enzymes are available and how to select them.

Enzyme(* * *):
Enzyme:

What protein translations do you want:

  a) frame 1  b) frame 2  c) frame 3  d) frame 4  e) frame 5  f) frame 6
  t)hree forward frames  s)ix frames  o)pen frames only
  n)o protein translation  q)uit

Please select (capitalize for 3-letter) (* t *): o

What should I call the output file (* tfiiia.map *) ?

Mapping .
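The figure below shows part of the Map output. Because the most probable open reading frame is the third forward frame (frame c), the other frames can be ignored. Note that the open reading frame is reported as starting at the first codon even though that codon is not AUG; this is by design, to allow for incomplete sequences. For TFIIIA, what we want is the first AUG in the sequence, which begins at base 42. The stop codon is found near base 1080, at position 1074.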
Figure 3. Using the Map program to find the exact position of the open reading frame
(Linear) MAP of: tfiiia.dna check: 6524 from: 1 to: 1518

LOCUS       XELTFIIIA 1518 bp mRNA VRT 20-MAR-1986
DEFINITION  X.laevis 5S RNA gene transcription factor (TFIIIA) mRNA,
-----
MinOpen: 200

February 3, 1997 07:31 ..

     GAATTCCGGAAGCCGAGGGCTGTTCAGTTGCTGAAGGAGAGATGGGAGAGAAGGCGCTGC
   1 ---------+---------+---------+---------+---------+---------+ 60
     CTTAAGGCCTTCGGCTCCCGACAAGTCAACGACTTCCTCTCTACCCTCTCTTCCGCGACG

a    -
b    -
c    I P E A E G C S V A E G E M G E K A L P -
   1 ---------+---------+---------+---------+---------+---------+ 60
d    F E P L R P S N L Q Q L L S P L S P A A -
e    -
f    I G S A S P Q E T A S P S I P S F A S -

-----

     CCTCTGGCACTGAAACAAATGGCTCATTGGTTCTAGATAAATTAACTATACAATAATATA
1021 ---------+---------+---------+---------+---------+---------+ 1080
     GGAGACCGTGACTTTGTTTACCGAGTAACCAAGATCTATTTAATTGATATGTTATTATAT

a    -
b    -
c    S G T E T N G S L V L D K L T I Q * -
1021 ---------+---------+---------+---------+---------+---------+ 1080
d    -
e    -
f    -
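The start and stop positions read off the Map output can be cross-checked with a short script. The Python sketch below uses only the two 60-base excerpts of tfiiia.dna shown in the figure (bases 1-60 and 1021-1080), with ATG as the DNA form of AUG. The function names are hypothetical, and the stop search covers only the second excerpt; it relies on the Map output above to show that frame c stays open up to that point.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Top-strand excerpts copied from the Map output above.
HEAD = "GAATTCCGGAAGCCGAGGGCTGTTCAGTTGCTGAAGGAGAGATGGGAGAGAAGGCGCTGC"  # bases 1-60
TAIL = "CCTCTGGCACTGAAACAAATGGCTCATTGGTTCTAGATAAATTAACTATACAATAATATA"  # bases 1021-1080


def first_atg(seq, first_base=1):
    """Return the 1-based position of the first ATG in seq, or None."""
    i = seq.upper().find("ATG")
    return None if i < 0 else first_base + i


def frame_of(pos):
    """Forward reading frame (1, 2 or 3) of a codon starting at 1-based pos."""
    return (pos - 1) % 3 + 1


def first_stop_in_frame(seq, frame, first_base=1):
    """Return the 1-based position of the first stop codon of `frame` in seq.

    first_base is the position of seq[0] in the full sequence.
    """
    seq = seq.upper()
    # Index within this window of the first codon that belongs to `frame`.
    start = (frame - 1 - (first_base - 1)) % 3
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return first_base + i
    return None


atg = first_atg(HEAD)
frame = frame_of(atg)
stop = first_stop_in_frame(TAIL, frame, first_base=1021)
codons = (stop - atg) // 3  # coding codons from ATG up to, not including, the stop
print(atg, frame, stop, codons)
```

The codon count from the first ATG to the stop can then be compared against the -open=200 threshold used on the command line.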
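Once the start of the mRNA has been determined, the properties of the gene's upstream sequence can be analyzed; usually the question is which potential transcription factor binding sites it contains. This analysis is very similar to searching for restriction enzyme cut sites: replace the restriction enzyme recognition sites with transcription factor binding sites, and the cut-site programs Map, MapSort and MapPlot will find all transcription factor binding sites in a sequence. The FindPatterns program, in the same way that it searches a database for motif patterns, can find which sequences contain a specific transcription factor binding site. Dr. Ghosh collected many transcription factor binding sites from the literature and built TFD (Transcription Factor Database). To find the binding sites in a given sequence, supply TFD as a local data file when running the programs above.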
Example 2: Estimate the positions of the TATA box and the CAAT box in the upstream sequence of the TFIIIA gene, relative to the mRNA start site.
The key trick for this exercise is to rename the retrieved tfsites.dat to enzyme.dat, because the Map program accepts only the default enzyme.dat and does not accept "-dat=tfsites.dat".
Figure 3. Transcription factor binding site analysis of the upstream sequence of the TFIIIA gene
(Linear) MAP of: gb_ov:xltf3a5 check: 238 from: 1 to: 515

LOCUS       XLTF3A5 515 bp DNA VRT 09-APR-1991
DEFINITION  Xenopus TFIIIA gene 5' region.
ACCESSION   X15785
-----
Using Enzyme data from: enzyme.dat FileCheck: 2714
-----
MaxCuts: 1

.....

Sites marked above 301-360: DHFR-undefined-site-1, CTF/CBP-hs, hsp70.5, CCAAT_site_4, NFI.2, GR-intron-site-3, CAAT_site(1), GR-intron-site-4, IE1.2

    AGCTGCAAGGGACACAGGAAAGGGCTGATTGCCAATCCTTTCAGACATCGCAAAACTTCC
301 ---------+---------+---------+---------+---------+---------+ 360
    TCGACGTTCCCTGTGTCCTTTCCCGACTAACGGTTAGGAAAGTCTGTAGCGTTTTGAAGG

Sites marked above 361-420: GR-intron-site-2, TATA-box.2, his3-Tr-TATA, Ad2MLP_US.3, (TFIID/TBF)-RS, GAL1-TATA, TATA-box-CS, GR-MT-IIA, D3, c-mos_DS1, TATA

    CGATGCATGTGCGATAATGGTTTGTCCTAGAGCTATATAAACAGGCACACATGGCGGCTA
361 ---------+---------+---------+---------+---------+---------+ 420
    GCTACGTACACGCTATTACCAAACAGGATCTCGATATATTTGTCCGTGTGTACCGCCGAT

Sites marked above 421-480: PEA3_RS, Ets-1_CS, TCF-2-alpha_CS, CP2-gamma-FBG, PTF1-beta-consensus

    CAGTGGCTTCTACAAGTTCAGAGGAAGCCGAGGGCAGCTTAGTTACTGAAGGAGAGATGG
421 -----*---+---------+---------+---------+---------+---------+ 480
    GTCACCGAAGATGTTCAAGTCTCCTTCGGCTCCCGTCGAATCAATGACTTCCTCTCTACC

Sites marked above 481-515: B1

    GAGAGAAGGCGCTGCCGGTGGTGTATAAGCGGTAC
481 ---------+---------+---------+----- 515
    CTCTCTTCCGCGACGGCCACCACATATTCGCCATG

Enzymes that do cut and were not excluded:
-----
Enzymes that do not cut:
-----
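The estimate asked for in Example 2 comes down to locating each box and subtracting the mRNA start coordinate. The Python sketch below does this for bases 301-420 of the top strand shown above. The literal motifs CCAAT and TATAAA are simplified stand-ins for the TFD patterns, the function names are hypothetical, and the mRNA start coordinate of 426 is an assumption made only for illustration, since the text does not state it.

```python
# Top strand of gb_ov:xltf3a5, bases 301-420, copied from the Map output above.
UPSTREAM_301_420 = (
    "AGCTGCAAGGGACACAGGAAAGGGCTGATTGCCAATCCTTTCAGACATCGCAAAACTTCC"
    "CGATGCATGTGCGATAATGGTTTGTCCTAGAGCTATATAAACAGGCACACATGGCGGCTA"
)
FIRST_BASE = 301

# ASSUMPTION for illustration only: the text does not give this coordinate.
ASSUMED_MRNA_START = 426


def find_all(seq, motif, first_base=1):
    """Return 1-based start positions of every occurrence of motif in seq."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(first_base + i)
        i = seq.find(motif, i + 1)  # step by one so overlapping hits are kept
    return hits


def relative_to_start(pos, start):
    """Plain difference pos - start; negative values lie upstream of start."""
    return pos - start


positions = {m: find_all(UPSTREAM_301_420, m, FIRST_BASE) for m in ("CCAAT", "TATAAA")}
for motif, found in positions.items():
    for pos in found:
        print(motif, pos, relative_to_start(pos, ASSUMED_MRNA_START))
```

The offsets change one for one with the start coordinate, so substitute the start site determined for the sequence at hand before reading anything into them.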
Last updated on 11/23/01