EMBOSS: remap
The home page of REBASE is: http://rebase.neb.com/rebase/rebase.html
This program uses REBASE data to find the recognition sites and/or cut sites of restriction enzymes in a nucleic acid sequence.
This program displays the cut sites on both strands by default. It will optionally also display the translation of the sequence.
There are many options to change the style of display to aid in making clear presentations.
One potentially very useful option is '-flatreformat', which displays not only the cut sites (as many other restriction cut-site programs do) but also the recognition sites.
% remap -notran -sbeg 1 -send 60
Display a sequence with restriction cut sites, translation etc..
Input sequence(s): embl:eclac
Output file [eclac.remap]:
Comma separated enzyme list [all]: taqi,bsu6i,acii,bsski
Minimum recognition site length [4]:
Here is an example where all enzymes in the REBASE database are used:
% remap -notran -sbeg 1 -send 60
Display a sequence with restriction cut sites, translation etc..
Input sequence(s): embl:eclac
Output file [eclac.remap]:
Comma separated enzyme list [all]:
Minimum recognition site length [4]:
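The search that remap performs can be illustrated with a minimal Python sketch. This is not remap's implementation: the function names are hypothetical, and the recognition sequences are written from memory of REBASE rather than read from the data files that remap uses, so treat them as assumptions. The sketch looks for each recognition site on both strands of the first 60 bases of ECLAC and reports the 1-based start of every site. It reports recognition-site starts only, not cut positions.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]",
}
COMPLEMENT = str.maketrans("ACGTNRYWSMKBDHV", "TGCANYRWSKMVHDB")


def reverse_complement(site):
    """Return the reverse complement of a (possibly ambiguous) site."""
    return site.translate(COMPLEMENT)[::-1]


def find_sites(seq, site):
    """Return sorted 1-based start positions of a recognition site.

    Both strands are searched by also matching the reverse complement
    of the site against the forward strand. A palindromic site is
    therefore reported once per location.
    """
    seq = seq.upper()
    site = site.upper()
    starts = set()
    for pattern in {site, reverse_complement(site)}:
        # The lookahead lets overlapping occurrences be found.
        regex = re.compile("(?=" + "".join(IUPAC[b] for b in pattern) + ")")
        starts.update(m.start() + 1 for m in regex.finditer(seq))
    return sorted(starts)


# First 60 bases of ECLAC, copied from the example output in this document.
SEQ = "GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT"

# Assumed recognition sequences (not read from the REBASE data files).
SITES = {
    "TaqI": "TCGA",
    "Hin6I": "GCGC",
    "AciI": "CCGC",
    "BssKI": "CCNGG",
    "Bsu6I": "CTCTTC",
}

if __name__ == "__main__":
    for name, site in SITES.items():
        print(name, find_sites(SEQ, site))
```

With these assumed sites, the number of hits per enzyme agrees with the 'Frequency' column of the example output shown later in this document.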
   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outfile]           outfile    If you enter the name of a file here then
                                  this program will write the sequence details
                                  into that file.
   -enzymes            string     The name 'all' reads in all enzyme names from
                                  the REBASE database. You can specify enzymes
                                  by giving their names with commas between
                                  them, such as: 'HincII,hinfI,ppiI,hindiii'.
                                  The case of the names is not important. You
                                  can specify a file of enzyme names to read
                                  in by giving the name of the file holding
                                  the enzyme names with a '@' character in
                                  front of it, for example, '@enz.list'.
                                  Blank lines and lines starting with a hash
                                  character or '!' are ignored and all other
                                  lines are concatenated together with a comma
                                  character ',' and then treated as the list
                                  of enzymes to search for.
                                  An example of a file of enzyme names is:
                                  ! my enzymes
                                  HincII, ppiII
                                  ! other enzymes
                                  hindiii
                                  HinfI
                                  PpiI
   -sitelen            integer    Minimum recognition site length

   Optional qualifiers:
   -[no]cutlist        bool       List the enzymes that cut
   -flatreformat       bool       Display RE sites in flat format
   -mincuts            integer    Minimum cuts per RE
   -maxcuts            integer    Maximum cuts per RE
   -single             bool       Force single site only cuts
   -[no]blunt          bool       Allow blunt end cutters
   -[no]sticky         bool       Allow sticky end cutters
   -[no]ambiguity      bool       Allow ambiguous matches
   -plasmid            bool       Allow circular DNA
   -[no]commercial     bool       Only enzymes with suppliers
   -[no]limit          bool       Limits reports to one isoschizomer
   -preferred          bool       Report preferred isoschizomers
   -table              list       Code to use

   Advanced qualifiers:
   -[no]translation    bool       Display translation
   -[no]reverse        bool       Display cut sites and translation of reverse
                                  sense
   -orfminsize         integer    Minimum size of Open Reading Frames (ORFs)
                                  to display in the translations.
   -uppercase          range      Regions to put in uppercase.
                                  If this is left blank, then the sequence
                                  case is left alone.
                                  A set of regions is specified by a set of
                                  pairs of positions.
                                  The positions are integers.
                                  They are separated by any non-digit,
                                  non-alpha character.
                                  Examples of region specifications are:
                                  24-45, 56-78
                                  1:45, 67=99;765..888
                                  1,5,8,10,23,45,57,99
   -highlight          range      Regions to colour if formatting for HTML.
                                  If this is left blank, then the sequence is
                                  left alone.
                                  A set of regions is specified by a set of
                                  pairs of positions.
                                  The positions are integers.
                                  They are followed by any valid HTML font
                                  colour.
                                  Examples of region specifications are:
                                  24-45 blue 56-78 orange
                                  1-100 green 120-156 red
                                  A file of ranges to colour (one range per
                                  line) can be specified as '@filename'.
   -threeletter        bool       Display protein sequences in three-letter
                                  code
   -number             bool       Number the sequences
   -width              integer    Width of sequence to display
   -length             integer    Line length of page (0 for indefinite)
   -margin             integer    Margin around sequence for numbering
   -[no]name           bool       Set this to be false if you do not wish to
                                  display the ID name of the sequence
   -[no]description    bool       Set this to be false if you do not wish to
                                  display the description of the sequence
   -offset             integer    Offset to start numbering the sequence from
   -html               bool       Use HTML formatting

   General qualifiers:
   -help               bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
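The region syntax accepted by '-uppercase' (pairs of integer positions separated by any non-digit, non-alpha character) can be sketched in Python as follows. This is an illustration of the rule as stated in the help text, not the parser used by EMBOSS. The function name `parse_regions` is hypothetical.

```python
import re


def parse_regions(spec):
    """Turn a region specification into a list of (start, end) pairs.

    Every run of digits is taken as one position, whatever separator
    surrounds it, and consecutive positions are paired up.
    """
    positions = [int(p) for p in re.findall(r"\d+", spec)]
    if len(positions) % 2 != 0:
        raise ValueError("positions must come in start/end pairs: %r" % spec)
    return list(zip(positions[0::2], positions[1::2]))


if __name__ == "__main__":
    for example in ("24-45, 56-78",
                    "1:45, 67=99;765..888",
                    "1,5,8,10,23,45,57,99"):
        print(example, "->", parse_regions(example))
```

An odd number of positions is rejected here as a design choice of the sketch. How remap itself treats such input is not stated in this document.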
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
[-outfile] (Parameter 2) | If you enter the name of a file here then this program will write the sequence details into that file. | Output file | <sequence>.remap |
-enzymes | The name 'all' reads in all enzyme names from the REBASE database. You can specify enzymes by giving their names with commas between them, such as: 'HincII,hinfI,ppiI,hindiii'. The case of the names is not important. You can specify a file of enzyme names to read in by giving the name of the file holding the enzyme names with a '@' character in front of it, for example, '@enz.list'. Blank lines and lines starting with a hash character or '!' are ignored and all other lines are concatenated together with a comma character ',' and then treated as the list of enzymes to search for. An example of a file of enzyme names (one entry per line) is: '! my enzymes', 'HincII, ppiII', '! other enzymes', 'hindiii', 'HinfI', 'PpiI' | Any string is accepted | all |
-sitelen | Minimum recognition site length | Integer from 2 to 20 | 4 |
Optional qualifiers | | | |
-[no]cutlist | List the enzymes that cut | Yes/No | Yes |
-flatreformat | Display RE sites in flat format | Yes/No | No |
-mincuts | Minimum cuts per RE | Integer from 1 to 1000 | 1 |
-maxcuts | Maximum cuts per RE | Integer up to 2000000000 | 2000000000 |
-single | Force single site only cuts | Yes/No | No |
-[no]blunt | Allow blunt end cutters | Yes/No | Yes |
-[no]sticky | Allow sticky end cutters | Yes/No | Yes |
-[no]ambiguity | Allow ambiguous matches | Yes/No | Yes |
-plasmid | Allow circular DNA | Yes/No | No |
-[no]commercial | Only enzymes with suppliers | Yes/No | Yes |
-[no]limit | Limits reports to one isoschizomer | Yes/No | Yes |
-preferred | Report preferred isoschizomers | Yes/No | No |
-table | Code to use | | 0 |
Advanced qualifiers | | | |
-[no]translation | Display translation | Yes/No | Yes |
-[no]reverse | Display cut sites and translation of reverse sense | Yes/No | Yes |
-orfminsize | Minimum size of Open Reading Frames (ORFs) to display in the translations. | Integer 0 or more | 0 |
-uppercase | Regions to put in uppercase. If this is left blank, then the sequence case is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: '24-45, 56-78', '1:45, 67=99;765..888' and '1,5,8,10,23,45,57,99' | Sequence range | If this is left blank, then the sequence case is left alone. |
-highlight | Regions to colour if formatting for HTML. If this is left blank, then the sequence is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are followed by any valid HTML font colour. Examples of region specifications are: '24-45 blue 56-78 orange' and '1-100 green 120-156 red'. A file of ranges to colour (one range per line) can be specified as '@filename'. | Sequence range | full sequence |
-threeletter | Display protein sequences in three-letter code | Yes/No | No |
-number | Number the sequences | Yes/No | No |
-width | Width of sequence to display | Integer 1 or more | 60 |
-length | Line length of page (0 for indefinite) | Integer 0 or more | 0 |
-margin | Margin around sequence for numbering | Integer 0 or more | 10 |
-[no]name | Set this to be false if you do not wish to display the ID name of the sequence | Yes/No | Yes |
-[no]description | Set this to be false if you do not wish to display the description of the sequence | Yes/No | Yes |
-offset | Offset to start numbering the sequence from | Any integer value | 1 |
-html | Use HTML formatting | Yes/No | No |
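The rules given for the '-enzymes' value can be sketched in Python as follows. This is a minimal illustration rather than the EMBOSS implementation: the function names are hypothetical, the special value 'all' is not handled, and the line layout of the example file is an assumption (one entry per line, as the description suggests).

```python
def split_names(value):
    """Split a comma-separated enzyme list, dropping empty entries."""
    return [name.strip() for name in value.split(",") if name.strip()]


def names_from_file_text(text):
    """Apply the documented file rules to the contents of a list file.

    Blank lines and lines starting with '#' or '!' are ignored; the
    remaining lines are joined with commas and treated as one list.
    """
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        kept.append(line)
    return split_names(",".join(kept))


def resolve_enzymes(value):
    """Return enzyme names from a list or from an '@file' reference."""
    if value.startswith("@"):
        with open(value[1:], encoding="utf-8") as handle:
            return names_from_file_text(handle.read())
    return split_names(value)


def same_enzyme(a, b):
    """Compare two names, ignoring case as the documentation describes."""
    return a.lower() == b.lower()


EXAMPLE_FILE = """! my enzymes
HincII, ppiII
! other enzymes
hindiii
HinfI
PpiI
"""

if __name__ == "__main__":
    print(names_from_file_text(EXAMPLE_FILE))
    print(resolve_enzymes("HincII,hinfI,ppiI,hindiii"))
```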
The format of the range file is:
An example range file is:
# this is my set of ranges
12 23
4 5 this is like 12-23, but smaller
67 10348 interesting region
You can specify a file of ranges to highlight in a different colour when outputting in HTML format (using the '-html' qualifier) by giving the '-highlight' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-highlight @myfile').
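A range file like the example above can be read with a few lines of Python. The sketch below is an assumption-laden illustration, not EMBOSS code: it infers the format from the example (comment lines start with '#', each remaining line holds a start and an end position followed by optional free text), and the function names are hypothetical. It then upper-cases the listed regions of a sequence, clipping ranges that run past the end.

```python
def parse_range_file(text):
    """Return (start, end, annotation) tuples from range-file text."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 2)
        start, end = int(fields[0]), int(fields[1])
        annotation = fields[2].strip() if len(fields) > 2 else ""
        ranges.append((start, end, annotation))
    return ranges


def apply_uppercase(seq, ranges):
    """Upper-case the 1-based, inclusive ranges of a sequence."""
    chars = list(seq)
    for start, end, _annotation in ranges:
        for i in range(max(start, 1) - 1, min(end, len(chars))):
            chars[i] = chars[i].upper()
    return "".join(chars)


EXAMPLE = """# this is my set of ranges
12 23
4 5 this is like 12-23, but smaller
67 10348 interesting region
"""

if __name__ == "__main__":
    ranges = parse_range_file(EXAMPLE)
    print(ranges)
    print(apply_uppercase("a" * 30, ranges))
```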
The format of this file is very similar to the format of the above uppercase range file, except that the text after the start and end positions is used as the HTML colour name. This colour name is used 'as is' when specifying the colour in HTML in a '<FONT COLOR=xxx>' construct, (where 'xxx' is the name of the colour).
The standard names of HTML font colours are given in: http://www.iconbazaar.com/color_tables/named_colors.html
An example highlight range file is:
# this is my set of ranges
12 23 red
4 5 darkturquoise
67 10348 #FFE4E1
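How such a highlight file maps onto '<FONT COLOR=xxx>' markup can be sketched as follows. This is an illustration under assumptions, not the code remap uses: the function names are hypothetical, later ranges are assumed to override earlier ones where they overlap, and ranges are clipped to the sequence length.

```python
def parse_highlight_file(text):
    """Return (start, end, colour) tuples from highlight-file text."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # a '#' in the first column marks a comment
        fields = line.split(None, 2)
        if len(fields) < 3:
            raise ValueError("expected 'start end colour': %r" % line)
        ranges.append((int(fields[0]), int(fields[1]), fields[2].strip()))
    return ranges


def highlight_html(seq, ranges):
    """Wrap the 1-based, inclusive ranges of seq in FONT COLOR tags."""
    colours = [None] * len(seq)
    for start, end, colour in ranges:
        for i in range(max(start, 1) - 1, min(end, len(seq))):
            colours[i] = colour  # assumption: the last range wins
    out = []
    current = None
    for base, colour in zip(seq, colours):
        if colour != current:
            if current is not None:
                out.append("</FONT>")
            if colour is not None:
                out.append("<FONT COLOR=%s>" % colour)
            current = colour
        out.append(base)
    if current is not None:
        out.append("</FONT>")
    return "".join(out)


EXAMPLE = """# this is my set of ranges
12 23 red
4 5 darkturquoise
67 10348 #FFE4E1
"""

if __name__ == "__main__":
    print(parse_highlight_file(EXAMPLE))
    print(highlight_html("ACGTACGTAC", [(2, 4, "red"), (8, 20, "blue")]))
```

Note that the colour '#FFE4E1' is not mistaken for a comment, because only a '#' at the start of a line is treated as one.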
ECLAC E.coli lactose operon with lacI, lacZ, lacY and lacA genes. Hsp92II | Hin6I | | Bsu6I | | BssKI TaqI | | AspLEI | Bsc4I | | |Bsp143II | AccB7I | | ||BsiSI | | Hin6I AciI | | ||AsuC2I | | | AspLEI AccII | | ||Bme1390I \ \ \ \ \ \ \ \\\ GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT 10 20 30 40 50 60 ----:----|----:----|----:----|----:----|----:----|----:----| CTGTGGTAGCTTACCGCGTTTTGGAAAGCGCCATACCGTACTATCGCGGGCCTTCTCTCA / / / / / / / // / /// | TaqI | Hin6I | AciI | || | ||BssKI AccB7I AspLEI AccII | || | |BsiSI Bsc4I | || | Bme1390I | || | AsuC2I | || | Bsu6I | || Hin6I | |AspLEI | Bsp143II Hsp92II # Enzymes that cut Frequency Isoschizomers AccB7I 1 PflBI,PflMI,Van91I AccII 1 Bsh1236I,BstFNI,BstUI,MvnI,ThaI AciI 1 AspLEI 2 BstHHI,CfoI,HhaI AsuC2I 1 BcnI,NciI Bme1390I 1 MspR9I,ScrFI Bsc4I 1 BseLI,BsiYI,BslI BsiSI 1 HapII,HpaII,MspI Bsp143II 1 BstH2I,HaeII BssKI 1 Bsu6I 1 Eam1104I,EarI,Ksp632I Hin6I 2 HinP1I,HspAI Hsp92II 1 NlaIII TaqI 1 TthHB8I # Enzymes < MINCUTS Frequency Isoschizomers # Enzymes > MAXCUTS Frequency Isoschizomers # Enzymes that do not cut AarI AatI AatII AauI Acc113I Acc16I Acc36I Acc65I AccB1I AccBSI AccI AccIII AclI AclWI AcsI AcyI AdeI AfaI AfeI AflI AflII AflIII AgeI AhdI AhlI AloI AluI Alw21I Alw26I Alw44I AlwI AlwNI Ama87I AocI Aor51HI ApaI ApaLI ApoI AscI AseI AsiAI AsnI Asp700I Asp718I AspEI AspHI AspI AspS9I AsuHPI AsuII AsuNHI AvaI AvaII AviII AvrII AxyI BaeI BalI BamHI BanI BanII BanIII BbeI BbrPI BbsI BbuI Bbv12I BbvCI BbvI BceAI BcgI BciVI BclI BcoI BcuI BfaI BfiI BfmI BfrBI BfrI BfuI BglI BglII BlnI BlpI Bme18I BmrI BmyI BoxI BpiI BplI BpmI Bpu10I Bpu1102I Bpu14I BpuAI Bsa29I BsaAI BsaBI BsaHI BsaI BsaJI BsaMI BsaOI BsaWI BsaXI BscBI BscCI BscFI BscI Bse118I Bse1I Bse21I Bse3DI Bse8I BseAI BseBI BseCI BseDI BseGI BseMI BseMII BseNI BsePI BseRI BseSI BseX3I BseXI BsgI Bsh1285I Bsh1365I BshFI BshI BshNI BshTI BsiBI BsiCI BsiEI BsiHKAI BsiLI BsiMI BsiQI BsiWI BsiXI BsiZI BsmAI BsmBI BsmFI BsmI Bso31I BsoBI BsoMAI 
Bsp106I Bsp119I Bsp120I Bsp1286I Bsp13I Bsp1407I Bsp143I Bsp1720I Bsp19I Bsp68I BspA2I BspCI BspCNI BspDI BspEI BspHI BspLI BspLU11I BspMI BspPI BspT104I BspT107I BspTI BspXI BsrBI BsrBRI BsrDI BsrFI BsrGI BsrI BsrSI BssAI BssECI BssHI BssHII BssNAI BssSI BssT1I Bst1107I Bst2BI Bst2UI Bst4CI Bst71I Bst98I BstACI BstAPI BstBAI BstBI BstDEI BstDSI BstEII BstENI BstENII BstF5I BstHPI BstMCI BstNI BstNSI BstOI BstPAI BstPI BstSFI BstSNI BstV2I BstX2I BstXI BstYI BstZ17I BstZI Bsu15I Bsu36I BsuRI BtgI BtrI BtsI Cac8I CaiI CciNI CelII Cfr10I Cfr13I Cfr42I Cfr9I CfrI ClaI CpoI Csp45I Csp6I CspAI CspI CviJI CviRI CviTI CvnI DdeI DpnI DpnII DraI DraII DraIII DrdI DsaI DseDI EaeI EagI Eam1105I EciI Ecl136II EclHKI EclXI Eco105I Eco130I Eco147I Eco24I Eco31I Eco32I Eco47I Eco47III Eco52I Eco57I Eco57MI Eco64I Eco72I Eco81I Eco88I Eco91I EcoICRI EcoNI EcoO109I EcoO65I EcoRI EcoRII EcoRV EcoT14I EcoT22I EcoT38I EgeI EheI ErhI Esp3I FauI FauNDI FbaI FblI Fnu4HI FokI FriOI FseI Fsp4HI FspAI FspI FunI FunII GsuI HaeIII HgaI HgiEI Hin1I Hin4I HincII HindII HindIII HinfI HpaI HphI Hpy188I Hpy188III Hpy8I Hpy99I HpyCH4III HpyCH4IV HpyCH4V HpyF44III Hsp92I ItaI KasI Kpn2I KpnI Ksp22I KspAI KspI Kzo9I LspI LweI MabI MaeI MaeII MaeIII MamI MbiI MboI MboII MfeI MflI MhlI MlsI MluI MluNI Mly113I MlyI MnlI Mph1103I MroI MroNI MroXI MscI MseI MslI Msp17I MspA1I MspCI MssI MunI Mva1269I MvaI MwoI NaeI NarI NcoI NdeI NdeII NgoAIV NgoMIV NheI NlaIV NmuCI NotI NruGI NruI NsbI NsiI NspI NspIII NspV OliI PacI PaeI PaeR7I PagI PalI PauI PciI PctI PdiI PdmI Pfl23II PflFI PinAI Ple19I PleI PmaCI Pme55I PmeI PmlI PpiI PpsI Ppu10I PpuMI PpuXI PshAI PshBI PsiI Psp124BI Psp1406I Psp5II PspAI PspEI PspGI PspLI PspN4I PspOMI PspPI PspPPI PsrI PstI PsuI PsyI PvuI PvuII RcaI RsaI Rsr2I RsrII SacI SacII SalI SanDI SapI SatI Sau3AI Sau96I SbfI ScaI SchI SdaI SduI SexAI SfaNI SfcI SfiI SfoI Sfr274I Sfr303I SfuI SgfI SgrAI SgrBI SinI SlaI SmaI SmiI SmiMI SmlI SmuI SnaBI SpaHI SpeI SphI SrfI Sse8387I Sse9I SseBI 
SspBI SspI SstI SstII StuI StyI SunI SwaI TaaI TaiI TasI TatI TauI TelI TfiI TliI Tru1I Tru9I TscI TseI Tsp45I Tsp509I TspEI TspRI Tth111I Vha464I VneI VpaK11BI VspI XagI XapI XbaI XceI XcmI XhoI XhoII XmaCI XmaI XmaIII XmaJI XmiI XmnI XspI ZhoI Zsp2I # Number of enzymes not matching SITELEN, BLUNT, STICKY, COMMERCIAL criteria 633
The name of the sequence is displayed, followed by the description of the sequence.
The formatted display of cut sites on the sequence follows, with the six-frame translation below it. The cut sites are indicated by a slash character ('\' above the forward strand, '/' below the reverse strand) that points to the position between the nucleotides where the cut occurs. Cuts by many enzymes at the same position are indicated by stacking the enzyme names on top of each other.
At the end the section header 'Enzymes that cut' is displayed followed by a list of the enzymes that cut the specified sequence and the number of times that they cut. For each enzyme that cuts, a list of isoschizomers of that enzyme (sharing the same recognition site pattern and cut sites) is given.
This is followed by lists of the enzymes that do cut, but which cut fewer times than the '-mincuts' qualifier or more times than the '-maxcuts' qualifier.
Isoschizomers that are excluded from cutting (either through restrictions such as the permitted number of cuts, blunt cutters only or single cutters only, or because their names were not given in the input list of enzymes) are not listed.
Then a list is displayed of the enzymes whose names were input and which match the other criteria ('-sitelen', '-blunt', '-sticky' or '-commercial') but which do not cut.
Finally the number of enzymes that were rejected from consideration because they do not match the '-sitelen', '-blunt', '-sticky' or '-commercial' criteria is displayed.
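The grouping of enzymes in this report can be mimicked with a short Python sketch. It only illustrates the four lists described above, using a hypothetical function name and made-up cut counts. It does not reproduce how remap computes the counts or filters on '-sitelen', '-blunt', '-sticky' or '-commercial'.

```python
def classify_enzymes(cut_counts, mincuts=1, maxcuts=2000000000):
    """Group enzymes the way the report sections are described.

    cut_counts maps an enzyme name to its number of cuts. The default
    limits are the documented defaults of '-mincuts' and '-maxcuts'.
    """
    report = {
        "Enzymes that cut": [],
        "Enzymes < MINCUTS": [],
        "Enzymes > MAXCUTS": [],
        "Enzymes that do not cut": [],
    }
    for name in sorted(cut_counts):
        count = cut_counts[name]
        if count == 0:
            report["Enzymes that do not cut"].append(name)
        elif count < mincuts:
            report["Enzymes < MINCUTS"].append(name)
        elif count > maxcuts:
            report["Enzymes > MAXCUTS"].append(name)
        else:
            report["Enzymes that cut"].append(name)
    return report


if __name__ == "__main__":
    # Illustrative counts only; EcoRI is given zero cuts for the demo.
    counts = {"AciI": 1, "Hin6I": 2, "TaqI": 1, "EcoRI": 0}
    for section, names in classify_enzymes(counts, maxcuts=1).items():
        print("#", section, names)
```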
The '-flatreformat' qualifier changes the display to emphasise the recognition site of the restriction enzyme, which is indicated by a row of '=' characters. The cut site is pointed to by a '>' or '<' character, and if the cut site is not within or immediately adjacent to the recognition site, they are linked by a row of '.' characters.
The name of the enzyme is displayed above (or below when the reverse sense site is displayed) the recognition site. The name of the enzyme is also displayed above the cut site if this occurs on a different display line to the recognition site (i.e. if it wraps onto the next line of sequence).
An example of this display follows with the translation turned off to save space:
% remap embl:eclac stdout -enz taqi,bsu6i,acii,hin6i,bsski -site 4 -sbeg 1 -send 60 -flat -notran
Display a sequence with restriction cut sites, translation etc..
ECLAC
E.coli lactose operon with lacI, lacZ, lacY and lacA genes.
Bsu6I >.........==== BssKI >===== TaqI Hin6I AciI Hin6I >=== >=== >..==== >===
GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT
10 20 30 40 50 60
----:----|----:----|----:----|----:----|----:----|----:----|
CTGTGGTAGCTTACCGCGTTTTGGAAAGCGCCATACCGTACTATCGCGGGCCTTCTCTCA
===< ===< <=== ===< TaqI Hin6I AciI Hin6I =====< BssKI <.....==== Bsu6I
# Enzymes that cut Frequency Isoschizomers
AciI 1
BssKI 1
Bsu6I 1
Hin6I 2
TaqI 1
# Enzymes < MINCUTS Frequency Isoschizomers
# Enzymes > MAXCUTS Frequency Isoschizomers
# Enzymes that do not cut
# Number of enzymes not matching SITELEN, BLUNT, STICKY, COMMERCIAL criteria
0
The REBASE data files used by this program must first be set up using the program 'rebaseextract'. Running 'rebaseextract' may be the job of your system manager.
abiview | Reads ABI file and display the trace |
backtranseq | Back translate a protein sequence |
cirdna | Draws circular maps of DNA constructs |
coderet | Extract CDS, mRNA and translations from feature tables |
lindna | Draws linear maps of DNA constructs |
pepnet | Displays proteins as a helical net |
pepwheel | Shows protein sequences as helices |
plotorf | Plot potential open reading frames |
prettyplot | Displays aligned sequences, with colouring and boxing |
prettyseq | Output sequence with translated ranges |
recoder | Remove restriction sites but maintain the same translation |
redata | Search REBASE for enzyme name, references, suppliers etc |
restover | Finds restriction enzymes that produce a specific overhang |
restrict | Finds restriction enzyme cleavage sites |
seealso | Finds programs sharing group names |
showalign | Displays a multiple sequence alignment |
showdb | Displays information on the currently available databases |
showfeat | Show features of a sequence |
showorf | Pretty output of DNA translations |
showseq | Display a sequence with features, translation etc |
silent | Silent mutation restriction enzyme scan |
textsearch | Search sequence documentation text. SRS and Entrez are faster! |
transeq | Translate nucleic acid sequences |
Changed 7 Dec 2000 - GWW - to declare isoschizomers that cut