EMBOSS: einverted


Program einverted ( YMBC , NCHC )

Function

Finds DNA inverted repeats

Description

einverted looks for inverted repeats (stem loops) in a nucleotide sequence.

It will find inverted repeats that include a proportion of mismatches and gaps (bulges in the stem loop).

It works by finding alignments between the sequence and its reverse complement that reach a threshold score. Matches are assigned a positive score. Mismatches and gaps are assigned a penalty (negative) score, with gaps penalised most heavily. The score of an alignment is the sum of the match scores minus the penalty for each mismatch and each gap. Any region whose score equals or exceeds the threshold is reported.
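The scoring rule can be illustrated with a minimal Python sketch. It is not taken from the einverted source: the function name alignment_score, and the convention of passing the two strands as they are printed in the report (with '-' for a gap), are assumptions made for illustration. The default values are the documented defaults for -match, -mismatch and -gap.

```python
# Illustrative sketch only; not taken from the einverted source.
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def alignment_score(top, bottom, match=3, mismatch=-4, gap=12):
    """Score two aligned strands column by column.

    `top` is one arm of the repeat and `bottom` the opposing arm as printed
    in an einverted report, so a match is a complementary pair of bases.
    A '-' in either strand marks a gap column.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strands must have equal length")
    score = 0
    for a, b in zip(top.lower(), bottom.lower()):
        if a == "-" or b == "-":
            score -= gap          # gap penalty is given as a positive number
        elif COMPLEMENT.get(a) == b:
            score += match
        else:
            score += mismatch     # mismatch score is already negative
    return score


# The perfect 11-base repeat from the Notes section: 11 matches at 3 each.
print(alignment_score("aaaactaaggc", "ttttgattccg"))  # prints 33
```

The gap penalty is subtracted while the mismatch score is added, because the documented defaults give the gap penalty as a positive number (12) and the mismatch score as a negative number (-4).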

einverted uses dynamic programming and thus is guaranteed to find the optimal alignment, but is slower than, for example, a self-by-self BLAST. It can find multiple inverted repeats in a sequence.
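The dynamic programming idea can be sketched as follows. This is a simplified illustration and not einverted's actual implementation: the function name is hypothetical, the gap penalty is applied per gapped position, only the single best score is returned (no alignment, no multiple hits), and the -maxrepeat limit is ignored. Pairing position i with a later position j when the two bases are complementary is equivalent to aligning the sequence against its reverse complement.

```python
# Simplified sketch of a local alignment of a sequence against its own
# reverse complement. Not the einverted source code.
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def best_inverted_repeat_score(seq, match=3, mismatch=-4, gap=12):
    """Return the best local score over all inverted-repeat alignments.

    cur[j] is the best score of an alignment whose innermost consumed
    positions are i on the left arm and j on the right arm (i < j).
    """
    seq = seq.lower()
    n = len(seq)
    best = 0
    prev = [0] * (n + 1)              # scores for row i - 1
    for i in range(n):
        cur = [0] * (n + 1)
        for j in range(n - 1, i, -1):  # right arm is read backwards
            pair = match if COMPLEMENT.get(seq[i]) == seq[j] else mismatch
            cur[j] = max(
                0,                     # start a new local alignment
                prev[j + 1] + pair,    # pair base i with base j
                prev[j] - gap,         # base i is unpaired (gap)
                cur[j + 1] - gap,      # base j is unpaired (gap)
            )
            best = max(best, cur[j])
        prev = cur
    return best


# The 11-base repeat from the Notes section, joined by a 4-base spacer
# of 'n' characters that cannot pair with anything.
arm = "aaaactaaggc"
print(best_inverted_repeat_score(arm + "nnnn" + "gccttagtttt"))  # prints 33
```

The running time of this sketch grows with the square of the sequence length, which is consistent with the remark that a dynamic programming search is slower than a heuristic comparison.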

By default, the maximum extent of the interval from the start to the end of a repeat is 4000 bp. This limit can be changed with the -maxrepeat qualifier.

Secondary structures like inverted repeats in genomic sequences may be implicated in initiation of DNA replication.

Usage

Here is a sample session with einverted.

% einverted
Input sequence: embl:hsts1
Output file [hsts1.inv]: 
Gap penalty [12]: 
Minimum score threshold [50]: 
Match score [3]: 
Mismatch score [-4]: 

Command line arguments

   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
  [-outfile]           outfile    Output file name
   -gap                integer    Gap penalty
   -threshold          integer    Minimum score threshold
   -match              integer    Match score
   -mismatch           integer    Mismatch score

   Optional qualifiers:
   -maxrepeat          integer    Maximum separation between the start of
                                  repeat and the end of the inverted repeat
                                  (the default is 4000 bases).

   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Mandatory qualifiers                                   Allowed values      Default
[-sequence] (Parameter 1)  Sequence USA                Readable sequence   Required
[-outfile] (Parameter 2)   Output file name            Output file         <sequence>.einverted
-gap                       Gap penalty                 Any integer value   12
-threshold                 Minimum score threshold     Any integer value   50
-match                     Match score                 Any integer value   3
-mismatch                  Mismatch score              Any integer value   -4

Optional qualifiers                                    Allowed values      Default
-maxrepeat                 Maximum separation between  Any integer value   4000
                           the start of repeat and the
                           end of the inverted repeat
                           (the default is 4000 bases).

Advanced qualifiers                                    Allowed values      Default
(none)
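
For scripted use, the qualifiers listed above can be assembled into a command line. The following Python sketch only builds the argument list and does not run anything; the helper name build_einverted_command is hypothetical, and passing each value as a separate argument after its qualifier is an assumption about the command-line syntax rather than something stated on this page.

```python
import shlex


def build_einverted_command(sequence, outfile, gap=12, threshold=50,
                            match=3, mismatch=-4, maxrepeat=None):
    """Build an argument list using only the qualifiers documented above."""
    # Assumption: each value follows its qualifier as a separate argument.
    cmd = [
        "einverted",
        "-sequence", sequence,
        "-outfile", outfile,
        "-gap", str(gap),
        "-threshold", str(threshold),
        "-match", str(match),
        "-mismatch", str(mismatch),
    ]
    if maxrepeat is not None:          # optional qualifier, default 4000
        cmd += ["-maxrepeat", str(maxrepeat)]
    return cmd


# Same input and output names as the sample session.
cmd = build_einverted_command("embl:hsts1", "hsts1.inv")
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run on a system where EMBOSS is installed.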

Input file format

The input for einverted is a nucleotide sequence.

Output file format

Here is the output from the example run. The first two hits have been removed because their output lines are too wide.

......................

Score 80: 44/51 ( 86%) matches, 2 gaps
   12246 ctcctgcctcag-cctccaagtagctgggattaca-gcatgtgccaccatgcc 12296   
         |||||| ||||| | |||||   |||||||||||| ||||| |||||||| ||
   13938 gaggacagagtcagaaggtttcacgaccctaatgtccgtactcggtggtatgg 13886   

Score 99: 53/65 ( 81%) matches, 1 gaps
   13884 tgggtatggtggctcatgcctgtaatcccagcactttggaagactgagacaggagcaattgcttga 13949   
         ||||| |||||||   ||||||||||||||||    ||| || ||||| ||| || ||||||||||
   14692 acccacaccaccgtacacggacattagggtcgatggaccctccgactccgtcttc-ttaacgaact 14628   
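
A report in this layout can be read back into a script. The Python sketch below is a hypothetical parser written against the hits shown above, not an official EMBOSS tool: it assumes each hit is a "Score" header line followed by a strand line, a match-bar line and a second strand line. It also checks the reported score against the default scores, on the assumption (consistent with both hits above) that the denominator in the header counts aligned base pairs and excludes gap columns.

```python
import re

# Header layout copied from the report above, for example
# "Score 80: 44/51 ( 86%) matches, 2 gaps".
HEADER = re.compile(
    r"Score (\d+): (\d+)/(\d+) \(\s*(\d+)%\) matches, (\d+) gaps"
)


def parse_strand(line):
    """Split '   12246 ctcc...gcc 12296' into (start, sequence, end)."""
    start, seq, end = line.split()
    return int(start), seq, int(end)


def parse_einverted_report(text):
    """Return one dict per hit found in the report text."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        found = HEADER.match(line.strip())
        if not found or i + 3 >= len(lines):
            continue
        score, matches, aligned, percent, gaps = map(int, found.groups())
        hits.append({
            "score": score, "matches": matches, "aligned": aligned,
            "percent": percent, "gaps": gaps,
            "top": parse_strand(lines[i + 1]),
            "bottom": parse_strand(lines[i + 3]),  # line i + 2 is the bar
        })
    return hits


REPORT = """\
Score 80: 44/51 ( 86%) matches, 2 gaps
   12246 ctcctgcctcag-cctccaagtagctgggattaca-gcatgtgccaccatgcc 12296
         |||||| ||||| | |||||   |||||||||||| ||||| |||||||| ||
   13938 gaggacagagtcagaaggtttcacgaccctaatgtccgtactcggtggtatgg 13886
"""

hit = parse_einverted_report(REPORT)[0]
mismatches = hit["aligned"] - hit["matches"]           # 51 - 44 = 7
recomputed = 3 * hit["matches"] - 4 * mismatches - 12 * hit["gaps"]
print(hit["score"], recomputed)  # prints 80 80
```

The same arithmetic holds for the second hit: 53 matches, 12 mismatches and 1 gap give 159 - 48 - 12 = 99.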

Data files

None.

Notes

Sometimes you can find repeats using the program palindrome that you can't find with einverted using the default parameters.

This is not due to a problem with either program. It is simply because some of the shortest repeats that you find with palindrome's default parameter values are below einverted's default cutoff score - you should decrease the 'Minimum score threshold' to see them.

For example, when palindrome is run with 'em:hsfau1', it finds the repeat:

64    aaaactaaggc    74
      |||||||||||
98    ttttgattccg    88

einverted will not report this as its score is 33 (11 bases scoring 3 each, no mismatches or gaps), which is below the default score cutoff of 50.

If einverted is run as:

% einverted em:hsfau1 -threshold 33

then it will find it:

Score 33: 11/11 (100%) matches, 0 gaps
      64 aaaactaaggc 74      
         |||||||||||
      98 ttttgattccg 88      

(Anything can be considered to be a repeat if you set the score threshold low enough!)
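
The relationship between the threshold and the shortest perfect repeat that can be reported is simple arithmetic, sketched here in Python. The function name is hypothetical, and the sketch assumes that a repeat is reported when its score equals or exceeds the threshold, as the -threshold 33 example above suggests.

```python
def min_perfect_stem_length(threshold=50, match=3):
    """Smallest number of matched pairs whose total score reaches threshold."""
    if match <= 0:
        raise ValueError("match score must be positive")
    # Ceiling division on integers; never report a negative length.
    return max(0, -(-threshold // match))


print(min_perfect_stem_length())       # prints 17, since 17 * 3 = 51 >= 50
print(min_perfect_stem_length(33, 3))  # prints 11, the repeat shown above
```

With the default parameters, a repeat with no mismatches or gaps therefore needs at least 17 matched base pairs to be reported, which is why the 11-base repeat only appears once the threshold is lowered.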

References

Some assorted references on inverted repeats:

  1. Pearson CE, Zorbas H, Price GB, Zannis-Hadjopoulos M Inverted repeats, stem-loops, and cruciforms: significance for initiation of DNA replication. J Cell Biochem 1996 Oct;63(1):1-22
  2. Waldman AS, Tran H, Goldsmith EC, Resnick MA. Long inverted repeats are an at-risk motif for recombination in mammalian cells. Genetics. 1999 Dec;153(4):1873-83. PMID: 10581292; UI: 20050682
  3. Jacobsen SE Gene silencing: Maintaining methylation patterns. Curr Biol 1999 Aug 26;9(16):R617-9
  4. Lewis S, Akgun E, Jasin M. Palindromic DNA and genome stability. Further studies. Ann N Y Acad Sci. 1999 May 18;870:45-57. PMID: 10415472; UI: 99343961
  5. Dai X, Greizerstein MB, Nadas-Chinni K, Rothman-Denes LB Supercoil-induced extrusion of a regulatory DNA hairpin. Proc Natl Acad Sci U S A 1997 Mar 18;94(6):2174-9

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

See also

equicktandem  Finds tandem repeats
etandem       Looks for tandem repeats in a nucleotide sequence
palindrome    Looks for inverted repeats in a nucleotide sequence

palindrome also looks for inverted repeats but is much faster and less sensitive, as it looks for near-perfect repeats.

Author(s)

This program was originally written by Richard Durbin at the Sanger Centre.

This application was modified for inclusion in EMBOSS by Peter Rice (pmr@sanger.ac.uk) Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

Written (1999) - Peter Rice

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments