einverted finds inverted repeats that include a proportion of mismatches and gaps (bulges in the stem loop).
It works by finding alignments between the sequence and its reverse complement that reach a threshold score. Matches are assigned a positive score; mismatches and gaps are assigned a penalty (negative score). The score of an alignment is the sum of the match scores, the mismatch penalties and the larger gap penalties. Any region whose score is at least the threshold is reported.
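The scoring rule can be illustrated with a short Python sketch. This is not einverted's own code: the function name `score_alignment` is hypothetical, the alignment is passed in as the two strands exactly as they are displayed in the output, and the gap penalty is assumed to be charged once per gap column. The values 3, -4 and 12 are the defaults listed in the qualifier table.

```python
# Watson-Crick partners; any other pairing counts as a mismatch in this sketch.
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}

def score_alignment(top, bottom, match=3, mismatch=-4, gap=12):
    """Score two aligned strands of equal length; '-' marks a gap column."""
    if len(top) != len(bottom):
        raise ValueError("aligned strands must have the same length")
    score = 0
    for a, b in zip(top.lower(), bottom.lower()):
        if a == "-" or b == "-":
            score -= gap          # gap penalty (assumed: once per gap column)
        elif COMPLEMENT.get(a) == b:
            score += match        # complementary bases pair up
        else:
            score += mismatch     # non-complementary bases
    return score

# The 11-base perfect repeat from the hsfau1 example scores 11 * 3.
print(score_alignment("aaaactaaggc", "ttttgattccg"))  # 33
```

Applied to the first alignment in the example output (44 matches, 7 mismatches, 2 gaps), the same rule gives 44*3 - 7*4 - 2*12 = 80, which agrees with the reported score.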
einverted uses dynamic programming and thus is guaranteed to find the optimal alignment, but is slower than, for example, a self-by-self BLAST. It can find multiple inverted repeats in a sequence.
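As a rough illustration of the dynamic programming idea, the Python sketch below computes the best local score for pairing a left arm (read forwards) with a right arm (read backwards). It is an assumption-laden toy, not einverted's implementation: the function name `best_inverted_repeat_score` is hypothetical, gaps are charged a flat penalty each, there is no limit on the repeat extent, and only the single best score is returned rather than every region above a threshold.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}

def best_inverted_repeat_score(seq, match=3, mismatch=-4, gap=12):
    """Best local score over stems pairing base i with base j, where i < j."""
    seq = seq.lower()
    n = len(seq)
    prev = [0] * (n + 2)   # scores for left position i - 1
    best = 0
    for i in range(n):
        cur = [0] * (n + 2)
        # Walk the right arm from the 3' end back towards position i.
        for j in range(n - 1, i, -1):
            pair = match if COMPLEMENT.get(seq[i]) == seq[j] else mismatch
            cur[j] = max(
                0,                    # start a new local alignment
                prev[j + 1] + pair,   # base i pairs with base j
                prev[j] - gap,        # base i is an unpaired bulge
                cur[j + 1] - gap,     # base j is an unpaired bulge
            )
            best = max(best, cur[j])
        prev = cur
    return best

# An 11-base stem around a 4-base loop (the loop is made up for this example).
hairpin = "aaaactaaggc" + "tttt" + "gccttagtttt"
print(best_inverted_repeat_score(hairpin))  # 33
```

The table has one cell per pair of positions, so time grows quadratically with sequence length. This is the cost of the optimality guarantee relative to a heuristic search.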
By default, the maximal extent of the interval from the start to the end of the repeat is 4000 bp; this can be changed with -maxrepeat.
Secondary structures like inverted repeats in genomic sequences may be implicated in initiation of DNA replication.
% einverted
Input sequence: embl:hsts1
Output file [hsts1.inv]:
Gap penalty [12]:
Minimum score threshold [50]:
Match score [3]:
Mismatch score [-4]:
Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
  [-outfile]           outfile    Output file name
   -gap                integer    Gap penalty
   -threshold          integer    Minimum score threshold
   -match              integer    Match score
   -mismatch           integer    Mismatch score

Optional qualifiers:
   -maxrepeat          integer    Maximum separation between the start of
                                  repeat and the end of the inverted repeat
                                  (the default is 4000 bases).

Advanced qualifiers: (none)

General qualifiers:
   -help               bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
|Mandatory qualifiers||Description||Allowed values||Default|
|[-sequence]||Sequence USA||Readable sequence||Required|
|[-outfile]||Output file name||Output file||<sequence>.einverted|
|-gap||Gap penalty||Any integer value||12|
|-threshold||Minimum score threshold||Any integer value||50|
|-match||Match score||Any integer value||3|
|-mismatch||Mismatch score||Any integer value||-4|
|Optional qualifiers||Description||Allowed values||Default|
|-maxrepeat||Maximum separation between the start of repeat and the end of the inverted repeat||Any integer value||4000|
|Advanced qualifiers||Description||Allowed values||Default|
|(none)|
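For scripted use, the qualifiers above can be assembled into a command line. The Python sketch below only uses the qualifier names documented here; the helper names `build_einverted_command` and `run_einverted` and the output file name in the usage line are hypothetical. It assumes that supplying every mandatory qualifier avoids the interactive prompts and that einverted is on the PATH.

```python
import subprocess

def build_einverted_command(sequence, outfile, gap=12, threshold=50,
                            match=3, mismatch=-4, maxrepeat=None):
    """Return the argument list for one einverted run (documented defaults)."""
    cmd = [
        "einverted",
        "-sequence", sequence,
        "-outfile", outfile,
        "-gap", str(gap),
        "-threshold", str(threshold),
        "-match", str(match),
        "-mismatch", str(mismatch),
    ]
    if maxrepeat is not None:
        # Optional qualifier; omitted so the program's own default applies.
        cmd += ["-maxrepeat", str(maxrepeat)]
    return cmd

def run_einverted(*args, **kwargs):
    """Run einverted and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_einverted_command(*args, **kwargs), check=True)

# Example: lower the threshold so that short repeats are reported.
print(build_einverted_command("em:hsfau1", "hsfau1.inv", threshold=33))
```

Passing the arguments as a list keeps the shell out of the picture, so sequence addresses containing characters such as `:` need no quoting.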
Score 80: 44/51 ( 86%) matches, 2 gaps
12246 ctcctgcctcag-cctccaagtagctgggattaca-gcatgtgccaccatgcc 12296
      |||||| ||||| | |||||   |||||||||||| ||||| |||||||| ||
13938 gaggacagagtcagaaggtttcacgaccctaatgtccgtactcggtggtatgg 13886

Score 99: 53/65 ( 81%) matches, 1 gaps
13884 tgggtatggtggctcatgcctgtaatcccagcactttggaagactgagacaggagcaattgcttga 13949
      ||||| |||||||   ||||||||||||||||    ||| || ||||| ||| || ||||||||||
14692 acccacaccaccgtacacggacattagggtcgatggaccctccgactccgtcttc-ttaacgaact 14628
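The summary line of each hit is regular enough to be picked out with a regular expression. The following Python sketch is one possible way to do that; the function name `parse_headers` and the dictionary keys are hypothetical, and the pattern is based only on the two header lines shown in this example, so other output variants may need adjustment.

```python
import re

# Matches lines such as "Score 80: 44/51 ( 86%) matches, 2 gaps".
HEADER = re.compile(
    r"Score\s+(\d+):\s+(\d+)/(\d+)\s+\(\s*(\d+)%\)\s+matches,\s+(\d+)\s+gaps"
)

def parse_headers(text):
    """Return one dict per 'Score ...' summary line found in text."""
    hits = []
    for m in HEADER.finditer(text):
        score, matches, aligned, percent, gaps = map(int, m.groups())
        hits.append({
            "score": score,
            "matches": matches,
            "aligned": aligned,                # paired positions, gaps excluded
            "mismatches": aligned - matches,   # derived, not printed by einverted
            "percent": percent,
            "gaps": gaps,
        })
    return hits

report = "Score 80: 44/51 ( 86%) matches, 2 gaps"
print(parse_headers(report)[0]["mismatches"])  # 7
```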
palindrome may report short inverted repeats that einverted does not. This is not due to a problem with either program. It is simply because some of the shortest repeats that palindrome finds with its default parameter values score below einverted's default cutoff; decrease the 'Minimum score threshold' to see them.
For example, when palindrome is run with 'em:hsfau1', it finds the repeat:
64 aaaactaaggc 74
   |||||||||||
98 ttttgattccg 88
einverted will not report this, as its score is 33 (11 bases scoring 3 each, no mismatches or gaps), which is below the default score cutoff of 50.
If einverted is run as:
% einverted em:hsfau1 -threshold 33
then it will find it:
Score 33: 11/11 (100%) matches, 0 gaps
64 aaaactaaggc 74
   |||||||||||
98 ttttgattccg 88
(Anything can be considered to be a repeat if you set the score threshold low enough!)
|equicktandem||Finds tandem repeats|
|etandem||Looks for tandem repeats in a nucleotide sequence|
|palindrome||Looks for inverted repeats in a nucleotide sequence|
palindrome also looks for inverted repeats but is much faster and less sensitive, as it looks for near-perfect repeats.
This application was modified for inclusion in EMBOSS by Peter Rice, Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.