EMBOSS: equicktandem


Program equicktandem ( YMBC , NCHC )

Function

Finds tandem repeats

Description

equicktandem scans a sequence for potential tandem repeats up to a specified size. The results can be used to run etandem on the candidate repeat lengths to identify genuine tandem repeats.

equicktandem is a simple program that looks for segments in which each base tends to match the base n positions back, i.e. with an autocorrelation peak at lag n, where n is the candidate repeat size. This allows drift in the repeating sequence: it does not look for a consensus sequence for the whole repeat block (that is what etandem does), but it is much quicker than etandem. It does not account for gaps.

The score is +1 for a match to the corresponding base n positions back and -1 for a mismatch.
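The scoring rule can be illustrated with a minimal Python sketch. This is not the equicktandem source code; it only applies the +1/-1 rule to a whole sequence at a given lag, and the function names lag_score and best_lag are hypothetical. The real program additionally locates the start and end of each scoring segment and applies the threshold.

```python
def lag_score(seq: str, n: int) -> int:
    """Score a sequence against itself shifted by n positions.

    Each base that equals the base n positions back adds +1,
    each base that differs adds -1.
    """
    seq = seq.upper()
    return sum(1 if seq[i] == seq[i - n] else -1 for i in range(n, len(seq)))


def best_lag(seq: str, max_repeat: int):
    """Return the lag in 1..max_repeat with the highest score, plus all scores."""
    upper = min(max_repeat, len(seq) - 1)
    scores = {n: lag_score(seq, n) for n in range(1, upper + 1)}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Five perfect copies of a 6-base unit: every base matches the base 6 back.
    example = "ACGTTG" * 5
    lag, scores = best_lag(example, 10)
    print(lag, scores[lag])
```

For five perfect copies of a 6-base unit, the score peaks at a lag of 6, which is the kind of candidate repeat size that would then be passed on to etandem.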

Usage

Here is a sample session with equicktandem. The input sequence is the human herpesvirus tandem repeat.

% equicktandem
Input sequence: embl:hhtetra
Output file [hhtetra.qtan]: 
Maximum repeat size [600]: 
Threshold score [20]: 

Command line arguments

   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
  [-outfile]           outfile    Output file name
   -maxrepeat          integer    Maximum repeat size
   -threshold          integer    Threshold score

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


   Mandatory qualifiers                              Allowed values      Default
   [-sequence] (Parameter 1)  Sequence USA           Readable sequence   Required
   [-outfile] (Parameter 2)   Output file name       Output file         <sequence>.equicktandem
   -maxrepeat                 Maximum repeat size    Any integer value   600
   -threshold                 Threshold score        Any integer value   20

   Optional qualifiers                               Allowed values      Default
   (none)

   Advanced qualifiers                               Allowed values      Default
   (none)
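
For use from a script, the qualifiers listed above can be supplied on the command line instead of answering the prompts. The following is a sketch only: it assumes that equicktandem is installed and on the PATH, and the helper names build_equicktandem_command and run_equicktandem are hypothetical. The values shown are those of the sample session.

```python
import subprocess


def build_equicktandem_command(sequence, outfile, maxrepeat=600, threshold=20):
    """Build the argument list from the four documented qualifiers."""
    return [
        "equicktandem",
        "-sequence", sequence,
        "-outfile", outfile,
        "-maxrepeat", str(maxrepeat),
        "-threshold", str(threshold),
    ]


def run_equicktandem(sequence, outfile, maxrepeat=600, threshold=20):
    """Run equicktandem (assumed to be on the PATH); raise if it fails."""
    cmd = build_equicktandem_command(sequence, outfile, maxrepeat, threshold)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command that corresponds to the sample session.
    print(" ".join(build_equicktandem_command("embl:hhtetra", "hhtetra.qtan")))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when sequence names or file names contain special characters.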

Input file format

The input for equicktandem is a nucleotide sequence.

Output file format

The output from equicktandem is an uncommented list of identified repeats. In a future version this will change to be annotated sequence features.

The columns of the report show:

  1. Score
  2. Start base position
  3. End base position
  4. Repeat size
  5. Repeat count

An example output line:

   339        191        935  6 124
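
Because the report is a plain whitespace-separated list, it is easy to read from a script. The sketch below assumes exactly the five integer columns described above and skips any line that does not fit; the names Repeat and parse_report are hypothetical and are not part of EMBOSS.

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    score: int
    start: int
    end: int
    size: int
    count: int


def parse_report(text: str) -> List[Repeat]:
    """Parse equicktandem output lines into Repeat records."""
    repeats = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # blank or unexpected line
        try:
            repeats.append(Repeat(*(int(f) for f in fields)))
        except ValueError:
            continue  # a field was not an integer
    return repeats


if __name__ == "__main__":
    for rep in parse_report("   339        191        935  6 124\n"):
        print(rep.size, rep.start, rep.end)
```

The repeat size of each record is the candidate repeat length that can then be examined with etandem.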

Data files

None.

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

See also

einverted    Finds DNA inverted repeats
etandem      Looks for tandem repeats in a nucleotide sequence
palindrome   Looks for inverted repeats in a nucleotide sequence

equicktandem identifies regions which are likely to contain tandem repeats. etandem should then be run on those regions to confirm them and to get an accurate specification of the repeats. etandem runs slowly.

Author(s)

This program was originally written by Richard Durbin at the Sanger Centre.

This application was modified for inclusion in EMBOSS by Peter Rice (pmr@sanger.ac.uk) Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

Completed 25 May 1999

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments