EMBOSS: dottup


Program dottup

Function

Displays a wordmatch dotplot of two sequences

Description

A dotplot is a graphical representation of the regions of similarity between two sequences.

The two sequences are placed on the axes of a rectangular image and (in the simplest forms of dotplot) wherever there is a similarity between the sequences a dot is placed on the image.

Where the two sequences have substantial regions of similarity, many dots align to form diagonal lines. It is therefore possible to see at a glance where there are local regions of similarity as these will have long diagonal lines. It is also easy to see other features such as repeats (which form parallel diagonal lines), and insertions or deletions (which form breaks or discontinuities in the diagonal lines).

dottup looks for places where words (tuples) of a specified length have an exact match in both sequences and draws a diagonal line over the position of these words. This is a fast, but not especially sensitive way of creating dotplots. It is an acceptable method for displaying regions of substantial similarity between two sequences.
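The word-matching idea can be sketched in a few lines of Python. This is an illustration only, not the EMBOSS implementation: the function names word_matches and merge_diagonals are hypothetical, and the sketch assumes plain strings with case-sensitive exact matching. It indexes every word of the first sequence, looks up every word of the second, and then joins overlapping hits on the same diagonal into line segments.

```python
from collections import defaultdict


def word_matches(seq_a, seq_b, wordsize=4):
    """Return (pos_a, pos_b) start positions of exact word matches.

    Hypothetical helper for illustration; not part of EMBOSS.
    """
    # Index every word of length `wordsize` in the first sequence.
    index = defaultdict(list)
    for i in range(len(seq_a) - wordsize + 1):
        index[seq_a[i:i + wordsize]].append(i)

    # Look up every word of the second sequence in that index.
    hits = []
    for j in range(len(seq_b) - wordsize + 1):
        for i in index.get(seq_b[j:j + wordsize], ()):
            hits.append((i, j))
    return hits


def merge_diagonals(hits, wordsize=4):
    """Join overlapping hits on one diagonal into (start_a, start_b, length)."""
    hit_set = set(hits)
    segments = []
    for i, j in sorted(hits):
        if (i - 1, j - 1) in hit_set:
            continue  # this hit continues a run that started earlier
        start_a, start_b, length = i, j, wordsize
        # Extend the run while the next word on the diagonal also matches.
        while (i + 1, j + 1) in hit_set:
            i, j = i + 1, j + 1
            length += 1
        segments.append((start_a, start_b, length))
    return segments


hits = word_matches("ACGTACGT", "TTACGTAA", wordsize=4)
segments = merge_diagonals(hits, wordsize=4)
print(segments)
```

Each segment corresponds to one diagonal line in the plot: it starts at (start_a, start_b) and runs for length residues along both axes.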

Using a longer word (tuple) size displays less random noise and runs very quickly, but is less sensitive. Shorter word sizes are more sensitive to short or fragmentary regions of similarity, but they also display more random points of similarity (noise) and run more slowly.
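A rough estimate shows why noise falls so quickly with word size. The sketch below assumes two unrelated sequences whose residues are independent and uniformly distributed over the alphabet (4 letters for DNA). That is a simplification of real sequences, and the function name expected_random_hits is hypothetical. Under this assumption the expected number of chance matches is the number of word-position pairs divided by alphabet ** wordsize.

```python
def expected_random_hits(len_a, len_b, wordsize, alphabet=4):
    """Expected chance word matches between two random, unrelated sequences.

    Assumes independent, uniformly distributed residues (an idealisation).
    """
    words_a = max(len_a - wordsize + 1, 0)  # word start positions in sequence a
    words_b = max(len_b - wordsize + 1, 0)  # word start positions in sequence b
    return words_a * words_b / alphabet ** wordsize


# Compare several word sizes for two 1000-base DNA sequences.
for k in (2, 4, 6, 8, 10):
    print(k, expected_random_hits(1000, 1000, k))
```

For two 1000-base sequences this model gives roughly 3,900 chance matches at a word size of 4 and fewer than one at a word size of 10.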

Usage

Here is a sample session with dottup.

% dottup embl:eclac embl:eclaci -wordsize=6 -gtitle="eclaci vs eclac"

Command line arguments

   Mandatory qualifiers (* if not always prompted):
  [-sequencea]         sequence   Sequence USA
  [-sequenceb]         sequence   Sequence USA
   -wordsize           integer    Word size
*  -graph              graph      Graph type
*  -outfile            outfile    Output file name

   Optional qualifiers:
   -[no]boxit          bool       Draw a box around dotplot

   Advanced qualifiers:
   -data               bool       Output the match data to a file instead of
                                  plotting it

   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


   Mandatory qualifiers:
  [-sequencea]         (Parameter 1) Sequence USA
                       Allowed values: Readable sequence
                       Default:        Required
  [-sequenceb]         (Parameter 2) Sequence USA
                       Allowed values: Readable sequence
                       Default:        Required
   -wordsize           Word size
                       Allowed values: Integer 2 or more
                       Default:        4
   -graph              Graph type
                       Allowed values: EMBOSS has a list of known devices,
                                       including postscript, ps, hpgl, hp7470,
                                       hp7580, meta, colourps, cps, xwindows,
                                       x11, tektronics, tekt, tek4107t, tek,
                                       none, null, text, data, xterm, png
                       Default:        EMBOSS_GRAPHICS value, or x11
   -outfile            Output file name
                       Allowed values: Output file
                       Default:        <sequence>.dottup

   Optional qualifiers:
   -[no]boxit          Draw a box around dotplot
                       Allowed values: Yes/No
                       Default:        Yes

   Advanced qualifiers:
   -data               Output the match data to a file instead of plotting it
                       Allowed values: Yes/No
                       Default:        No

Input file format

Any two sequences of the same type (DNA or protein), each specified as a USA (Uniform Sequence Address).

Output file format

An image is output to the requested output device. If -data is given, the match data are written to the output file instead of being plotted.

Data files

None.

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

0 upon successful completion.

Known bugs

None.

See also

dotmatcher   Displays a thresholded dotplot of two sequences
dotpath      Displays a non-overlapping wordmatch dotplot of two sequences
polydot      Displays all-against-all dotplots of a set of sequences

dotmatcher, by comparison, moves a window of specified length up each diagonal and displays a line over the window if the sum of the comparisons (using a substitution matrix) exceeds a threshold. It is slower but much more sensitive.
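For contrast, the windowed approach described above can be sketched as follows. This is not dotmatcher's code: the function name window_hits is hypothetical, and a simple match/mismatch score stands in for a real substitution matrix. The point is only that a window can pass the threshold despite a mismatch, which an exact word match cannot.

```python
def window_hits(seq_a, seq_b, window=10, threshold=8, match=1, mismatch=0):
    """Return (pos_a, pos_b) window starts whose summed score exceeds threshold.

    Hypothetical helper; match/mismatch scores replace a substitution matrix.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Sum the per-residue comparison scores along the diagonal.
            score = sum(
                match if seq_a[i + k] == seq_b[j + k] else mismatch
                for k in range(window)
            )
            if score > threshold:
                hits.append((i, j))
    return hits


# One mismatch in ten residues still passes a threshold of 8.
print(window_hits("ACGTACGTAC", "ACGTTCGTAC", window=10, threshold=8))
```

Every window on every diagonal is scored, so the work grows with the product of the sequence lengths times the window length, which is consistent with the windowed method being the slower of the two.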

Author(s)

This application was written by Ian Longden (il@sanger.ac.uk) Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

Completed 24th March 1999.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments