DNA Recurrence Plot

Algorithm

Recurrence plot

In descriptive statistics and chaos theory, a recurrence plot (RP) shows, for a given moment in time, the times at which a phase space trajectory visits roughly the same area of the phase space. In bioinformatics, a dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity; it is a type of recurrence plot.

A dot plot is a similarity matrix: a two-dimensional matrix with the two protein or nucleic acid sequences being compared along the vertical and horizontal axes. For a simple visual representation of the similarity, individual cells in the matrix are shaded black if the residues are identical, so that matching sequence segments appear as diagonal runs across the matrix.
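The identity-based dot plot described above can be sketched in a few lines. This is only an illustration of the general technique, not the code behind this page; the function names `dot_plot` and `render` are made up for the example.

```python
def dot_plot(seq_a, seq_b):
    """Return a matrix with 1 where the residues are identical and 0 elsewhere.

    Rows follow seq_a (vertical axis), columns follow seq_b (horizontal axis).
    """
    return [[1 if a == b else 0 for b in seq_b] for a in seq_a]


def render(matrix):
    """Draw matching cells as '#' and all other cells as '.'."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    # Comparing a sequence with itself always fills the main diagonal.
    print(render(dot_plot("GATTACA", "GATTACA")))
```

Comparing a sequence with itself fills the main diagonal; repeated segments show up as additional diagonal runs.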

Settings for DNA plot

The plot is the result of a pattern comparison. The pattern length setting gives the number of letters in each pattern. Each column of the matrix holds the results of comparing one pattern with every pattern of the string in turn; the first column, for example, holds the comparisons of the first pattern.
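A minimal sketch of how a sequence might be cut into patterns of a given length. The page does not say whether consecutive patterns overlap, so the `step` parameter is an assumption; the function name `split_patterns` is hypothetical as well.

```python
def split_patterns(sequence, pattern_length, step=1):
    """Cut the sequence into patterns of equal length.

    step=1 yields overlapping patterns (an assumption, the page does not
    state this); step=pattern_length yields disjoint patterns.
    """
    if pattern_length < 1 or step < 1:
        raise ValueError("pattern_length and step must be positive")
    last_start = len(sequence) - pattern_length
    # An empty list is returned when the sequence is shorter than one pattern.
    return [sequence[i:i + pattern_length] for i in range(0, last_start + 1, step)]
```

Each returned pattern then corresponds to one row and one column of the comparison matrix.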

For each pattern, a twelve-dimensional vector with the components center, count and distance (four values each) is used to define a metric.

Center

The first four components of the vector are the centers (centers of gravity, that is, the mean positions) of the letters A, C, G and T in the pattern.

Number

The next four components are the counts of the letters A, C, G and T in the pattern.

Distance

The last four components are the maximal distances, from first to last occurrence, of the letters A, C, G and T in the pattern.

Comparison

The length of the difference vector of the two patterns gives a value for the similarity of the patterns.

Plot

In the color plot, the center, count and distance components scale the colors cyan, magenta and yellow in the CMYK color space.

The gray plot uses the whole vector to define the gray value.

2D FFT

The gray plot is the input for the 2D FFT, which is used to calculate the power spectrum and the filtered reconstruction.
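The following sketch shows one way to obtain a power spectrum and a filtered reconstruction from a gray matrix using only the standard library. It assumes a matrix whose side lengths are powers of two (required by the radix-2 FFT used here) and an ideal low-pass filter; the page does not state which filter it applies, so `low_pass_reconstruction` and its `cutoff` parameter are purely illustrative.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(values) must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2(matrix, inverse=False):
    """2D transform: FFT of every row, then of every column."""
    rows = [fft(row, inverse) for row in matrix]
    cols = [fft(list(col), inverse) for col in zip(*rows)]
    result = [list(row) for row in zip(*cols)]  # transpose back to rows
    if inverse:
        scale = len(matrix) * len(matrix[0])
        result = [[v / scale for v in row] for row in result]
    return result


def power_spectrum(matrix):
    """Squared magnitude of every 2D frequency component."""
    return [[abs(v) ** 2 for v in row] for row in fft2(matrix)]


def low_pass_reconstruction(matrix, cutoff):
    """Zero all frequencies whose index exceeds cutoff in either direction, then invert.

    The low-pass filter is an assumption made for this example.
    """
    spectrum = fft2(matrix)
    n_rows, n_cols = len(spectrum), len(spectrum[0])
    for u in range(n_rows):
        for v in range(n_cols):
            fu = min(u, n_rows - u)  # wrap-around frequency index
            fv = min(v, n_cols - v)
            if fu > cutoff or fv > cutoff:
                spectrum[u][v] = 0j
    return [[v.real for v in row] for row in fft2(spectrum, inverse=True)]
```

A pure-Python FFT is slow for large plots; a numerical library would be the usual choice in practice.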

DNA Sequence Plot

[Interactive plot. Setting: pattern length. Readout: mouse position as column and row.]

2D FFT Plot

[Interactive plot with the panels Gray scale plot, Spectrum and Reconstruction. Setting: base 2 dimension in px.]

Input field for the DNA sequence

The sequence is represented with the letters A, C, G and T.

The default example consists of the first bases of the ENA|U47924|U47924.1 human chromosome 12p13 sequence.

Recurrence Matrix

Example of the recurrence matrix for a pattern length of 5:

[Figure: DNA recurrence matrix]

Pattern Vector

For every pattern, a related vector is calculated. The pattern vector v consists of three components: center C, counted number N and deviation D.

$$ v = \begin{pmatrix} C \\ N \\ D \end{pmatrix} $$

The components center C, counted number N and deviation D are themselves vectors with four components, one for each of the letters A, C, G and T.

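$$ C = \begin{pmatrix} c_A \\ c_C \\ c_G \\ c_T \end{pmatrix} $$

$$ N = \begin{pmatrix} n_A \\ n_C \\ n_G \\ n_T \end{pmatrix} $$

$$ D = \begin{pmatrix} d_A \\ d_C \\ d_G \\ d_T \end{pmatrix} $$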

Components Center

The center components c are calculated as follows:

$$ c_i = \frac{\sum_{j=0}^{L-1} p_j \, |_{i,j}}{n_i \, L} $$

with:

i : A, C, G or T

L : length of the pattern, that is, the number of letters in the pattern

$p_j$ : position of the j-th letter in the pattern (0, 1, ..., L-1)

$n_i$ : counted number of the letter i in the pattern

$|_{i,j}$ : 1 if the letter i is at position j, otherwise 0
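As a sketch, the center components can be computed as the mean position of each letter divided by the pattern length. The function name `center_components` is hypothetical, and returning 0 for a letter that does not occur in the pattern is an assumption, since the formula is undefined for a count of zero.

```python
LETTERS = "ACGT"


def center_components(pattern):
    """Mean position of each letter in the pattern, divided by the pattern length."""
    length = len(pattern)
    centers = []
    for letter in LETTERS:
        positions = [p for p, symbol in enumerate(pattern) if symbol == letter]
        if positions:
            centers.append(sum(positions) / (len(positions) * length))
        else:
            centers.append(0.0)  # assumption: absent letters get center 0
    return centers
```

For the pattern AACG, the letter A occupies positions 0 and 1, so its center is (0 + 1) / (2 · 4) = 0.125.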

Components Number

The number components n are calculated as the number of occurrences of a letter in the pattern divided by the length of the pattern.
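A minimal sketch of this relative count; the function name `number_components` is made up for the example.

```python
def number_components(pattern):
    """Occurrences of A, C, G and T divided by the pattern length."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    length = len(pattern)
    return [pattern.count(letter) / length for letter in "ACGT"]
```

The four values sum to 1 when the pattern contains only the letters A, C, G and T.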

Components Deviation (Distance)

The deviation components d are calculated as the distance between the first and the last occurrence of a letter in the pattern divided by the maximal possible distance.
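The deviation can be sketched as follows. Two assumptions are made here that the page does not confirm: the maximal possible distance is taken to be L - 1 (first to last position of the pattern), and a letter that does not occur gets the value 0. The function name `deviation_components` is hypothetical.

```python
def deviation_components(pattern):
    """Distance from first to last occurrence of each letter, scaled to 0..1."""
    max_distance = len(pattern) - 1  # assumption: first to last position of the pattern
    result = []
    for letter in "ACGT":
        first = pattern.find(letter)
        if first < 0 or max_distance == 0:
            result.append(0.0)  # assumption: absent letters get deviation 0
        else:
            result.append((pattern.rfind(letter) - first) / max_distance)
    return result
```

A letter that occurs exactly once has deviation 0, and a letter at both ends of the pattern has deviation 1.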

Matrix Elements

In this way each pattern gets a related vector v. The matrix element $m_{i,j}$ then follows as the magnitude of the difference between the vectors of the compared patterns i and j.

$$ m_{i,j} = \left| v_i - v_j \right| $$

The matrix elements $m^k_{i,j}$ for the partial vectors k = C, N, D follow in the same way.

$$ m^C_{i,j} = \left| C_i - C_j \right| $$

$$ m^N_{i,j} = \left| N_i - N_j \right| $$

$$ m^D_{i,j} = \left| D_i - D_j \right| $$
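A sketch of the matrix computation, assuming that the magnitude is the Euclidean length and that every pattern vector is a list of twelve numbers ordered as center, number, deviation. The function names `difference_matrix` and `partial_matrices` are hypothetical.

```python
import math


def difference_matrix(vectors):
    """m[i][j] = Euclidean length of vectors[i] - vectors[j]."""
    return [[math.dist(vi, vj) for vj in vectors] for vi in vectors]


def partial_matrices(vectors):
    """Matrices for the C, N and D parts (components 0-3, 4-7 and 8-11)."""
    parts = {"C": slice(0, 4), "N": slice(4, 8), "D": slice(8, 12)}
    return {
        name: difference_matrix([v[part] for v in vectors])
        for name, part in parts.items()
    }
```

Under this assumption the matrices are symmetric with a zero diagonal, because every pattern has distance 0 to itself.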

Matrix Color Coding

The matrix m of the total pattern difference is coded for the gray scale plot: gray value = 255 ⋅ $m_{i,j}$

The matrix $m^C$ of the center pattern difference is coded for the cyan scale plot: cyan value = 255 ⋅ $m^C_{i,j}$

The matrix $m^N$ of the number pattern difference is coded for the magenta scale plot: magenta value = 255 ⋅ $m^N_{i,j}$

The matrix $m^D$ of the deviation pattern difference is coded for the yellow scale plot: yellow value = 255 ⋅ $m^D_{i,j}$
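The scaling can be sketched as below. Rounding, clamping to the range 0 to 255, and the conversion from cyan, magenta and yellow to screen RGB without a black component are all assumptions for illustration; the page only gives the factor 255. All function names are hypothetical.

```python
def scale_255(value):
    """255 * value, rounded and clamped to 0..255 (rounding and clamping are assumptions)."""
    return max(0, min(255, round(255 * value)))


def cmy_pixel(m_c, m_n, m_d):
    """Cyan, magenta and yellow values from the C, N and D matrix elements."""
    return scale_255(m_c), scale_255(m_n), scale_255(m_d)


def cmy_to_rgb(cyan, magenta, yellow):
    """Screen color for a CMY triple with no black component (assumption)."""
    return 255 - cyan, 255 - magenta, 255 - yellow
```

With this conversion, two patterns that differ only in their center components produce a pure cyan pixel.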