This article is part of the series Sequence and Genome Analysis.

Open Access Research

Map-invariant spectral analysis for the identification of DNA periodicities

Ahmad Rushdi1*, Jamal Tuqan2 and Thomas Strohmer3

Author Affiliations

1 Department of Electrical and Computer Engineering at the University of California, Davis, CA 95616, USA, and is now with Cisco Systems, Inc., San Jose CA 95134, USA

2 Department of Electrical and Computer Engineering at the University of California, Davis, CA 95616, USA

3 Department of Mathematics, University of California, Davis, CA 95616, USA

For all author emails, please log on.

EURASIP Journal on Bioinformatics and Systems Biology 2012, 2012:16 doi:10.1186/1687-4153-2012-16


The electronic version of this article is the complete one and can be found online at: http://bsb.eurasipjournals.com/content/2012/1/16


Received:31 May 2012
Accepted:6 September 2012
Published:15 October 2012

© 2012 Rushdi et al.; licensee Springer.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Many signal processing based methods for finding hidden periodicities in DNA sequences have primarily focused on assigning numerical values to the symbolic DNA sequence and then applying spectral analysis tools such as the short-time discrete Fourier transform (ST-DFT) to locate these repeats. The key results pertaining to this approach are however obtained using a very specific symbolic to numerical map, namely the so-called Voss representation. An important research problem is to therefore quantify the sensitivity of these results to the choice of the symbolic to numerical map. In this article, a novel algebraic approach to the periodicity detection problem is presented and provides a natural framework for studying the role of the symbolic to numerical map in finding these repeats. More specifically, we derive a new matrix-based expression of the DNA spectrum that comprises most of the widely used mappings in the literature as special cases, shows that the DNA spectrum is in fact invariable under all these mappings, and generates a necessary and sufficient condition for the invariance of the DNA spectrum to the symbolic to numerical map. Furthermore, the new algebraic framework decomposes the periodicity detection problem into several fundamental building blocks that are totally independent of each other. Sophisticated digital filters and/or alternate fast data transforms such as the discrete cosine and sine transforms can therefore be always incorporated in the periodicity detection scheme regardless of the choice of the symbolic to numerical map. Although the newly proposed framework is matrix based, identification of these periodicities can be achieved at a low computational cost.

Introduction

Many researchers have noted that the occurrence of repetitive structures in a DNA sequence is symptomatic of a biological phenomena. Specific applications of this observation include identification of diseases [1], DNA forensics [2], and detection of pathogen exposure [3]. Some of these structures are simple repetition of short DNA segments such as exons [4], tandem repeats [5], dispersed repeats [6], and unstable triplet repeats in the noncoding regions [7] while other forms more elaborate patterns such as palindromes [8] and the period-3 component [9-13], a strong periodic characteristic found primarily in genes and pseudogenes [14]. Methods that detect these DNA periodicities are either probabilistic or deterministic. Most of the deterministic techniques rely on spectral analysis of the DNA sequence using the short-time discrete Fourier transform (ST-DFT) [15-17]. The main idea is as follows: given a DNA sequence of length N, numerical values are first assigned to every element in <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M1">View MathML</a>={ACGT}, where these letters denote the four nucleotides in the DNA, namely the two purines: adenine (A) and guanine (G) and the two pyrimidines: thymine (T) and cytosine (C). A typical DNA double helix is shown in Figure 1.

thumbnailFigure 1. DNA: a straightened helix structure.

The symbolic to numerical map is clearly not unique, typically has a biological interpretation, and needs to preserve the specific structure of the DNA sequence under study. One such popular map is the Voss representation <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M2">View MathML</a><a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M3">View MathML</a>={0,1}, where four binary indicator sequences xl(n), l<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M4">View MathML</a>, are generated with 1 indicating the presence of a nucleotide and 0 its absence [18]. An example of the mapping of a single DNA strand to <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M5">View MathML</a> is shown in Figure 2.

thumbnailFigure 2. The Voss representation of a DNA segment.

Once the DNA symbolic sequence is mapped into numerical version(s), a set of discrete time sequences are generated and are the numerical equivalence of the DNA sequence. These numerical sequences can then by processed using standard signal processing techniques. In particular, the ST-DFT for each elementary sequences can be computed as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M6">View MathML</a>

(1)

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M7">View MathML</a>, where n is the window starting point, R is the amount of window shift, and h(m)=1 for −M + 1≤m≤0 and zero otherwise. If R=1, then, the window slides one nucleotide at a time whereas if R=3, the displacement of the window is on a 3-nucleotide basis. Note that the all-ones function h(m) does not affect the value of Xl(Rn,k). However, it serves as a place holder for other filters that can be used to replace it, as will be shown in the following section. One popular application of the ST-DFT based technique that has received considerable attention in the past is the identification of the period-3 component using the DNA spectrum, defined for R=3 as follows

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M8">View MathML</a>

(2)

A number of researchers have advocated the use of the period-3 component to discriminate between coding and non coding regions (see for example [11,13,16,19-23] to name a few) but the subject remains highly controversial as it is successful for certain genes but does not work for others. To better comprehend the underlying reasons behind this disparity in performance, a new multirate DSP model that provides a full understanding of the inner workings of the DNA periodicity has been first proposed in [24], and studied in details in [25]. This model is shown in Figure 3.

thumbnailFigure 3. The Multirate DSP model for general R. The period-3 case is easily obtained by setting R = 3.

This model provides closed form expressions for the DNA spectrum that generalize and unify some of the already existing results in the literature were obtained. One of these expressions in particular clearly shows that the identification of the period-3 component in the DNA spectrum, a signal processing problem, is equivalent to the detection of the nucleotide distribution disparity in the codon structure of a DNA sequence, a genomic problem. The disparity in the nucleotide distribution within the codon structure of a DNA sequence is termed the codon bias. Using this model, the DNA spectrum is completely characterized by a set of digital sequences, termed the filtered polyphase sequences. By processing these sequences, signal processing techniques can potentially have an impact on understanding and detecting biological structures of this nature. From a computational cost perspective, the computation of the DNA spectrum using this model does not require any complex valued operations [26]. This finding is rather surprising given the existence of complex multipliers in the proposed DSP model as clearly illustrated in Figure 3. It is shown that the direct computation of the DNA spectrum using (2) requires essentially double the amount of arithmetic operations compared to the DSP model approach.

It is important, however, to keep in mind that the above conclusions and results were obtained using the Voss symbolic to numerical transformation. A fundamental research issue is to therefore determine the sensitivity of the signal processing based method to the choice of the symbolic to numerical map. In particular, the core question here is: how dependent are the above results on the Voss representation? Are these results invariant with respect to the other popular maps in the literature? Can we derive necessary and/or sufficient conditions for the invariance of the DNA spectrum to the symbolic to numerical transformation? Is there a general mathematical framework that can help us generate new symbolic to numerical maps for which the DNA spectrum remains essentially the same? These are the type of questions we address in this article and provide answers to. One approach to answer this question was presented in [27], where a novel framework for the analysis of the equivalence of the mappings used for numerical representation of symbolic data based on signal correlation was presented, along with strong and weak equivalence properties. In [28], we attempted to answer the same question starting at the aforementioned DSP model for a limited set of mappings. Our main goal in this study is to de-embed the symbolic to numerical mapping process from the DNA spectrum computation process. We answer a set of other relevant questions along the way.

A key remark is in order at this point: while the DSP model approach proposed in Figure 3 has many advantages, it is not well suited for investigating the role of the symbolic to numerical map in the identification of DNA harmonics. It follows that a completely new paradigm for detecting DNA harmonics is required. The main contribution of this article is therefore the derivation of a novel matrix-based framework for the computation of the DNA spectrum that is extremely well fitted to the study of the symbolic to numerical transformation. Specifically, we first derive a new matrix-based expression of the DNA spectrum that:

1. comprises most of the existing mappings in the literature as special cases,

2. shows that the DNA spectrum is in fact invariable under all these mappings,

3. generates a necessary condition for the invariance of the DNA spectrum to the symbolic to numerical mapping used to compute it.

Furthermore, the new algebraic framework presented here decomposes the frequency identification problem into several fundamental components that are totally independent of each other. It follows that sophisticated digital filters and/or alternative transformations to the DFT such as the discrete cosine, sine, and Hartley transforms can always be easily incorporated in the harmonics detection scheme irrespective of the choice of the symbolic to numerical map. Finally, although the newly proposed framework is matrix based, we show that similar to the DSP model approach, the computation of the DNA spectrum using this new framework is very efficient.

The article is organized as follows. In Section 2, we derive a new matrix based framework to efficiently compute the ST-DFT-based spectrum. New expressions for the ST-DFT <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M9">View MathML</a> and its magnitude squared <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M10','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M10">View MathML</a> are obtained and indicate that these quantities are completely parameterized by some pre-defined matrices. The numerical values of these matrices simply depend on our choice of filtering (e.g., rectangular window versus non-rectangular one versus general FIR filters) as well as our choice of data transform (e.g., the DFT versus the DCT versus the DST).

Using these results, in Section 3, a new expression of the DNA power spectrum is derived and is also completely defined by these matrices. The elegance of this matrix based approach is that it allows the incorporation of general symbolic to numerical maps into the newly derived DNA spectrum expression provided these generic maps can be expressed as affine transformations of the Voss representation. This last assumption is motivated by the fact that all the popular maps that are available in the literature satisfy the affine condition. Furthermore, the maps are now completely characterized by the affine transformation (two matrices A and b) and can be therefore changed without affecting the remaining matrices in the DNA spectrum expression. In conclusion, the newly derived DNA spectrum expression is stated as a function of a number of matrices. Each of these matrices captures an essential component of the process (filtering, data transform, symbolic to numerical map) and the elements of each matrix can be changed without affecting the other matrices.

In Section 4 and using the above results, we show that the Voss-based DNA spectrum is essentially invariant under some of the most popular maps in the literature. A necessary and sufficient condition for the invariance of the DNA spectrum under any map is also derived.

In Section 5, we show how the special structure of the filtering matrix allows the efficient use of sophisticated digital filters to improve the detection performance of DNA harmonics through the computation of the DNA spectrum. We also show how to replace the DFT by other fast transforms such as the discrete cosine transform (DCT), the discrete sine transform (DST), and the discrete Hartley transform (DHT). Finally, some concluding remarks are mentioned in Section 6. A list of the different notation used in the article is summarized in Table 1.

Table 1. Summary of the article notations

A new algebraic framework for computing the ST-DFT

Given a sequence x(n) of length N, the ST-DFT is typically implemented using a sliding window approach as shown in Figure 4. Windows of length M that overlap with a factor R are first generated to form xr(n),r=1,2,…,Nw, where Nw=⌈(NM + 1)/R⌉ is the number of resulting windows. Once we map the DNA sequence into an integer number of numeric sequences γ, given by xl(n), l=1,…,γ(<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M40','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M40">View MathML</a>), the ST-DFT’s Xl(n), l=1,…,γcan be found and their squared magnitudes are added to result in the DNA Spectrum S(n) as summarized in Figure 5.

thumbnailFigure 4. Splitting x(n) into Nw overlapping sections xr(n) using a sliding window approach.

thumbnailFigure 5. System structure to find the DNA power spectrum S(n) by extracting successive sliding windows of the symbolic DNA sequence, mapping each to γ numeric sequences, finding their DFT’s at <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M42','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M42">View MathML</a>, and finally adding the corresponding squared magnitudes. In this example, Nw=⌈(NM + 1)/R⌉=⌈(12−6 + 1)/3⌉= 3 windows are generated.

It was shown in [26] that the ST-DFT of x(n) can be written as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M43','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M43">View MathML</a>

(3)

where the quantities Xr(n),∀ r∈{0,1,…,R−1} are the so-called filtered polyphase sequences given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M44','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M44">View MathML</a>

(4)

r∈{0,1,…,R−1}. The impulse response hr(m) is the inverse <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M45','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M45">View MathML</a>-transform of Hr(z) in Figure 3. Equations (3) and (4) can be used to compute the ST-DFT of a discrete time sequence, and subsequently its magnitude squared. In this section, we re-express these equations in matrix form, and then use the new formula to derive an expression for <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M46','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M46">View MathML</a>. Throughout the article, vectors and matrices (arrays) are always expressed in bold letters. The notation for the various matrix operations is given in Table 2.

Table 2. Notation of matrix operations

Matrix formulation of the ST-DFT

Using the defined matrix notation, we can restate Equation (3) as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M47','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M47">View MathML</a>

(5)

The real valued array

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M48','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M48">View MathML</a>

(6)

is the vector whose elements are the R filtered polyphase components. Similarly, the complex valued R-element array

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M49','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M49">View MathML</a>

(7)

is the vector whose elements are the R equispaced phasors located on the unit circle with <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M50','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M50">View MathML</a> phase deviations as shown in Figure 6 for R=3 and R=8. Note that

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M51','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M51">View MathML</a>

(8)

R≠1, which implies that the sum of elements in C is equal to 0. This is a key feature of the complex array C that will be used in later sections to simplify important expressions.

thumbnailFigure 6. Elements of array C of Equation (7), represented as phasors on the unit circle for (a) R =3, and (b) R =8.

On the other hand, we observe that (4) can be written in the following matrix format

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M52','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M52">View MathML</a>

(9)

r∈{0,1,…,R−1}, where h is an all-one vector of length M/R, and <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M53','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M53">View MathML</a> of length M/R is the rthpolyphase component of the window x(n) of length M. Using (9), the R filtered polyphase components Xr(n) can be arranged in the following array format

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M54','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M54">View MathML</a>

(10)

Using the identity

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M55','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M55">View MathML</a>

(11)

it follows that

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M56','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M56">View MathML</a>

which can be restated in matrix format as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M57','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M57">View MathML</a>

(12)

where HIRhT is an R×R matrix of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M58','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M58">View MathML</a> blocks, given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M59','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M59">View MathML</a>

The window <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M60','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M60">View MathML</a> of length M is a block interleaved version of the sliding window x(n) of length M starting at index n. Generating <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M61','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M61">View MathML</a> can be accomplished by blocking the window x(n) into an array of R elements per row (hence M/R rows), and then reading the array out column by column. The ST-DFT <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M62','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M62">View MathML</a> can therefore be completely identified as a function of C, h, and <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M63','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M63">View MathML</a> as follows

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M64','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M64">View MathML</a>

(13)

The complex row vector CTH is an array of R blocks, each of length <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M65','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M65">View MathML</a> as given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M66','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M66">View MathML</a>

which represents M/R repetitions of the elements in C. Similar to C, the sum of elements in CTH is equal to 0.

A matrix based expression for the magnitude squared of the ST-DFT

Using (5), the magnitude squared of the ST-DFT can be expressed as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M67','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M67">View MathML</a>

(14)

where matrix DCCT is an R×Rmatrix given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M68','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M68">View MathML</a>

D is obviously a right circulant (hence Toeplitz) matrix whose rows and columns are rotated versions of C. Obviously, the sum of any row or column elements in D is equal to 0. Substituting (12) in (14), or equivalently using (13), implies that the spectrum S(n) can be stated as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M69','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M69">View MathML</a>

(15)

where

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M70','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M70">View MathML</a>

is an R×R matrix of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M71','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M71">View MathML</a> blocks, given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M72','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M72">View MathML</a>

Matrix W can be represented as a Kronecker product of D and an <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M73','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M73">View MathML</a> all-one matrix. Note that any row or column in W is a rotated version of CTH, therefore, the sum of the elements of any row or column in Wis equal to 0.

The new DNA spectrum expression

A first step towards finding the DNA spectrum S(n) is the symbolic to numeric mapping <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M74','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M74">View MathML</a> as was shown in Figure 5. Once the symbolic DNA sequence is mapped into γnumeric sequence(s), the short-time discrete Fourier transform is applied to each of them and the sum of the squared magnitudes of the ST-DFTs will result in the DNA spectrum at the frequency point <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M75','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M75">View MathML</a> as given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M76','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M76">View MathML</a>

(16)

For simplicity, we denote <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M77','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M77">View MathML</a> as S(n) in the following sections. Several mappings were introduced in the literature using both real and complex numerical values with typical number of sequences γ=1 up to 4 to maintain reasonable computation complexity. In this section, we use the results of Section 2 to derive general expressions for the M/R ST-DFT and spectrum for any symbolic to numeric mapping.

The Voss-based DNA spectrum

The simplest and most commonly used map of a DNA sequence is the Voss representation <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M78','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M78">View MathML</a>: that is to form γ=4 binary indicator sequences xA(n),xC(n), xG(n), and xT(n) where a 1 would indicate the presence of a base and 0 indicates its absence [18]. This approach has been extensively used in relevant genomic research. Note that the four sequences are not linearly independent since for any index n, the four sequences will add up to one. That is

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M79','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M79">View MathML</a>

This redundancy plays an important role in the derivations of this section. Moreover, it follows that for any length-M window starting at n, the four mapped Voss windows will add up to an all-one length-M sequence and the same fact holds for the interleaved windows

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M80','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M80">View MathML</a>

(17)

For illustration, Figure 7a shows a sample DNA window that is mapped into the corresponding numeric windows <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M81','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M81">View MathML</a> in Figure 7b,d,f,h. With an example interleaving factor R=3, the interleaved windows <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M82','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M82">View MathML</a> are shown in Figure 7c,e,g,i. Each of the four sequences is a discrete time sequence that can be processed using the analysis of Section 2.

thumbnailFigure 7. A sample DNA window of length M =9, the corresponding Voss binary windows <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M85','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M85">View MathML</a>, and the interleaved versions <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M86','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M86">View MathML</a>with an interleaving factor R = 3. The interleaved windows are generated by rearranging the original windows in an R=3-interleaved format. In this example, data points of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M87','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M87">View MathML</a> at (1,2,3),(4,5,6),(7,8,9) are mapped from those in xl(n) at (1,4,7),(2,5,8),(3,6,9).

Therefore, the ST-DFT of each sequence can be found using (13) to be

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M88','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M88">View MathML</a>

(18)

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M89','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M89">View MathML</a>, and the power spectrum of each sequence can hence be derived as in (16) to be

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M90','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M90">View MathML</a>

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M91','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M91">View MathML</a>. It follows that the Voss-based DNA spectrum Sv(n) is

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M92','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M92">View MathML</a>

(19)

An obvious step at this point is to simplify (19) to avoid the summation over different bases. To do this, we use Equation (18) to arrange the ST-DFT’s of xl(n), <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M93','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M93">View MathML</a> in the following format

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M94','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M94">View MathML</a>

(20)

Using (11), it follows that

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M95','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M95">View MathML</a>

We define ϒv(n): the array of the four Voss-based ST-DFTs. It can now be written as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M96','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M96">View MathML</a>

(21)

where I4 is the 4×4 identity matrix, and the vector <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M97','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M97">View MathML</a> of length 4M is an array of the four Voss interleaved windows starting at index n: <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M98','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M98">View MathML</a>. Using the identity

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M99','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M99">View MathML</a>

(22)

the Voss-based DNA power spectrum can be manipulated into

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M100','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M100">View MathML</a>

(23)

In (26), I4 and W are constant matrices ∀n. Hence the computation of the spectrum Sv(n) for different windows of a DNA sequence needs only the evaluation of the Voss interleaved array <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M101','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M101">View MathML</a>.

Computing the DNA spectrum under general symbolic to numerical maps

Similar to the Voss representation case, any map <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M102','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M102">View MathML</a> of γsequences can be processed using the analysis of Section 2. It directly follows that the ST-DFT and spectrum of a single sequence are given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M103','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M103">View MathML</a>

where l=1,2,…,γ. The array of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M104','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M104">View MathML</a>-mapped ST-DFTs ϒd(n) is therefore given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M105','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M105">View MathML</a>

(24)

The <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M106','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M106">View MathML</a>-based DNA spectrum can easily be shown to be

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M107','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M107">View MathML</a>

(25)

where the vector <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M108','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M108">View MathML</a> of length γM is an array of the γ<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M109','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M109">View MathML</a>-mapped and interleaved windows starting at index n: <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M110','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M110">View MathML</a>. It is clear that for every different map <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M111','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M111">View MathML</a>, a new interleaved windows array <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M112','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M112">View MathML</a> has to be evaluated in order to compute a spectrum point Sd(n). In this following, we introduce a different new approach to recompute (28) without updating <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M113','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M113">View MathML</a> for every map. Basically, we derive a new expression for Sd(n) in terms of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M114','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M114">View MathML</a> and a new constant matrix so that we incorporate the map dependance in the matrix part rather than the interleaved array part. In other words, since the map <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M115','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M115">View MathML</a> is already well-defined, we use the map <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M116','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M116">View MathML</a> to complete the chain <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M117','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M117">View MathML</a> and hence find the spectrum Sd(n). Consider the following affine transformation from Voss sequences to a general array of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M118','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M118">View MathML</a>-mapped sequences

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M119','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M119">View MathML</a>

where Aγ×4and bγ×1=[b1b2bγ]Tare constant possibly complex valued arrays. It follows that the array of the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M120','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M120">View MathML</a>-mapped interleaved windows <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M121','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M121">View MathML</a> can be written in terms of the array the Voss-mapped interleaved windows <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M122','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M122">View MathML</a> in the following form

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M123','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M123">View MathML</a>

(26)

where <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M124','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M124">View MathML</a> defined as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M125','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M125">View MathML</a>

is an array of γM-element blocks, each block is M repetitions of one element of b. Substituting for <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M126','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M126">View MathML</a> in (24) results in a new formula for the array of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M127','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M127">View MathML</a>-mapped ST-DFTs ϒd(n) into

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M128','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M128">View MathML</a>

(27)

An important result at this point is that the second term in ϒd(n) is actually equal to 0. This can be verified by reducing it into the following form

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M129','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M129">View MathML</a>

Recall that the sum of elements in CTHis equal to 0. Therefore, since <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M130','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M130">View MathML</a> is a constant vector, the product <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M131','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M131">View MathML</a> is equal to 0, ∀ l=1,2,…,γand hence

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M132','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M132">View MathML</a>

(28)

The ST-DFTs array ϒd(n) can therefore be simplified using the Kronecker product identity (22) into

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M133','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M133">View MathML</a>

(29)

It follows that the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M134','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M134">View MathML</a>-based DNA spectrum Sd(n) is

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M135','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M135">View MathML</a>

(30)

where BAHA. Equation (35) indicates that when a certain symbolic to numeric mapping <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M136','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M136">View MathML</a> is used, the DNA power spectrum Sd(n) is completely defined in terms of the Voss-based interleaved array <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M137','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M137">View MathML</a> along with constant matrices W and B which is a function of the transformation matrix A (<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M138','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M138">View MathML</a>). Note that if A=I4then B=I4 at which (35) reduces to (26) which is the Voss-based spectrum case.

Invariance of the DNA spectrum under popular mappings

The results found in Section 3 can be applied to some mappings that are widely used in the literature. In specific, by defining the corresponding transformation matrices A and B (<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M139','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M139">View MathML</a>), closed form expressions for Sd(n) are obtained. Furthermore, for a number of mappings, we show that the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M140','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M140">View MathML</a>-mapped spectrum Sd(n) is in fact a scaled version of the Voss-based spectrum Sv(n).

Four-to-four (γ=4) representations

In this scheme, each Voss sequence is scaled by a possibly complex coefficient according to the following transformations matrices

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M141','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M141">View MathML</a>

where a, a, g, and t are real or complex coefficients used to scale xA(n),xC(n),xG(n), and xT(n), respectively. The corresponding array of ST-DFT’s ϒd(n) is subsequently given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M142','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M142">View MathML</a>

and the DNA spectrum Sd(n) is

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M143','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M143">View MathML</a>

Now, we extend this result to certain transformations where numeric values of the scale factors a, a, g, and t are specified.

§Tetrahedral mapping.

The so-called tetrahedral representation has been proposed in [13,29]. In this mapping scheme, the four nucleotides are represented by four equal length vectors oriented towards the corners of a tetrahedron. Projecting the basic tetrahedron on a plane will reduce the dimensionality of the representation to two. This mapping can be defined by the mapping matrix

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M144','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M144">View MathML</a>

It can be easily seen that in this case: <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M145','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M145">View MathML</a> which implies that B=2I4. The corresponding DNA spectrum is

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M146','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M146">View MathML</a>

(31)

Since B=αI4(α=2), the tetrahedral-based DNA spectrum is a scaled version of the Voss-based spectrum.

§Quaternion mapping.

A more involved step is to replace the complex number set of the tetrahedral mapping with its algebraic generalization, the set of quaternions. Quaternions have been used to map DNA sequences <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M147','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M147">View MathML</a>[30] and are simply defined as hypercomplex numbers given by <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M148','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M148">View MathML</a>, where i,j,k are complex coefficients such that i2=j2=k2=ijk=−1 and <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M149','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M149">View MathML</a>. The transformation matrix is given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M150','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M150">View MathML</a>

In this case, <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M151','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M151">View MathML</a>. The corresponding DNA spectrum is

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M152','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M152">View MathML</a>

(32)

§Higher order mappings.

An alternative Quaternion transformation is given by A=diag(1 + i + j + k,1 + ijk,1−ij + k,1−i + jk), which results in B=4I4 and consequently Sd(n)=4Sv(n). In general, for a complex representation system with η dimensions and equal amplitude coefficients: B=ηI4and hence the spectrum Sd(n)=ηSv(n).

Four-to-three (γ=3) mappings

In order to reduce the DNA spectrum computational cost, several mappings have been proposed with smaller numbers of sequences.

§<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M153','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M153">View MathML</a>-curve mapping.

One such important symbolic-to-numeric map is the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M154','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M154">View MathML</a>-curve mapping [24], which is a unique 3-dimensional curve representation whose sequences have values 1 and −1. One advantage of the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M155','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M155">View MathML</a>-curve mapping is that each of its three sequences has a biological interpretation. This scheme is given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M156','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M156">View MathML</a>

Therefore, the transformation matrices are

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M157','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M157">View MathML</a>

Matrix B in this case can be written as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M158','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M158">View MathML</a>

Note that the term involving B1in Sd(n) can be manipulated into

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M159','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M159">View MathML</a>

Recall from (19) that <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M160','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M160">View MathML</a>. Take also into consideration that the sum of elements of any row or column in W is equal to 0. This implies that <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M161','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M161">View MathML</a>, at which it is easy to see that <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M162','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M162">View MathML</a>. Similarly, <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M163','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M163">View MathML</a>. Therefore, only the first term in B contributed to Sd(n) at which the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M164','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M164">View MathML</a>-curve mapped DNA spectrum is a scaled version of the Voss-based DNA spectrum

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M165','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M165">View MathML</a>

(33)

This ratio is consistent with the result we first derived in [24] for R=3, but is now shown to be general for any value of R. We are now ready to state an important result.

Theorem.Necessary and Sufficient condition for the invariance of the DNA spectrum. Consider the following affine transformation from Voss sequences to a general array of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M166','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M166">View MathML</a>-mapped sequences

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M167','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M167">View MathML</a>

where Aγ×4 and bγ×1=[b1b2bγ]T are constant possibly complex valued arrays. Define the 4×4 matrix B=AHA. The DNA spectrum is invariant under this map, i.e., Sd(n)=αSv(n) if the transformation matrix B can be written as <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M168','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M168">View MathML</a>, where Bi holds constant rows and/or constant columns ∀ i.

The proof follows by simply observing that if Bihas constant rows and/or constant columns, then <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M169','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M169">View MathML</a>. We remind the reader at this point that the vector bγ×1has no bearing on the invariance of the DNA spectrum.

§Simplex mapping.

The simplex mapping is essentially another tetrahedron structured mapping that aims to eliminate the computational redundancy. Its transformations matrices are

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M170','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M170">View MathML</a>

Matrix B in this case can be written as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M171','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M171">View MathML</a>

Similar to the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M172','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M172">View MathML</a>-curve case, <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M173','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M173">View MathML</a>. It follows that the simplex-based DNA spectrum is also a scaled version of the Voss-based spectrum, and is given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M174','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M174">View MathML</a>

(34)

This ratio is consistent with the result in [31] which was limited to direct DFT and is now shown to be extended to M/R ST-DFT with any value of R.

Four-to-two (γ=2) mappings

Pairing couples of nucleotides together was proposed in the literature in order to exploit certain biological features in addition to complexity reduction. For example, it was suggested that exons are rich in nucleotides C and G, while introns have more A and T[29]. This claim inspired the transformation

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M175','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M175">View MathML</a>

It is obvious that the DNA spectrum in this case can be simplified to

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M176','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M176">View MathML</a>

(35)

which obviously is not a scaled version of Sv(n) since B in this case can not be written as <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M177','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M177">View MathML</a>, where Bi holds constant rows and/or constant columns ∀ i.

Four-to-one (γ=1) mappings

Single sequence representations can be generated by assigning each nucleotide a certain coefficient [4,13] in order to keep the single sequence structure using the transformation array and matrix

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M178">View MathML</a>

Note that the coefficients chosen for the tetrahedral, quaternion, and paired coupled mappings can be reused along with the single sequence formulation. For example, the paired couples case can be reformulated in a single sequence of 1’s and −1’s using <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M179','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M179">View MathML</a> and

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M180','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M180">View MathML</a>

at which the DNA spectrum is

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M181','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M181">View MathML</a>

Similar to the previous case, Sd(n) is not a scaled version of Sv(n).

Experimental verification

To briefly verify the results of this section experimentally, we apply Equation (35) to real DNA sequences, when the Voss, tetrahedral, quaternion, Z-curve, and simplex maps are employed. For comparison with previous study, we consider first the DNA sequence F56F11.4 in the C. elegans chromosome III. This sequence is 8060 nucleotides and has been used as a benchmark by many researchers [13] to extract the periodicity component at R=3. The DNA spectra at R=3 are shown in Figure 8 for the five former mappings, and are obviously related by the constant scale factors derived earlier in the section which clearly verifies our results. Although we lack the space for more general simulations, it is important to state that all the spectra relations are maintained experimentally at other values of R associated with higher order periodicities.

thumbnailFigure 8. DNA spectrum Sd(n) at R = 3 of the DNA sequence F56F11.4 when (a) Voss, (b) tetrahedral, (c) quaternion, (d) Z-curve, and (e) simplex mappings are used.

For generality purposes, we test two more sequences extracted from the well known Burset-Guigo database [32]. In specific, DNA spectra at R=3 of the zeta globin gene (ECZGL2) of length 1563, and the Alouatta seniculus epsilon-globin gene (ALOEGLOBIM) of length 1691 are shown in Figures 9 and 10, respectively, for the five former mappings. It can be seen that the relations are still preserved.

thumbnailFigure 9. DNA spectrum Sd(n) at R = 3 of the DNA sequence ECZGL2 when (a) Voss, (b) tetrahedral, (c) quaternion, (d) Z-curve, and (e) simplex mappings are used.

thumbnailFigure 10. DNA spectrum Sd(n) at R = 3 of the DNA sequence ALOEGLOBIM when (a) Voss, (b) tetrahedral, (c) quaternion, (d) Z-curve, and (e) simplex mappings are used.

Alternative measures of DNA periodicities

Alternative DNA periodicity measures using fast data transforms [33-35], wavelets, and finite impulse response (FIR) digital filters [25,36] were recently proposed to improve the detection performance of these periodicities. However, each method was obtained separately from the other using seemingly a different approach. In this section, we show that our proposed framework can systematically generate all these results by simply changing a number of matrices. It therefore provides a generic unified framework for generating alternative measures of DNA periodicities. For example, we can re-express the matrices D and W in terms of general digital filters and use these filters to modify (35) in order to generate new spectrum formulas. Furthermore, using symmetry based decompositions of D and W, we simplify (35) into a formula with low computational complexity.

Modified periodicity measures

Recall from Section ‘2’ that matrix W is given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M182','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M182">View MathML</a>

Obviously, W is completely defined by the real array h and the generally complex array C. Note that h and C can be viewed as the impulse responses of two FIR filters defined by the z-transforms H(z) and C(z).

Updating the real filter h

The FIR filter H(z) is the standard rectangular window filter and has a low pass frequency response with a −13 dB attenuation. To improve its filtering performance, we can use a more general FIR filter, denoted by <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M183','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M183">View MathML</a> and expressed as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M184','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M184">View MathML</a>

which is the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M185','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M185">View MathML</a>-transform of the general array <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M186','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M186">View MathML</a> given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M187','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M187">View MathML</a>

From a signal processing perspective, achieving better performance can be obtained by replacing the rectangular window with another one, <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M188','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M188">View MathML</a>, that has slightly wider main lobes but much more attenuated side lobes, as shown in Table 3. The impulse responses of such windows are depicted in Figure 11a for R=8 and M=96. Better harmonics characterization can be achieved by giving each nucleotide position within the window a relative weight in contrast to the rectangular where equal weighting is given to all nucleotides. It turns out that the Blackman window has the best main-to-first side lobe attenuation behavior as shown in Figure 11b compared to the rectangular window case and therefore provides the best smoothing of the DNA spectrum.

Table 3. FIR window Specifications: relative peak side lobe A1/A0in dB, approximate width of main lobe Δω, equivalent Kaiser window coefficient β, and transition width Δωβ.

thumbnailFigure 11. Comparison between standard FIR windows showing (a) impulse response, (b) magnitude response, when R = 8 and M = 96.

By replacing h with <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M189','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M189">View MathML</a>, the matrix H can be in turn expressed as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M190','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M190">View MathML</a>

and the complex row vector <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M191','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M191">View MathML</a> is now given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M192','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M192">View MathML</a>

It can be easily seen that the sum of elements in <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M193','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M193">View MathML</a> is still equal to zero as was the case for CTH. Consequently, it follows that the sum of any row or column in <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M194','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M194">View MathML</a> is still equal to zero. This is a fundamental result which, in turn, implies that all the derivations of Section 3 are still the same even when <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M195','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M195">View MathML</a> replaces h. In particular, the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M196','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M196">View MathML</a>-based DNA spectrum <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M197','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M197">View MathML</a> and the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M198','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M198">View MathML</a>-based one <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M199','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M199">View MathML</a> can be stated as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M200','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M200">View MathML</a>

(36)

Moreover, all the mathematical relations derived in Section 3 between the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M201','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M201">View MathML</a>-based spectrum and the Voss-based one are all still valid even when h is replaced by <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M202','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M202">View MathML</a>.

Experimental verification

To experimentally verify this result, we consider finding the DNA spectrum <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M203','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M203">View MathML</a> of the three DNA sequences used in the previous section when <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M204','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M204">View MathML</a> is set to a Blackman window. The relations between the spectra when using the Voss, tetrahedral, quaternion, Z-curve, and simplex mappings are still the same as shown in Figures 12, 13, and 14.

thumbnailFigure 12. DNA spectrum<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M207','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M207">View MathML</a>with<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M208','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M208">View MathML</a>set to a Blackman window at R = 3 of the DNA sequence F56F11.4 when (a) Voss, (b) tetrahedral, (c) quaternion, (d) Z-curve, and (e) simplex mappings are used.

thumbnailFigure 13. DNA spectrum<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M211','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M211">View MathML</a>with<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M212','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M212">View MathML</a>set to a Blackman window at R = 3 of the DNA sequence ECZGL2 when (a) Voss, (b) tetrahedral, (c) quaternion, (d) Z-curve, and (e) simplex mappings are used.

thumbnailFigure 14. DNA spectrum<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M215','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M215">View MathML</a>with<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M216','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M216">View MathML</a>set to a Blackman window at R = 3 of the DNA sequence ALOEGLOBIM when (a) Voss, (b) tetrahedral, (c) quaternion, (d) Z -curve, and (e) simplex mappings are used.

Updating the complex filter C

Similar to H(z), the FIR filter C(z) can be replaced by a more sophisticated filter <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M217','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M217">View MathML</a> expressed as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M218','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M218">View MathML</a>

which is the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M219','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M219">View MathML</a>-transform of the general array <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M220','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M220">View MathML</a> given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M221','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M221">View MathML</a>

Note that, in this case, the elements in array <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M222','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M222">View MathML</a> do not necessarily add to zero anymore. Consequently, the sum of elements in any row or any column in <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M223','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M223">View MathML</a> or <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M224','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M224">View MathML</a> is not necessarily zero. We also note that unlike the case of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M225','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M225">View MathML</a>, using <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M226','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M226">View MathML</a> instead of C keeps the spectrum formulas in (58) correct but does not preserve the mathematical relations between the different <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M227','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M227">View MathML</a>-mapped spectra and the Voss-based spectrum.

Joint optimization of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M228','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M228">View MathML</a> and <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M229','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M229">View MathML</a>

It should be clear at this point that better DNA harmonics detection performance can be potentially achieved through a joint “optimization” of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M230','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M230">View MathML</a> and <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M231','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M231">View MathML</a>. For example, a learning paradigm can be used with a least-mean-square (LMS) criterion to find the optimal set, <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M232','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M232">View MathML</a> and <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M233','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M233">View MathML</a>. Alternatively, a biologically induced criterion can yield a substantial boost in performance but it is not clear which criterion to use. This interesting but challenging research topic is however outside the scope of this article and will not be further pursued here.

Example

Standard discrete time transforms have been proposed to replace the ST-DFT in the periodicity detection problem. In particular, the short time discrete cosine transform (ST-DCT), sine transform (ST-DST), and Hartley transform (ST-DHT) were introduced and analyzed for this purpose [33]. In this example, we show that these three transforms fit naturally within our proposed analysis when the two arrays <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M234','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M234">View MathML</a> and <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M235','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M235">View MathML</a> are adjusted correctly for each case. Although these standard transforms are not optimized for certain data sets, they can serve as preliminary tests for better periodicity detection. In [33], the short time DFT, DCT, DST, and DHT at k=M/R where shown to be given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M236','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M236">View MathML</a>

(37)

where t∈{fcsh} indicates Fourier, cosine, sine, and Hartley transforms, respectively, <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M237','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M237">View MathML</a> are possibly complex coefficients, and h(t)(m)=(αt)m. Values of the parameters α, a, b, and θrfor every transform are adjusted according to Table 4. For illustration, setting α=1, a=1, b=0, and θr=−2Πr/R in (59) results in the ST-DFT case. An efficient implementation to calculate Equation (59) is shown in Figure 15 which generalizes Figure 3.

thumbnailFigure 15. A general multirate DSP structure to compute the short-time DFT, DCT, DST, and DHT.

Table 4. Parameter settings in Figure 15 to compute the short time Fourier, cosine, sine, and Hartley transforms

This model provides a general framework that encapsulates the computation of the short-time Fourier, cosine, sine, and Hartley transforms at frequency point k=M/R. Therefore, the same matrix-based analysis of Sections 2 and 3 can be used. Matrix W will be updated into

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M240','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M240">View MathML</a>

and therefore the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M241','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M241">View MathML</a>-based DNA spectrum <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M242','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M242">View MathML</a> when one of the ST- DFT, DCT, DST, or DHT is employed can be stated as

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M243','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M243">View MathML</a>

(38)

where the values of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M244','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M244">View MathML</a> and <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M245','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M245">View MathML</a> are adjusted according to Table 5.

Table 5. Modified arrays<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M246','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M246">View MathML</a>and<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M247','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M247">View MathML</a>to compute the short time Fourier-, cosine-, sine-, and Hartley-based DNA spectrum of (60)

Note that similar to the Fourier case, the sum of elements in <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M256','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M256">View MathML</a> for the cosine and Hartley transforms cases is equal to zero. Therefore, under these two cases, the relations between different <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M257','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M257">View MathML</a>-based DNA spectra and the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M258','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M258">View MathML</a>-based DNA spectrum are still the same as given in Section 3.

At this point, it can be concluded that the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M259','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M259">View MathML</a>-based DNA spectrum <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M260','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M260">View MathML</a> is completely defined in terms of the Voss-based array of interleaved windows <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M261','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M261">View MathML</a>, the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M262','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M262">View MathML</a> mapping matrix A, the real array <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M263','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M263">View MathML</a>, and the generally complex array <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M264','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M264">View MathML</a>. This conclusion is summarized in Figure 16.

thumbnailFigure 16. A DSP structure to compute the modified <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M267','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M267">View MathML</a>-based DNA spectrum <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M268','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M268">View MathML</a>. The Voss-based array of interleaved windows <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M269','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M269">View MathML</a>, the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M270','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M270">View MathML</a> mapping matrix A, the real array <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M271','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M271">View MathML</a>, and the generally complex array <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M272','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M272">View MathML</a> are the system design parameters.

A real approach for the spectrum computation

A real computationally-efficient alternative for the evaluation of Sd(n) can be found by observing the special properties of the circulant/toeplitz matrix D or equivalently the block matrix W. We use the fact that for a generally-complex matrix Q: yHQy=0, <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M273','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M273">View MathML</a>, if Q is an antisymmetric matrix. We start by splitting Dinto its symmetric and antisymmetric parts

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M274','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M274">View MathML</a>

where Ds is a circulant and Toeplitz real R×Rmatrix given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M275','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M275">View MathML</a>

and Das is a circulant and Toeplitz complex R×Rmatrix given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M276','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M276">View MathML</a>

Substituting for D in (15), we get a simple form of the spectrum S(n)

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M277','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M277">View MathML</a>

(39)

where <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M278','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M278">View MathML</a>, <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M279','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M279">View MathML</a>. The block matrix

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M280','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M280">View MathML</a>

is an R×R matrix of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M281','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M281">View MathML</a> blocks. Using (63) to update the DNA spectrum (21), Sv(n) simplifies into

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M282','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M282">View MathML</a>

Following the same analysis of Section 3, (64) can be easily manipulated into a more elegant completely real form given by

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M283','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M283">View MathML</a>

(40)

or more generally, (35) can be updated into

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M284','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M284">View MathML</a>

(41)

which provides a completely real approach for the computation of the <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M285','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M285">View MathML</a>-mapped spectrum Sd(n). Note that all results and different spectra relations in Section 3 still hold when Ws replaces W as in (65).

Computational complexity comparison

To quantify the computational credit of this real approach, we compare the complexity of (63) to that of (16) of a single discrete time sequence. Since <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M286','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M286">View MathML</a> can be complex as well according to the mapping used, we find the number of real multiplications and additions needed to evaluate (63) when each of <a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M287','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M287">View MathML</a> and W is either real or complex, as given in Table 6. Recall that the multiplication of the complex numbers x and y, where x=a + jb and y=c + jd requires the computation of acbdand ad + bc, which requires four real multiplications and two real additions.

Table 6. Real multiplications and additions needed for the evaluation of (63) and (16)

Example

For illustration, we evaluate the spectrum Sv(n) using Wswhen R=3, and compare the result to the formula derived in [37]. In specific, we use (64) to find the spectrum S(n) as follows

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M289','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M289">View MathML</a>

Expanding and completing the square, it follows that

<a onClick="popup('http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M290','MathML',630,470);return false;" target="_blank" href="http://bsb.eurasipjournals.com/content/2012/1/16/mathml/M290">View MathML</a>

(42)

where q=(r + 1) mod 3. The matrix-based DNA spectrum formula in (67) is consistent with the result derived using a different approach in [37].

Concluding remarks

In this article, we have introduced a matrix based framework for locating hidden DNA periodicities using spectral analysis techniques that are invariant to the choice of the symbolic to numerical map. The primary advantage of the presented approach over some of the previous study is the decomposition of the spectrum expression into key matrices whose values can be set independently from each other. Each matrix represents one of the essential components involved in the computation of the spectrum such as the symbolic to numerical map, the data transform, and the filtering scheme. The above framework is derived under the assumption that the symbolic to numerical map can be obtained from the Voss representation using an affine transformation. This assumption is however quite loose given that most (if not all) of the proposed maps in the literature satisfy this requisite. Using the new framework, we have then shown that the DNA spectrum expression is invariant under these maps. We have also derived a necessary and sufficient condition for the invariance of the DNA spectrum in terms of the affine transformation matrix A (the b vector in the affine transformation does not affect the DNA spectrum).

This condition can serve as the basis for generating novel symbolic to numerical map that preserve the DNA spectrum expression. Finally, in the latter sections of the article, we have shown the potential of using different filtering schemes, e.g., windows other than the rectangular one as well as alternate fast data transforms, e.g., the DCT, DST, and the Hartley transform. A number of simulation results that verify the findings of this article and a brief quantitative analysis of the computational complexity of the new approach were given in the same sections. Future research study would consider the optimization of the different building blocks, namely the symbolic to numerical map, the data transform, and the filtering scheme. This, in turn, requires a deep understanding of the biological significance of different DNA periodicities in order to set up a meaningful objective function and appropriate constraints. Ultimately, the framework proposed here can be incorporated in a more sophisticated system to study the complex structure of genomic sequences and understand the functionality of its various components. Finally, this efficient framework can be extended to the analysis of other types of symbolic sequences of various limited alphabets, either biological sequences (such as protein sequences) or even non-biological ones.

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

TS acknowledges partial support from the NSF via grants DMS 0811169 and DMS-1042939.

References

  1. G Benson, Tandem repeat finder: a program to analyze DNA sequences. Nucleic Acids Res 27(2), 573–580 (1999). PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  2. J Butler, Forensic DNA Typing: Biology and Technology behind STR Markers (Academic Press, MA, Burlington, 2003)

  3. CA Cummings, DA Relman, Microbial forensics: cross-examining pathogens. Science 296, 1976–1979 (2002). PubMed Abstract | Publisher Full Text OpenURL

  4. P Ramachandran, W Lu, A Antoniou, Filter-based methodology for the location of hot spots in proteins and exons in DNA. IEEE Trans. Biomed. Eng 59(6), 1598–1609 (2012). PubMed Abstract | Publisher Full Text OpenURL

  5. T Strachan, AP Read, Human Molecular Genetics (John Wiley and Sons, New York, 1999) (http://www, 1999), . ncbi.nlm.nih.gov/books/NBK7580/ webcite OpenURL

  6. AF Smit, The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev 6, 743–748 (1996). PubMed Abstract | Publisher Full Text OpenURL

  7. DC Rubinsztein, MR Hayden, Analysis of TRIPLET REPEAT Disorders (Bios Scientific Pub, Oxford, England, 1999)

  8. R Gupta, A Mittal, S Gupta, An efficient algorithm to detect palindromes in DNA sequences using periodicity transform. Signal Process 18(4), 8–20 (2001). Publisher Full Text OpenURL

  9. VR Chechetkin, AY Turygin, Size-dependence of three-periodicity and long-range correlations in DNA sequences. Phys. Lett A 199, 75–80 (1995). Publisher Full Text OpenURL

  10. VR Chechetkin, AY Turygin, Search of hidden periodicities in DNA sequences. J. Theor. Biol 175, 477–494 (1995). PubMed Abstract | Publisher Full Text OpenURL

  11. BD Silverman, R Linsker, A measure of DNA periodicity. J. Theor. Biol 118(3), 295–300 (1986). PubMed Abstract | Publisher Full Text OpenURL

  12. D Holste, I Grosse, S Beirer, P Schieg, H Herzel, Repeats and correlations in human DNA sequences. Physic. Rev. E 67(06913) (2003)

  13. D Anastassiou, Genomicsignalprocessing Process, IEEE Signal Mag 18(4), 8–20 (2001). Publisher Full Text OpenURL

  14. VR Chechetkin, VV Lobzin, Anticodons, frameshifts, and hidden periodicities in tRNA sequences. J. Biomol. Struct. Dyn 24(2), 189–202 (2006). PubMed Abstract | Publisher Full Text OpenURL

  15. D Anastassiou, Frequency domain analysis of biomolecular sequences. Bioinformatics 16(12), 1073–1082 (2000). PubMed Abstract | Publisher Full Text OpenURL

  16. S Tiwari, S Ramachandran, A Bhattacharya, S Bhattacharya, R Ramaswamy, Prediction of probable genes by Fourier analysis of genomic sequences. Comput. Appl. Biosci 13(3), 263–270 (1997). PubMed Abstract OpenURL

  17. M Akhtar, E Ambikairajah, Time and frequency domain methods for gene and exon prediction in Eukaryotes. Proceedings of ICASSP (International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2007, Honolulu, Hawaii, USA, 2007), pp. 573–576

  18. RF Voss, Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Phys. Rev. Lett 68(25), 3805–3808 (1992). PubMed Abstract | Publisher Full Text OpenURL

  19. JW Fickett, The gene identification problem: an overview for developers. Comput. Chem 20, 103–118 (1996). PubMed Abstract | Publisher Full Text OpenURL

  20. N Bouaynaya, D Schonfeld, Non-stationary analysis of coding and non-coding regions in nucleotide sequences. IEEE J. Sel. Top. Signal Process 2(3), 357–364 (2008)

  21. PP Vaidyanathan, BJ Yoon, Gene and exon prediction using all pass-based filters. Gensips Proc ((Workshop on Genomic Signal Processing and Statistics (GENSIPS), Raleigh, North Carolina, USA, 2003)), pp. 1–4

  22. PP Vaidyanathan, B Yoon, Digital filter for gene prediction applications. in Proc, ed. by . Asilomar conference (Asilomar Conference on Signals, Systems and Computers (ACSSC), Pacific Grove, CA, USA, 2003), pp. 306–310

  23. M Akhtar, J Epps, E Ambikairajah, On DNA numerical representations for period-3 based exon prediction. Proceedings of the workshop on Genomic Signal Processing and Statistics (International Workshop on Genomic Signal Processing and Statistics (GENSIPS) 2007, Tuusula, Finland, 2007), pp. 1–4

  24. A Rushdi, J Tuqan, Gene identification using the Z-curve representation. Proceedings of the 31st IEEE ICASSP conference (International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2006, Toulouse, France, 2006), pp. 1024–1027

  25. J Tuqan, A Rushdi, A DSP approach for finding the codon bias in DNA sequences. IEEE J. Sel. Top. Signal Process 2(3), 343–356 (2008)

  26. A Rushdi, J Tuqan, An efficient algorithm for DNA discrete Fourier analysis. Proceedings of the 3rd IEEE Cairo International Biomedical Engineering Conference (CIBEC) (Cairo International Biomedical Engineering Conference, Cairo, Egypt, 2006), pp. 1–4

  27. L Wang, D Schonfeld, Mapping equivalence for symbolic sequences: Theory and applications. IEEE Trans. Signal Process 57(12), 4895–4905 (2009)

  28. A Rushdi, J Tuqan, The role of the Symbolic-to-Numerical Mapping in the detection of DNA Periodicities. Proceedings of the workshop on Genomic Signal Processing and Statistics (International Workshop on Genomic Signal Processing and Statistics (GENSIPS) 2008, Phoenix, AZ, USA, 2008), pp. 1–4

  29. PD Cristea, Conversion of nucleotides sequences into genomic signals. J. Cellul. Mol. Med 6(2), 279–303 (2002). Publisher Full Text OpenURL

  30. AK Brodzik, O Peters, Symbol-balanced quaternionic periodicity transform for latent pattern detection in DNA sequences. in Proceedings of the IEEE Int, ed. by . Conf. on Acoustics, Speech, and Signal Processing (International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2005, Philadelphia, PA, USA, 2005), pp. 373–376

  31. E Coward, Equivalence of two Fourier methods for biological sequences. J. Math. Biol 36, 64–70 (1997). Publisher Full Text OpenURL

  32. M Burset, R Guigo, Evaluation of gene structure prediction programs. Genomics 34, 353–357 (1996). PubMed Abstract | Publisher Full Text OpenURL

  33. A Rushdi, J Tuqan, Trigonometric Transforms for Finding Repeats in DNA sequences. Proceedings of the workshop on Genomic Signal Processing and Statistics (International Workshop on Genomic Signal Processing and Statistics (GENSIPS) 2008, Phoenix, AZ, USA, 2008), pp. 1–4

  34. JA Berger, SK Mitra, J Astola, Power spectrum analysis for DNA sequences. in Proc, ed. by . of the Int. Sym. on Signal Processing and its App (Conference: International Symposium on Signal Processing and Its Applications (ISSPA) 2003, Paris, France, 2003), pp. 29–32

  35. D Kotlar, Y Lavner, Gene prediction by spectral rotation measure: a new method for identifying protein-coding regions. Genome Res 13(8), 1930–1937 (2003). PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  36. A Rushdi, J Tuqan, The filtered spectral rotation measure. Proceedings of the 40th IEEE Asilomar Conference on Signals, Systems, and Computers (Asilomar Conference on Signals, Systems and Computers (ACSSC) 2006, CA, USA, 2006), pp. 1875–1879

  37. S Datta, A Asif, A fast DFT based gene prediction algorithm for identification of protein coding regions. in Proc, ed. by . of the ICASSP (International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2005, Philadelphia, PA, USA, 2005), pp. 113–116