<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1687-4153-2007-72936</ui>
   <ji>1687-4153</ji>
   <fm>
      <dochead>Research Article</dochead>
      <bibl>
         <title>
            <p>Aligning Sequences by Minimum Description Length</p>
         </title>
         <aug>
            <au ca="yes" id="A1"><snm>Conery</snm><fnm>John S</fnm><insr iid="I1"/><email>conery@cs.uoregon.edu</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>Department of Computer and Information Science, University of Oregon, Eugene, OR 97403, USA</p></ins>
         </insg>
         <source>EURASIP Journal on Bioinformatics and Systems Biology</source>
         <issn>1687-4153</issn>
         <pubdate>2007</pubdate>
         <volume>2007</volume>
         <issue>1</issue>
         <fpage>72936</fpage>
         <url>http://bsb.eurasipjournals.com/content/2007/1/72936</url>
         <xrefbib><pubid idtype="doi">10.1155/2007/72936</pubid></xrefbib>
      </bibl>
      <history><rec><date><day>26</day><month>2</month><year>2007</year></date></rec><revrec><date><day>6</day><month>8</month><year>2007</year></date></revrec><acc><date><day>16</day><month>11</month><year>2007</year></date></acc><pub><date><day>2</day><month>1</month><year>2008</year></date></pub></history>
      <cpyrt><year>2007</year><collab>John S. Conery.</collab><note>This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
      <abs>
         <sec>
            <st>
               <p/>
            </st>
            <p>This paper presents a new information-theoretic framework for aligning sequences in bioinformatics. A transmitter compresses a set of sequences by constructing a regular expression that describes the regions of similarity in the sequences. To retrieve the original set of sequences, a receiver generates all strings that match the expression. An alignment algorithm uses minimum description length to encode and explore alternative expressions; the expression with the shortest encoding provides the best overall alignment. When two substrings contain letters that are similar according to a substitution matrix, a code length function based on conditional probabilities defined by the matrix encodes the substrings with fewer bits. In one experiment, alignments produced with this new method were found to be comparable to alignments from <inline-formula><graphic file="1687-4153-2007-72936-i1.gif"/></inline-formula>. A second experiment measured the accuracy of the new method on pairwise alignments of sequences from the BAliBASE alignment benchmark.</p>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p/>
         </st>
         <p>[<abbr bid="B1">1</abbr>, <abbr bid="B2">2</abbr>, <abbr bid="B3">3</abbr>, <abbr bid="B4">4</abbr>, <abbr bid="B5">5</abbr>, <abbr bid="B6">6</abbr>, <abbr bid="B7">7</abbr>, <abbr bid="B8">8</abbr>, <abbr bid="B9">9</abbr>, <abbr bid="B10">10</abbr>, <abbr bid="B11">11</abbr>, <abbr bid="B12">12</abbr>, <abbr bid="B13">13</abbr>, <abbr bid="B14">14</abbr>, <abbr bid="B15">15</abbr>, <abbr bid="B16">16</abbr>, <abbr bid="B17">17</abbr>, <abbr bid="B18">18</abbr>, <abbr bid="B19">19</abbr>, <abbr bid="B20">20</abbr>, <abbr bid="B21">21</abbr>, <abbr bid="B22">22</abbr>, <abbr bid="B23">23</abbr>, <abbr bid="B24">24</abbr>, <abbr bid="B25">25</abbr>, <abbr bid="B26">26</abbr>, <abbr bid="B27">27</abbr>, <abbr bid="B28">28</abbr>, <abbr bid="B29">29</abbr>, <abbr bid="B30">30</abbr>, <abbr bid="B31">31</abbr>, <abbr bid="B32">32</abbr>, <abbr bid="B33">33</abbr>, <abbr bid="B34">34</abbr>, <abbr bid="B35">35</abbr>, <abbr bid="B36">36</abbr>, <abbr bid="B37">37</abbr>, <abbr bid="B38">38</abbr>, <abbr bid="B39">39</abbr>, <abbr bid="B40">40</abbr>]</p>
      </sec>
   </bdy>
   <bm>
      <refgrp><bibl id="B1"><title><p>The fragment assembly string graph</p></title><aug><au><snm>Myers</snm><fnm>EW</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><issue>suppl. 2</issue><fpage>ii79</fpage><lpage>ii85</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">16204131</pubid></xrefbib></bibl><bibl id="B2"><title><p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p></title><aug><au><snm>Altschul</snm><fnm>SF</fnm></au><au><snm>Madden</snm><fnm>TL</fnm></au><au><snm>Schaffer</snm><fnm>AA</fnm></au><etal/></aug><source>Nucleic Acids Research</source><pubdate>1997</pubdate><volume>25</volume><issue>17</issue><fpage>3389</fpage><lpage>3402</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/25.17.3389</pubid><pubid idtype="pmcid">146917</pubid><pubid idtype="pmpid" link="fulltext">9254694</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Homology assessment and molecular sequence alignment</p></title><aug><au><snm>Phillips</snm><fnm>AJ</fnm></au></aug><source>Journal of Biomedical Informatics</source><pubdate>2006</pubdate><volume>39</volume><issue>1</issue><fpage>18</fpage><lpage>33</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.jbi.2005.11.005</pubid><pubid idtype="pmpid" link="fulltext">16380300</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Gaps in structurally similar proteins: towards improvement of multiple sequence alignment</p></title><aug><au><snm>Wrabl</snm><fnm>JO</fnm></au><au><snm>Grishin</snm><fnm>NV</fnm></au></aug><source>Proteins</source><pubdate>2004</pubdate><volume>54</volume><issue>1</issue><fpage>71</fpage><lpage>87</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">14705025</pubid></xrefbib></bibl><bibl id="B5"><title><p>Phylogenomic inference of protein molecular function: advances and 
challenges</p></title><aug><au><snm>Sj&#246;lander</snm><fnm>K</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><issue>2</issue><fpage>170</fpage><lpage>179</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth021</pubid><pubid idtype="pmpid" link="fulltext">14734307</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>BALSA: Bayesian algorithm for local sequence alignment</p></title><aug><au><snm>Webb</snm><fnm>B-JM</fnm></au><au><snm>Liu</snm><fnm>JS</fnm></au><au><snm>Lawrence</snm><fnm>CE</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2002</pubdate><volume>30</volume><issue>5</issue><fpage>1268</fpage><lpage>1277</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/30.5.1268</pubid><pubid idtype="pmcid">101229</pubid><pubid idtype="pmpid" link="fulltext">11861921</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Modelling by the shortest data description</p></title><aug><au><snm>Rissanen</snm><fnm>J</fnm></au></aug><source>Automatica</source><pubdate>1978</pubdate><volume>14</volume><issue>5</issue><fpage>465</fpage><lpage>471</lpage><xrefbib><pubid idtype="doi">10.1016/0005-1098(78)90005-5</pubid></xrefbib></bibl><bibl id="B8"><title><p>A minimum description length approach to grammar inference</p></title><aug><au><snm>Gr&#252;nwald</snm><fnm>P</fnm></au></aug><source>Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, Lecture Notes in Computer Science</source><publisher>Springer, Berlin, Germany</publisher><pubdate>1996</pubdate><volume>1040</volume><fpage>203</fpage><lpage>216</lpage></bibl><bibl id="B9"><title><p>Pattern discovery in biosequences</p></title><aug><au><snm>Brazma</snm><fnm>A</fnm></au><au><snm>Jonassen</snm><fnm>I</fnm></au><au><snm>Vilo</snm><fnm>J</fnm></au><au><snm>Ukkonen</snm><fnm>E</fnm></au></aug><source>International Conference on Grammar Inference (ICGI &apos;98), Lecture Notes in Artificial 
Intelligence</source><publisher>Springer, Ames, Iowa, USA</publisher><editor>Honavar V, Slutski G</editor><pubdate>1998</pubdate><volume>1433</volume><fpage>257</fpage><lpage>270</lpage></bibl><bibl id="B10"><title><p>Stochastic modeling of RNA pseudoknotted structures: a grammatical approach</p></title><aug><au><snm>Cai</snm><fnm>L</fnm></au><au><snm>Malmberg</snm><fnm>RL</fnm></au><au><snm>Wu</snm><fnm>Y</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><issue>suppl. 1</issue><fpage>i66</fpage><lpage>i73</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12855439</pubid></xrefbib></bibl><bibl id="B11"><title><p>The computational linguistics of biological sequences</p></title><aug><au><snm>Searls</snm><fnm>DB</fnm></au></aug><source>Artificial Intelligence and Molecular Biology, Menlo Park, Calif, USA</source><publisher>American Association for Artificial Intelligence</publisher><pubdate>1993</pubdate><fpage>47</fpage><lpage>120</lpage></bibl><bibl id="B12"><title><p>Linguistic approaches to biological sequences</p></title><aug><au><snm>Searls</snm><fnm>DB</fnm></au></aug><source>Computer Applications in the Biosciences</source><pubdate>1997</pubdate><volume>13</volume><issue>4</issue><fpage>333</fpage><lpage>344</lpage><xrefbib><pubid idtype="pmpid">9283748</pubid></xrefbib></bibl><bibl id="B13"><title><p>PROSITE: a dictionary of sites and patterns in proteins</p></title><aug><au><snm>Bairoch</snm><fnm>A</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>1992</pubdate><volume>20</volume><fpage>2013</fpage><lpage>2018</lpage><xrefbib><pubidlist><pubid idtype="pmcid">333978</pubid><pubid idtype="pmpid" link="fulltext">1598232</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Sequence alignment and penalty choice. 
Review of concepts, case studies and implications</p></title><aug><au><snm>Vingron</snm><fnm>M</fnm></au><au><snm>Waterman</snm><fnm>MS</fnm></au></aug><source>Journal of Molecular Biology</source><pubdate>1994</pubdate><volume>235</volume><issue>1</issue><fpage>1</fpage><lpage>12</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0022-2836(05)80006-3</pubid><pubid idtype="pmpid" link="fulltext">8289235</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Scores for sequence searches and alignments</p></title><aug><au><snm>Henikoff</snm><fnm>S</fnm></au></aug><source>Current Opinion in Structural Biology</source><pubdate>1996</pubdate><volume>6</volume><issue>3</issue><fpage>353</fpage><lpage>360</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0959-440X(96)80055-8</pubid><pubid idtype="pmpid" link="fulltext">8804821</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>On gaps</p></title><aug><au><snm>Giribet</snm><fnm>G</fnm></au><au><snm>Wheeler</snm><fnm>WC</fnm></au></aug><source>Molecular Phylogenetics and Evolution</source><pubdate>1999</pubdate><volume>13</volume><issue>1</issue><fpage>132</fpage><lpage>143</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/mpev.1999.0643</pubid><pubid idtype="pmpid" link="fulltext">10508546</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Statistical evaluation and comparison of a pairwise alignment algorithm that a priori assigns the number of gaps rather than employing gap penalties</p></title><aug><au><snm>Nozaki</snm><fnm>Y</fnm></au><au><snm>Bellgard</snm><fnm>M</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><issue>8</issue><fpage>1421</fpage><lpage>1428</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti198</pubid><pubid idtype="pmpid" link="fulltext">15591359</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Empirical determination of effective gap penalties for sequence 
comparison</p></title><aug><au><snm>Reese</snm><fnm>JT</fnm></au><au><snm>Pearson</snm><fnm>WR</fnm></au></aug><source>Bioinformatics</source><pubdate>2002</pubdate><volume>18</volume><issue>11</issue><fpage>1500</fpage><lpage>1507</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/18.11.1500</pubid><pubid idtype="pmpid" link="fulltext">12424122</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Finite-state models in the alignment of macromolecules</p></title><aug><au><snm>Allison</snm><fnm>L</fnm></au><au><snm>Wallace</snm><fnm>CS</fnm></au><au><snm>Yee</snm><fnm>CN</fnm></au></aug><source>Journal of Molecular Evolution</source><pubdate>1992</pubdate><volume>35</volume><issue>1</issue><fpage>77</fpage><lpage>89</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/BF00160262</pubid><pubid idtype="pmpid">1518085</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>An information theoretic view of gapped and other alignments</p></title><aug><au><snm>Schmidt</snm><fnm>JP</fnm></au></aug><source>Proceedings of the 3rd Pacific Symposium on Biocomputing (PSB &apos;98), Maui, Hawaii, USA, January 1998</source><fpage>561</fpage><lpage>572</lpage></bibl><bibl id="B21"><title><p>An information theoretic approach to macromolecular modeling: I. 
Sequence alignments</p></title><aug><au><snm>Aynechi</snm><fnm>T</fnm></au><au><snm>Kuntz</snm><fnm>ID</fnm></au></aug><source>Biophysical Journal</source><pubdate>2005</pubdate><volume>89</volume><issue>5</issue><fpage>2998</fpage><lpage>3007</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1529/biophysj.104.054072</pubid><pubid idtype="pmcid">1366797</pubid><pubid idtype="pmpid" link="fulltext">16254389</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment</p></title><aug><au><snm>Morgenstern</snm><fnm>B</fnm></au></aug><source>Bioinformatics</source><pubdate>1999</pubdate><volume>15</volume><issue>3</issue><fpage>211</fpage><lpage>218</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/15.3.211</pubid><pubid idtype="pmpid" link="fulltext">10222408</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Fast and sensitive multiple alignment of large genomic sequences</p></title><aug><au><snm>Brudno</snm><fnm>M</fnm></au><au><snm>Chapman</snm><fnm>M</fnm></au><au><snm>G&#246;ttgens</snm><fnm>B</fnm></au><au><snm>Batzoglou</snm><fnm>S</fnm></au><au><snm>Morgenstern</snm><fnm>B</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2003</pubdate><volume>4</volume><fpage>66</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-4-66</pubid><pubid idtype="pmcid">521198</pubid><pubid idtype="pmpid" link="fulltext">14693042</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Information content of individual genetic sequences</p></title><aug><au><snm>Schneider</snm><fnm>TD</fnm></au></aug><source>Journal of Theoretical Biology</source><pubdate>1997</pubdate><volume>189</volume><issue>4</issue><fpage>427</fpage><lpage>441</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/jtbi.1997.0540</pubid><pubid idtype="pmpid" link="fulltext">9446751</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Measuring the similarity of 
protein structures by means of the universal similarity metric</p></title><aug><au><snm>Krasnogor</snm><fnm>N</fnm></au><au><snm>Pelta</snm><fnm>DA</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><issue>7</issue><fpage>1015</fpage><lpage>1021</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth031</pubid><pubid idtype="pmpid" link="fulltext">14751983</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Realign: grammar-based sequence alignment</p></title><aug><au><snm>Conery</snm><fnm>JS</fnm></au></aug><note>University of Oregon, <url>http://teleost.cs.uoregon.edu/realign</url></note></bibl><bibl id="B27"><title><p>A model of evolutionary change in proteins</p></title><aug><au><snm>Dayhoff</snm><fnm>MO</fnm></au><au><snm>Schwartz</snm><fnm>RM</fnm></au><au><snm>Orcutt</snm><fnm>BC</fnm></au></aug><source>Atlas of Protein Sequence and Structure, Washington, DC, USA, 1978</source><volume>5</volume><issue>suppl. 3</issue><fpage>345</fpage><lpage>352</lpage></bibl><bibl id="B28"><aug><au><snm>Mount</snm><fnm>DW</fnm></au></aug><source>Bioinformatics: Sequence and Genome Analysis</source><publisher>Cold Spring Harbor Laboratory Press, New York, NY, USA</publisher><edition>2</edition><pubdate>2004</pubdate></bibl><bibl id="B29"><title><p>Amino acid substitution matrices from protein blocks</p></title><aug><au><snm>Henikoff</snm><fnm>S</fnm></au><au><snm>Henikoff</snm><fnm>JG</fnm></au></aug><source>Proceedings of the National Academy of Sciences of the United States of America</source><pubdate>1992</pubdate><volume>89</volume><issue>22</issue><fpage>10915</fpage><lpage>10919</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.89.22.10915</pubid><pubid idtype="pmcid">50453</pubid><pubid idtype="pmpid" link="fulltext">1438297</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Exhaustive matching of the entire protein sequence 
database</p></title><aug><au><snm>Gonnet</snm><fnm>GH</fnm></au><au><snm>Cohen</snm><fnm>MA</fnm></au><au><snm>Benner</snm><fnm>SA</fnm></au></aug><source>Science</source><pubdate>1992</pubdate><volume>256</volume><issue>5062</issue><fpage>1443</fpage><lpage>1445</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1604319</pubid><pubid idtype="pmpid" link="fulltext">1604319</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes</p></title><aug><au><snm>Karlin</snm><fnm>S</fnm></au><au><snm>Altschul</snm><fnm>SF</fnm></au></aug><source>Proceedings of the National Academy of Sciences of the United States of America</source><pubdate>1990</pubdate><volume>87</volume><issue>6</issue><fpage>2264</fpage><lpage>2268</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.87.6.2264</pubid><pubid idtype="pmcid">53667</pubid><pubid idtype="pmpid" link="fulltext">2315319</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Where did the BLOSUM62 alignment score matrix come from?</p></title><aug><au><snm>Eddy</snm><fnm>SR</fnm></au></aug><source>Nature Biotechnology</source><pubdate>2004</pubdate><volume>22</volume><issue>8</issue><fpage>1035</fpage><lpage>1036</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt0804-1035</pubid><pubid idtype="pmpid" link="fulltext">15286655</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>ApiDB: integrated resources for the apicomplexan bioinformatics resource center</p></title><aug><au><snm>Aurrecoechea</snm><fnm>C</fnm></au><au><snm>Heiges</snm><fnm>M</fnm></au><au><snm>Wang</snm><fnm>H</fnm></au><etal/></aug><source>Nucleic Acids Research</source><pubdate>2007</pubdate><volume>35</volume><fpage>D427</fpage><lpage>D430</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkl880</pubid><pubid idtype="pmcid">1669770</pubid><pubid idtype="pmpid" 
link="fulltext">17098930</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p></title><aug><au><snm>Thompson</snm><fnm>JD</fnm></au><au><snm>Higgins</snm><fnm>DG</fnm></au><au><snm>Gibson</snm><fnm>TJ</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>1994</pubdate><volume>22</volume><issue>22</issue><fpage>4673</fpage><lpage>4680</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/22.22.4673</pubid><pubid idtype="pmcid">308517</pubid><pubid idtype="pmpid" link="fulltext">7984417</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>A comprehensive comparison of multiple sequence alignment programs</p></title><aug><au><snm>Thompson</snm><fnm>JD</fnm></au><au><snm>Plewniak</snm><fnm>F</fnm></au><au><snm>Poch</snm><fnm>O</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>1999</pubdate><volume>27</volume><issue>13</issue><fpage>2682</fpage><lpage>2690</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/27.13.2682</pubid><pubid idtype="pmcid">148477</pubid><pubid idtype="pmpid" link="fulltext">10373585</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Speculations on the origins of Plasmodium vivax malaria</p></title><aug><au><snm>Carter</snm><fnm>R</fnm></au></aug><source>Trends in Parasitology</source><pubdate>2003</pubdate><volume>19</volume><issue>5</issue><fpage>214</fpage><lpage>219</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1471-4922(03)00070-9</pubid><pubid idtype="pmpid" link="fulltext">12763427</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Predicting reliable regions in protein sequence 
alignments</p></title><aug><au><snm>Cline</snm><fnm>M</fnm></au><au><snm>Hughey</snm><fnm>R</fnm></au><au><snm>Karplus</snm><fnm>K</fnm></au></aug><source>Bioinformatics</source><pubdate>2002</pubdate><volume>18</volume><issue>2</issue><fpage>306</fpage><lpage>314</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/18.2.306</pubid><pubid idtype="pmpid" link="fulltext">11847078</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Nucleotide substitutions and the evolution of duplicate genes</p></title><aug><au><snm>Conery</snm><fnm>JS</fnm></au><au><snm>Lynch</snm><fnm>M</fnm></au></aug><source>Proceedings of the 6th Pacific Symposium on Biocomputing (PSB &apos;01), Big Island of Hawaii, Hawaii, USA, January 2001</source><fpage>167</fpage><lpage>178</lpage></bibl><bibl id="B39"><title><p>Comparison of methods for searching protein sequence databases</p></title><aug><au><snm>Pearson</snm><fnm>WR</fnm></au></aug><source>Protein Science</source><pubdate>1995</pubdate><volume>4</volume><issue>6</issue><fpage>1145</fpage><lpage>1160</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2143149</pubid><pubid idtype="pmpid" link="fulltext">7549879</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>The PROSITE database</p></title><aug><au><snm>Hulo</snm><fnm>N</fnm></au><au><snm>Bairoch</snm><fnm>A</fnm></au><au><snm>Bulliard</snm><fnm>V</fnm></au><etal/></aug><source>Nucleic Acids Research</source><pubdate>2006</pubdate><volume>34</volume><fpage>D227</fpage><lpage>D230</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkj063</pubid><pubid idtype="pmcid">1347426</pubid><pubid idtype="pmpid" link="fulltext">16381852</pubid></pubidlist></xrefbib></bibl></refgrp>
   </bm>
</art>