Open Access Research Article

Multipattern Consensus Regions in Multiple Aligned Protein Sequences and Their Segmentation

David KY Chiu* and Yan Wang

  • * Corresponding author: David KY Chiu

Author Affiliations

Department of Computing and Information Science, University of Guelph, Guelph, ON, Canada, N1G 2W1


EURASIP Journal on Bioinformatics and Systems Biology 2006, 2006:35809 doi:10.1155/BSB/2006/35809


The electronic version of this article is the complete one and can be found online at: http://bsb.eurasipjournals.com/content/2006/1/35809


Received: 23 November 2005
Revisions received: 22 May 2006
Accepted: 7 June 2006
Published: 13 August 2006

© 2006 D. K. Y. Chiu and Y. Wang.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Decomposing a biological sequence into its functional regions is an important prerequisite to understanding the molecule. Using a multiple alignment of the sequences, we evaluate a segmentation based on the type of statistical variation pattern observed at each aligned site. To describe this more general kind of pattern, we introduce multipattern consensus regions: segmented regions defined by conserved as well as interdependent patterns. The proposed consensus region thus considers patterns that are statistically significant and that extend over a local neighborhood. To show its relevance to protein sequence analysis, we examine the tumor suppressor gene p53. The results show significant associations between the detected regions and mutation tendency, location on the 3D structure, and heritable cancer factors inferred from human twin studies.
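As a rough illustration of what segmenting an alignment by per-site variation can look like, the sketch below labels each aligned column as conserved or variable from its Shannon entropy and then merges adjacent columns with the same label into regions. It is not the authors' algorithm: it covers only the conservation side, ignores interdependence between sites, and applies no significance test. The function names, the toy alignment, and the 0.5-bit threshold are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import groupby


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one aligned column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def segment_by_conservation(alignment, threshold=0.5):
    """Group consecutive columns into (start, end, label) regions.

    A column is labeled "conserved" when its entropy is at or below
    `threshold` (an arbitrary illustrative cutoff), otherwise "variable".
    Positions are 0-based and inclusive.
    """
    columns = list(zip(*alignment))  # transpose rows (sequences) into columns (sites)
    labels = [
        "conserved" if column_entropy(col) <= threshold else "variable"
        for col in columns
    ]
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length - 1, label))
        start += length
    return segments


# Hypothetical toy alignment: four sequences of equal length, no gaps.
toy_alignment = ["MKVLAG", "MKILSG", "MKVLTG", "MKLLCG"]

for start, end, label in segment_by_conservation(toy_alignment):
    print(f"sites {start}-{end}: {label}")
```

A method in the spirit of the abstract would replace the fixed threshold with a statistical test and add a second pattern type for sites whose residues vary jointly.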

