Open Access Research Article

Extraction of Protein Interaction Data: A Comparative Analysis of Methods in Use

Hena Jose, Thangavel Vadivukarasi and Jyothi Devakumar*

Author Affiliations

Jubilant Biosys Ltd., #96, Industrial Suburb, 2nd Stage, Yeshwanthpur, Bangalore 560 022, India

EURASIP Journal on Bioinformatics and Systems Biology 2007, 2007:53096 doi:10.1155/2007/53096

Published: 9 December 2007

Abstract

Several natural language processing (NLP) tools, both commercial and freely available, are used to extract protein interactions from publications. The methods used by these tools range from pattern matching to dynamic programming, each with its own recall and precision rates. A methodical survey of these tools against manual analysis, keeping in mind the minimum interaction information a researcher would need, has not been carried out. We compared data generated by selected NLP tools with manually curated protein interaction data (PathArt and IMaps) to determine their recall and precision rates. The rates were found to be lower than the published scores when a normalized definition of an interaction is considered. Each data point that a tool captured wrongly or failed to pick up was analyzed. Our evaluation highlights critical failures of NLP tools and provides pointers for the development of an ideal NLP tool.
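
As an illustration of how recall and precision can be computed when tool output is compared against a manually curated reference set, the following is a minimal sketch rather than the procedure used in this study. It assumes, purely for illustration, that an interaction is normalized to an unordered, case-insensitive pair of protein names; the normalized definition applied in the study may require more information than this. The function names and the example protein pairs are hypothetical and are not taken from PathArt, IMaps, or any of the tools evaluated.

```python
def normalize(pair):
    """Assumed normalization: an unordered, case-insensitive pair of names."""
    a, b = pair
    return frozenset((a.strip().upper(), b.strip().upper()))


def precision_recall(extracted, curated):
    """Compare tool-extracted interactions with a curated reference set.

    precision = correct extracted interactions / all extracted interactions
    recall    = correct extracted interactions / all curated interactions
    """
    extracted_set = {normalize(p) for p in extracted}
    curated_set = {normalize(p) for p in curated}
    true_positives = extracted_set & curated_set

    precision = len(true_positives) / len(extracted_set) if extracted_set else 0.0
    recall = len(true_positives) / len(curated_set) if curated_set else 0.0
    return precision, recall


# Hypothetical example data, for illustration only.
curated = [("TP53", "MDM2"), ("EGFR", "GRB2"), ("AKT1", "GSK3B"), ("RAF1", "MAP2K1")]
extracted = [("mdm2", "TP53"), ("EGFR", "GRB2"), ("EGFR", "TP53")]

precision, recall = precision_recall(extracted, curated)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this toy example, two of the three extracted pairs appear in the curated set, and two of the four curated pairs are recovered. A stricter definition of an interaction would leave fewer extracted items counting as correct, which would lower both rates.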