Enhancing HMM-based biomedical named entity recognition by studying special phenomena.

Authors: Zhang J; Shen D; Zhou G; Su J; Tan CL

Abstract: The purpose of this research is to enhance an HMM-based named entity recognizer in the biomedical domain. First, we analyze the characteristics of biomedical named entities. Then, we propose a rich set of features, including orthographic, morphological, part-of-speech, and semantic trigger features. All these features are integrated via a Hidden Markov Model with back-off modeling. Furthermore, we propose a method for biomedical abbreviation recognition and two methods for cascaded named entity recognition. Evaluation on GENIA V3.02 and V1.1 shows that our system achieves F-measures of 66.5 and 62.5, respectively, and outperforms the previous best published system by 8.1 F-measure in the same experimental setting. The major contribution of this paper lies in its rich feature set, specially designed for the biomedical domain, and its effective methods for abbreviation and cascaded named entity recognition. To the best of our knowledge, our system is the first to cope with the cascaded phenomena.

Keywords: Abbreviations as Topic; Abstracting and Indexing as Topic/*methods; Algorithms; Animals; Artificial Intelligence; Biology/methods; Computational Biology/*methods; Database Management Systems; Databases as Topic; Databases, Bibliographic; Humans; Information Storage and Retrieval/*methods; Language; Markov Chains; Models, Statistical; Names; Natural Language Processing; Software; Terminology as Topic
Journal: Journal of biomedical informatics
Volume: 37
Issue: 6
Pages: 411-22
Date: Nov. 16, 2004
PMID: 15542015


Citation:

Zhang J, Shen D, Zhou G, Su J, Tan CL (2004) Enhancing HMM-based biomedical named entity recognition by studying special phenomena. Journal of biomedical informatics 37: 411-22.


