Information extraction

Here are some papers on information extraction:

(a) Information Extraction, Andrew McCallum, University of Massachusetts, ACM Queue 2005

A non-technical overview of information extraction that presents the process as five high-level steps:

1. Segmentation: essentially tokenization of the text
2. Classification: assign each segmented piece to one of several classes (Person, Organization, etc.)
3. Association: essentially relationship detection
4. Normalization: different written forms of the same value are mapped to one standard form (e.g. "3-3:30" and "1500-1530" could both map to the same ISO-standard representation)
5. Deduplication: essentially coreference resolution and similar tasks
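To make the normalization step concrete, here is a minimal sketch (not taken from the article) that maps both time ranges from the example to one 24-hour form. The function name normalize_time_range, the HH:MM/HH:MM output format, and the rule that unmarked 12-hour times are in the afternoon are all assumptions made for illustration.

```python
def normalize_time_range(text, assume_pm=True):
    """Map a range like '3-3:30' or '1500-1530' to 'HH:MM/HH:MM' (24-hour)."""

    def to_hhmm(part):
        part = part.strip()
        if ":" in part:                      # clock style, e.g. 3:30
            hour, minute = (int(x) for x in part.split(":"))
            twelve_hour = True
        elif len(part) >= 3:                 # military style, e.g. 1500
            hour, minute = int(part[:-2]), int(part[-2:])
            twelve_hour = False
        else:                                # bare hour, e.g. 3
            hour, minute = int(part), 0
            twelve_hour = True
        # Assumption: unmarked 12-hour times are in the afternoon.
        if twelve_hour and assume_pm and hour < 12:
            hour += 12
        return f"{hour:02d}:{minute:02d}"

    start, end = text.split("-")
    return f"{to_hhmm(start)}/{to_hhmm(end)}"


print(normalize_time_range("3-3:30"))
print(normalize_time_range("1500-1530"))
```

Under these assumptions both inputs normalize to the same string, which is what lets later steps such as deduplication compare them directly.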

It also discusses the main high-level approaches to information extraction and where each one applies:

1. Simple regular expressions: for simple extraction tasks
2. Rules: for more complex extraction tasks whose semantics are still clearly definable
3. Machine learning algorithms: for very complex tasks where the rules are subtle
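As an illustration of the first approach, here is a small sketch of regular-expression extraction using Python's standard re module. The task (pulling email addresses out of text), the pattern, and the name extract_emails are my own choices for the example, not something the article prescribes, and the pattern is deliberately simplistic.

```python
import re

# Simplified pattern: local part, "@", then a dotted domain name.
# Real email syntax is far more permissive; this is only a sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL.findall(text)


print(extract_emails("Contact alice@example.com or bob@example.org."))
```

A single pattern like this works only while the target has a rigid surface form. Once the decision depends on context, hand-written rules or a learned model become the better fit.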

Uncertainty is an integral part of information extraction and needs to be managed appropriately.
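One common way to manage that uncertainty, sketched here as an assumption rather than as the article's method, is to keep a confidence score with every extracted item and send low-confidence items to review instead of accepting them silently. The Extraction class, the split_by_confidence helper, the 0.8 threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    text: str          # the extracted span
    label: str         # e.g. "Person" or "Organization"
    confidence: float  # extractor's score in [0, 1]


def split_by_confidence(extractions, threshold=0.8):
    """Separate extractions into accepted items and items needing review."""
    accepted = [e for e in extractions if e.confidence >= threshold]
    needs_review = [e for e in extractions if e.confidence < threshold]
    return accepted, needs_review


# Made-up sample data, for illustration only.
items = [
    Extraction("Andrew McCallum", "Person", 0.97),
    Extraction("Queue", "Organization", 0.41),
]
accepted, needs_review = split_by_confidence(items)
print([e.text for e in accepted], [e.text for e in needs_review])
```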