Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases, this involves processing human-language text by means of natural language processing (NLP). Recent work in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be seen as information extraction.
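As a rough illustration of the definition above, the following minimal sketch turns free text into structured records using regular expressions. The two patterns and the `extract_records` helper are assumptions made for this example only; real IE systems typically rely on NLP toolkits rather than hand-written patterns.

```python
import re

# Hypothetical pattern for illustration: ISO-style dates such as 2024-03-15.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# Hypothetical, deliberately simple e-mail pattern; it is not RFC-compliant.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def extract_records(text):
    """Return structured records (type, value, character span) found in text."""
    records = []
    for kind, pattern in (("date", DATE_RE), ("email", EMAIL_RE)):
        for match in pattern.finditer(text):
            records.append(
                {
                    "type": kind,
                    "value": match.group(0),
                    "start": match.start(),
                    "end": match.end(),
                }
            )
    # Order the records by their position in the source text.
    records.sort(key=lambda record: record["start"])
    return records


sample = "Meeting moved to 2024-03-15; contact alice@example.org for details."
for record in extract_records(sample):
    print(record)
```

Each record keeps the character span alongside the value, so an extracted item can be traced back to where it appeared in the source text.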
Questions tagged [information-extraction]
336 questions
133 votes, 6 answers
How does Apple find dates, times and addresses in emails?
In the iOS email client, when an email contains a date, time or location, the text becomes a hyperlink and it is possible to create an appointment or look at a map simply by tapping the link. It not only works for emails in English, but in other…

Martin
84 votes, 2 answers
PDF Parsing Using Python - extracting formatted and plain texts
I'm looking for a PDF library which will allow me to extract the text from a PDF document. I've looked at PyPDF, and this can extract the text from a PDF document very nicely. The problem with this is that if there are tables in the document, the…

Mike Cialowicz
67 votes, 2 answers
What is CoNLL data format?
I am using an open-source jar (Mate Parser) which outputs in the CoNLL 2009 format after dependency parsing. I want to use the dependency parsing results for Information Extraction, however, I only understand part of the output in the CoNLL data…

swapna sourav rout
23 votes, 4 answers
Choose or generate canonical variant from multiple sentences
I'm working with an API that maps my GTIN/EAN queries to product data.
Since the data returned originates from merchant product feeds, the following is almost universally the case:
Multiple results per GTIN
Products' titles are pretty much…

vzwick
18 votes, 4 answers
Media Information Extractor for Java
I need a media information extraction library (pure Java or JNI wrapper) that can handle common media formats. I primarily use it for video files and I need at least the following information:
Video length (Runtime)
Video bitrate
Video framerate
Video…

Emre Yazici
14 votes, 4 answers
Medical information extraction using Python
I am a nurse and I know Python, but I am not an expert; I have only used it to process DNA sequences.
We have hospital records written in natural language, and I am supposed to insert this data into a database or CSV file, but there are more than 5000 lines and…

Nurse
13 votes, 2 answers
Example Python script that uses DBPedia?
I am writing a Python script to extract "Entity names" from a collection of thousands of news articles from a few countries and languages.
I would like to make use of the amazing DBPedia structured knowledge, say for example to look up the names of…

jaz
12 votes, 2 answers
NLP for extracting actions from text
I'm hoping somebody can point me in the right direction to learn about separating out actions from a bunch of text.
Suppose I have this text
Drop off the dry cleaning, and go to the corner store and pick-up a jug of milk and get a pint of…

pedalpete
12 votes, 1 answer
How to encode dependency path as a feature for classification?
I am trying to implement relation extraction between verb pairs. I want to use the dependency path from one verb to the other as a feature for my classifier (which predicts whether relation X exists or not). But I am not sure how to encode the dependency path as a…

Syed Fahad Sultan
11 votes, 2 answers
What is the difference between Information Extraction and Text Mining?
It may look easy, but I am confused.
What is the difference between Text Mining and Information Extraction?
user1599171
10 votes, 3 answers
Methods for extracting locations from text?
What are the recommended methods for extracting locations from free text?
What I can think of is to use regex rules like "words ... in location". But are there better approaches than this?
Also, I can think of having a lookup hash table with…

Jack Twain
9 votes, 2 answers
Which phrase extraction tool is the state of the art now?
I know of the following open-source tools, but I haven't found any comparisons of how good each of them is.
Tools with ready-to-use phrase extraction:
KEA
MAUI (http://code.google.com/p/maui-indexer/)
Dragon, xTract…

yura
9 votes, 1 answer
NLP - information extraction in Python (spaCy)
I am attempting to extract this type of information from the following paragraph structure:
women_ran men_ran kids_ran walked
1 2 1 3
2 4 3 1
3 6 5 2
text = ["On…

kathystehl
9 votes, 2 answers
Using Conditional Random Fields for Named Entity Recognition
What is a Conditional Random Field?
How exactly does a Conditional Random Field identify proper names as person, organization, or place in structured or unstructured text?
For example: This product is ordered by StackOverFlow Inc. …

user239135
9 votes, 5 answers
NLP to find relationship between entities
My current understanding is that it's possible to extract entities from a text document using toolkits such as OpenNLP or Stanford NLP.
However, is there a way to find relationships between these entities?
For example, consider the following text:…

Soumya Simanta