Questions tagged [information-extraction]

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases, this activity involves processing human-language texts by means of natural language processing (NLP). Recent work in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be seen as information extraction.

336 questions
133
votes
6 answers

How does Apple find dates, times and addresses in emails?

In the iOS email client, when an email contains a date, time or location, the text becomes a hyperlink and it is possible to create an appointment or look at a map simply by tapping the link. It not only works for emails in English, but in other…
Martin
  • 39,309
  • 62
  • 192
  • 278
84
votes
2 answers

PDF Parsing Using Python - extracting formatted and plain texts

I'm looking for a PDF library which will allow me to extract the text from a PDF document. I've looked at PyPDF, and this can extract the text from a PDF document very nicely. The problem with this is that if there are tables in the document, the…
Mike Cialowicz
  • 9,892
  • 9
  • 47
  • 76
67
votes
2 answers

What is CoNLL data format?

I am using an open source jar (Mate Parser) which outputs in the CoNLL 2009 format after dependency parsing. I want to use the dependency parsing results for Information Extraction, however, I only understand part of the output in the CoNLL data…
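As a rough illustration of what such output looks like to a program, here is a minimal sketch that reads CoNLL-2009-style text into Python dictionaries. It assumes the usual tab-separated layout with the fixed columns ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, FILLPRED and PRED, followed by a variable number of argument columns, and a blank line between sentences. The function name `parse_conll09` and the two-token sample are made up for this example and are not taken from the question or from Mate Parser.

```python
# Fixed columns of the CoNLL-2009 layout; columns starting with "P" hold predicted values.
CONLL09_COLUMNS = [
    "ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT", "PFEAT",
    "HEAD", "PHEAD", "DEPREL", "PDEPREL", "FILLPRED", "PRED",
]


def parse_conll09(text):
    """Split CoNLL-2009-style text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        token = dict(zip(CONLL09_COLUMNS, fields))
        # Any remaining columns are per-predicate argument labels (APREDs).
        token["APREDS"] = fields[len(CONLL09_COLUMNS):]
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


# Invented sample: two tokens, no semantic role columns.
sample = "\n".join([
    "\t".join(["1", "Dogs", "dog", "dog", "NNS", "NNS", "_", "_", "2", "2", "SBJ", "SBJ", "_", "_"]),
    "\t".join(["2", "bark", "bark", "bark", "VBP", "VBP", "_", "_", "0", "0", "ROOT", "ROOT", "_", "_"]),
])

for token in parse_conll09(sample)[0]:
    print(token["FORM"], token["PHEAD"], token["PDEPREL"])
```

For information extraction, the columns of most interest are typically the word form, the head index and the dependency label; whether the gold or the predicted (`P`-prefixed) columns are filled in depends on the tool that produced the file.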
23
votes
4 answers

Choose or generate canonical variant from multiple sentences

I'm working with an API that maps my GTIN/EAN queries to product data. Since the data returned originates from merchant product feeds, the following is almost universally the case: multiple results per GTIN; products' titles are pretty much…
vzwick
  • 11,008
  • 5
  • 43
  • 63
18
votes
4 answers

Media Information Extractor for Java

I need a media information extraction library (pure Java or JNI wrapper) that can handle common media formats. I primarily use it for video files and I need at least this information: video length (runtime), video bitrate, video framerate, video…
Emre Yazici
  • 10,136
  • 6
  • 48
  • 55
14
votes
4 answers

Medical information extraction using Python

I am a nurse and I know Python, but I am not an expert; I have just used it to process DNA sequences. We got hospital records written in human languages and I am supposed to insert these data into a database or CSV file, but they are more than 5000 lines and…
Nurse
  • 143
  • 1
  • 5
13
votes
2 answers

Example python script that uses DBPedia?

I am writing a Python script to extract "Entity names" from a collection of thousands of news articles from a few countries and languages. I would like to make use of the amazing DBPedia structured knowledge, say for example to look up the names of…
jaz
  • 175
  • 1
  • 1
  • 7
12
votes
2 answers

NLP for extracting actions from text

I'm hoping somebody can point me in the right direction to learn about separating out actions from a bunch of text. Suppose I have this text: Drop off the dry cleaning, and go to the corner store and pick-up a jug of milk and get a pint of…
pedalpete
  • 21,076
  • 45
  • 128
  • 239
12
votes
1 answer

How to encode dependency path as a feature for classification?

I am trying to implement relation extraction between verb pairs. I want to use dependency path from one verb to the other as a feature for my classifier (predicts if relation X exists or not). But I am not sure how to encode the dependency path as a…
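One common way to approach this, shown here only as a sketch, is to turn the path into a single string of direction-tagged dependency labels and treat that string as a categorical feature. The example below assumes the parse is available as two dictionaries (`heads`, mapping each token index to its head with 0 for the root, and `labels`, mapping each token index to its dependency label). The function names, the `up:`/`down:` prefixes and the sample parse are invented for illustration.

```python
def path_to_root(token, heads):
    """Return the token indices from `token` up to the root token (head 0 marks the root)."""
    path = [token]
    while heads[token] != 0:
        token = heads[token]
        path.append(token)
    return path


def dependency_path_feature(source, target, heads, labels):
    """Encode the path from `source` to `target` as a string of direction-tagged labels."""
    up = path_to_root(source, heads)
    down = path_to_root(target, heads)
    position_in_up = {tok: i for i, tok in enumerate(up)}
    # The lowest common ancestor is the first ancestor of `target` that also
    # lies on the path from `source` to the root.
    for j, tok in enumerate(down):
        if tok in position_in_up:
            i = position_in_up[tok]
            break
    # Climb from source to the common ancestor, then descend to target.
    steps = ["up:" + labels[t] for t in up[:i]]
    steps += ["down:" + labels[t] for t in reversed(down[:j])]
    return " ".join(steps)


# Invented parse of "She tried to leave": 1=She, 2=tried, 3=to, 4=leave.
heads = {1: 2, 2: 0, 3: 4, 4: 2}
labels = {1: "nsubj", 2: "root", 3: "mark", 4: "xcomp"}

print(dependency_path_feature(2, 4, heads, labels))  # down:xcomp
print(dependency_path_feature(1, 4, heads, labels))  # up:nsubj down:xcomp
```

The resulting string can be one-hot encoded like any other categorical feature. Whether to also include the words or part-of-speech tags along the path is a design choice that trades sparsity against specificity.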
11
votes
2 answers

What is the difference between Information Extraction and Text Mining?

It may look easy, but I am confused. What is the difference between Text Mining and Information Extraction?
user1599171
10
votes
3 answers

Methods for extracting locations from text?

What are the recommended methods for extracting locations from free text? What I can think of is to use regex rules like "words ... in location". But are there better approaches than this? Also I can think of having a lookup hash table with…
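The two ideas mentioned in the question, a regex rule keyed on prepositions and a lookup table of known places, can be combined in a few lines. The following is only a baseline sketch: the `GAZETTEER` set, the preposition list and the function name `extract_locations` are assumptions made for this example, and a real gazetteer would be loaded from a place-name resource rather than hard-coded.

```python
import re

# Hypothetical lookup table of known place names, stored in lower case.
GAZETTEER = {"paris", "new york", "berlin"}

# A preposition followed by one or more capitalized words.
PREP_PATTERN = re.compile(
    r"\b(?:in|at|near|from)\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)"
)


def extract_locations(text):
    """Return capitalized phrases that follow a preposition and appear in the gazetteer."""
    found = []
    for match in PREP_PATTERN.finditer(text):
        candidate = match.group(1)
        # The lookup filters out false positives such as "in May".
        if candidate.lower() in GAZETTEER:
            found.append(candidate)
    return found


print(extract_locations("She moved from Berlin to a flat in New York last May."))
```

A rule like this misses locations that do not follow one of the listed prepositions and any place not in the table, which is the main reason statistical named entity recognizers are usually preferred once labeled data or a pretrained model is available.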
9
votes
2 answers

Which phrase extraction tool is the state of the art now?

I know of the following open source tools, but I haven't found any comparisons of how good they are respectively. Tools with ready to use phrase extraction: KEA MAUI (http://code.google.com/p/maui-indexer/) Dragon, xTract…
yura
  • 14,489
  • 21
  • 77
  • 126
9
votes
1 answer

NLP - information extraction in Python (spaCy)

I am attempting to extract this type of information from the following paragraph structure: women_ran men_ran kids_ran walked 1 2 1 3 2 4 3 1 3 6 5 2 text = ["On…
kathystehl
  • 831
  • 1
  • 9
  • 26
9
votes
2 answers

Using Conditional Random Fields for Named Entity Recognition

What is a Conditional Random Field? How exactly does a Conditional Random Field identify proper names as person, organization, or place in a structured or unstructured text? For example: This product is ordered by StackOverFlow Inc. …
9
votes
5 answers

NLP to find relationship between entities

My current understanding is that it's possible to extract entities from a text document using toolkits such as OpenNLP or Stanford NLP. However, is there a way to find relationships between these entities? For example, consider the following text:…
Soumya Simanta
  • 11,523
  • 24
  • 106
  • 161