Questions tagged [named-entity-recognition]

Named-entity recognition (NER) (also known as entity identification and entity extraction) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.

Most research on NER systems has been structured as taking an unannotated block of text, such as this one:

Jim bought 300 shares of Acme Corp. in 2006.

And producing an annotated block of text that highlights where the named entities are, such as this one:

<ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX TYPE="QUANTITY">300</NUMEX> shares of <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>.

In this example, the annotations are marked using XML ENAMEX elements, following the format developed for the Message Understanding Conference in the 1990s.
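As a rough illustration of the output side only, the following Python sketch wraps already-known entity spans in MUC-style inline tags. It does not perform any recognition: the `annotate` helper and the `(start, end, element, type)` span format are assumptions made for this example, not part of any NER library.

```python
from xml.sax.saxutils import escape


def annotate(text, spans):
    """Wrap character spans of `text` in inline XML elements.

    `spans` is a list of (start, end, element, entity_type) tuples using
    character offsets. This span format is hypothetical, chosen for the sketch.
    """
    pieces = []
    cursor = 0
    for start, end, element, entity_type in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported in this sketch")
        # Copy the untagged text before the entity, escaping XML metacharacters.
        pieces.append(escape(text[cursor:start]))
        pieces.append(
            '<{0} TYPE="{1}">{2}</{0}>'.format(
                element, entity_type, escape(text[start:end])
            )
        )
        cursor = end
    # Copy whatever follows the last entity.
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)


text = "Jim bought 300 shares of Acme Corp. in 2006."
spans = [
    (0, 3, "ENAMEX", "PERSON"),
    (11, 14, "NUMEX", "QUANTITY"),
    (25, 35, "ENAMEX", "ORGANIZATION"),
    (39, 43, "TIMEX", "DATE"),
]
print(annotate(text, spans))
```

In a real pipeline the spans would come from an NER model rather than being written by hand.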

State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 scored an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%.
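For readers unfamiliar with the metric, the balanced F-measure (F1) is the harmonic mean of precision and recall. The sketch below assumes simple counts of correct, spurious, and missed entities; the counts are made up for illustration, and MUC's own scoring rules are not reproduced here.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1) from entity-level counts."""
    predicted = true_positives + false_positives  # entities the system output
    actual = true_positives + false_negatives     # entities in the gold standard
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only, not taken from any evaluation.
score = f_measure(true_positives=90, false_positives=10, false_negatives=10)
print(round(score, 4))
```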

Source: http://en.wikipedia.org/wiki/Named-entity_recognition

1456 questions
6
votes
1 answer

ValueError: [E143] Labels for component 'tagger' not initialized

I've been following this tutorial to create a custom NER. However, I keep getting this error: ValueError: [E143] Labels for component 'tagger' not initialized. This can be fixed by calling add_label, or by providing a representative batch of…
Diana
  • 363
  • 2
  • 8
6
votes
1 answer

Is there any NER model that recognizes first and last names instead of just PERSON?

Given a set of strings like: "John Doe" "Doe John" "Albert Green" "Greenshpan David" ... I would like to run a NER model that will recognize the first name and last name. All English models I use (in Spacy, NLTK etc.) give me PERSON…
SteveS
  • 3,789
  • 5
  • 30
  • 64
6
votes
2 answers

How to get a description for each Spacy NER entity?

I am using the Spacy NER model to extract from a text some named entities relevant to my problem, such as DATE, TIME, GPE among others. For example, I need to recognize the Time Zone in the following sentence: "Australian Central Time" With Spacy…
Emiliano Viotti
  • 1,619
  • 2
  • 16
  • 30
6
votes
1 answer

Do I need to do any text cleaning for Spacy NER?

I am new to NER and Spacy. Trying to figure out what, if any, text cleaning needs to be done. Seems like some examples I've found trim the leading and trailing whitespace and then muck with the start/stop indexes. I saw one example where the guy did…
SledgeHammer
  • 7,338
  • 6
  • 41
  • 86
6
votes
1 answer

How to recognize entities in text that is the output of optical character recognition (OCR)?

I am trying to do multi-class classification with textual data. The problem I am facing is that I have unstructured textual data. I'll explain the problem with an example. Consider this image for example: I want to extract and classify text information…
6
votes
2 answers

How to make spaCy case Insensitive

How can I make spaCy case insensitive when finding the entity name? Is there any code snippet that I should add or something because the questions could mention entities that are not in uppercase? def analyseQuestion(question): doc =…
yac
  • 63
  • 1
  • 5
6
votes
1 answer

Formatting training dataset for SpaCy NER

I want to train a blank model for NER with my own entities. To do this, I need to use a dataset, which is currently in .csv form and features entity tags in the following format (I'll provide one example row for each relevant column): Column:…
6
votes
5 answers

Stanford NER toolkit - lowercase entities recognition

I am a newbie to NLP and trying to figure out how a Named Entity Recognizer annotates named entities. I am experimenting with Stanford NER toolkit. When I use the NER on standard more formal datasets where all naming conventions are followed to…
Anu
  • 525
  • 1
  • 6
  • 18
6
votes
2 answers

Search for job titles in an article using Spacy or NLTK

I'm new to NLP and have recently been playing with NLTK and Spacy. However, I could not find a way to search for job titles (ex: product manager, chief marketing officer, etc) in an article. Example, I have 1000 articles and I want to get all the…
user643132
  • 101
  • 1
  • 5
6
votes
1 answer

How to recognize Indian names via NER in OpenNLP?

I am using OpenNLP models for named-entity recognition, but the problem is that it will only recognize US and UK based names (foreign names), so I need to recognize Indian names. How is it possible?
Sagar Patel
  • 4,993
  • 1
  • 8
  • 19
6
votes
1 answer

Result Difference in Stanford NER tagger NLTK (python) vs JAVA

I am using both Python and Java to run the Stanford NER tagger but I am seeing a difference in the results. For example, when I input the sentence "Involved in all aspects of data modeling using ERwin as the primary software for this.", JAVA…
aerin
  • 20,607
  • 28
  • 102
  • 140
6
votes
1 answer

Named Entity recognition with openNLP (default model)

Can anyone point out the algorithm(s) used by openNLP NameFinder module? The code is complex and only sparsely documented and playing with it as a black box (with the default model provided) gives me the impression that it is mostly heuristic. Here…
ScienceFriction
  • 1,538
  • 2
  • 18
  • 29
6
votes
5 answers

Disease named entity recognition

I have a bunch of text documents that describe diseases. Those documents are in most cases quite short and often only contain a single sentence. An example is given here: Primary pulmonary hypertension is a progressive disease in which widespread…
alex
  • 833
  • 4
  • 12
  • 21
6
votes
1 answer

Custom Feature Generation in OpenNLP Namefinder API

I am trying to use the custom feature generation of OpenNLP for the Name Finder API. http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html I went through the documentation but I was not able to understand how to specify the different…
6
votes
1 answer

Semi-automatic annotation tool - How to find RDF Triplets

I'm developing a semi-automatic annotation tool for medical texts and I am completely lost in finding the RDF triplets for annotation. I am currently trying to use an NLP based approach. I have already looked into Stanford NER and OpenNLP and they…