Questions tagged [spacy]

Industrial strength Natural Language Processing (NLP) with Python and Cython

spaCy is a library for advanced Natural Language Processing in Python and Cython. Its features include tokenization, part-of-speech tagging, dependency parsing, sentence boundary detection, named entity recognition and training of statistical neural network models.


3742 questions
1
vote
2 answers

Sentence tokenization without relying on punctuation or capitalization

Is there a possible approach for extracting sentences from paragraphs (sentence tokenization) when the paragraphs have no punctuation and/or are all lowercase? We have a specific need for being able to split paragraphs into sentences while…
ZZZZZZZZZ
  • 197
  • 2
  • 10
1
vote
1 answer

How do I correlate the `int` hash value of a spaCy Token `ent_type` to a string?

Is there some way to decode the hash of spaCy token entity types without accessing the ent_type_ attribute? For example:

    for token in doc:
        hash_val = token.ent_type
        string_repr = some_map[hash_val]
FountainTree
  • 316
  • 1
  • 11
1
vote
1 answer

How to export a spaCy model with multiple components

I'm trying to build a spaCy pipeline using multiple components. My current pipeline has only two components at the moment: an entity ruler and a custom component. The way I build it is like this: class EntityLookupComponent: def…
user17730779
1
vote
1 answer

Corpus to text with NLTK?

Hello, I downloaded a corpus using NLTK: phrase = nltk.corpus.conll2002.iob_sents('esp.testb')[0] That returns: [('La', 'DA', 'B-LOC'), ('Coruña', 'NC', 'I-LOC'), (',', 'Fc', 'O'), ('23', 'Z', 'O'), ('may', 'NC', 'O'), ('(', 'Fpa', 'O'), …
Tlaloc-ES
  • 4,825
  • 7
  • 38
  • 84
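
One way the question above could be approached, as a minimal sketch: each item returned by `iob_sents` is a `(word, pos, iob)` triple, so the plain text can be rebuilt by joining the word fields. The list below copies only the tokens visible in the truncated excerpt, and `iob_sentence_to_text` is a hypothetical helper name, not an NLTK function. Joining with single spaces does not restore the original spacing around punctuation.

```python
# Tokens visible in the question excerpt (the original list is truncated there).
phrase = [
    ("La", "DA", "B-LOC"),
    ("Coruña", "NC", "I-LOC"),
    (",", "Fc", "O"),
    ("23", "Z", "O"),
    ("may", "NC", "O"),
    ("(", "Fpa", "O"),
]


def iob_sentence_to_text(sentence):
    """Join the word field of each (word, pos, iob) triple with single spaces."""
    return " ".join(word for word, _pos, _iob in sentence)


text = iob_sentence_to_text(phrase)
print(text)  # La Coruña , 23 may (
```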
1
vote
0 answers

NLTK regexp tokenization to spaCy conversion not working

We have code written with NLTK, but we need it in spaCy because we are trying to improve speed with the help of CUDA cores. Here is the NLTK code: import nltk import re import time snt1 = u'''"{'body': 'So as of today's close (8/15/14), I'm…
PrimozS
  • 11
  • 2
1
vote
0 answers

How can I use thinc.types with spaCy version 2?

I am using spaCy version 2.2.4 for named entity recognition and wish to use the same version to test a custom spaCy relation extraction pipeline. Unfortunately, I am facing the issue below while running the custom relation extraction model with…
1
vote
1 answer

How can I write a rule in spaCy with a pattern inside a pattern (nested pattern)?

I would like to write a pattern in spaCy that matches a text and then, optionally, all the parentheses to the right of the text, if there are any. For example, for the following texts, I give as input the left part of the arrow "->" and expect as an…
1
vote
1 answer

Most efficient way to find a third-person singular pronoun, a noun and a verb in a sentence in spaCy

I have this code (I'm a newbie in Python) and it works fine, but I think it is not efficient. It checks whether a sentence contains a third-person singular pronoun (he, she or it), a noun and a verb: def findNounVerbPronoun(): countElements…
Laz22434
  • 373
  • 1
  • 12
1
vote
0 answers

Word embeddings for phrases

I have a dataset that contains approx. 50k records of job titles. For example: Python Developer Java Developer Accountant Salesman Developer Programmer etc. (The job titles are in German, but that doesn't matter for the explanation.) I want to cluster…
Daniel Yefimov
  • 860
  • 1
  • 10
  • 24
1
vote
1 answer

Python - Is there a way to evaluate a named entity recognizer trained on IOB data using spaCy?

I trained my named entity recognizer with spaCy and would like to evaluate it. I looked at the spaCy documentation and came across the scorer function. However, it doesn't seem to work with the IOB format. Do you think there is a way to use…
Balkhrod
  • 81
  • 7
1
vote
1 answer

Merge name forms for the same person found via NER with spaCy

I have a text document and I want to find out which person the text is "most about"; my approximation of "most about" is the person mentioned most often. I use spaCy Named Entity Recognition (NER) to get a list of all named entities, then filter…
MattG
  • 5,589
  • 5
  • 36
  • 52
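
A rough sketch of one possible heuristic for the merging step described above, in plain Python: each mention is mapped to the longest mention whose tokens contain all of its own tokens, and the canonical forms are then counted. The `mentions` list is hypothetical sample data standing in for the filtered PERSON entities, and `canonical_form` is an illustrative helper, not part of spaCy. The heuristic will wrongly merge different people who share a surname.

```python
from collections import Counter

# Hypothetical PERSON mentions, standing in for the filtered NER output.
mentions = [
    "Barack Obama", "Obama", "Angela Merkel",
    "Obama", "Merkel", "Barack Obama",
]


def canonical_form(mention, all_mentions):
    """Return the longest mention whose tokens include every token of `mention`."""
    tokens = set(mention.split())
    candidates = [m for m in all_mentions if tokens <= set(m.split())]
    # `mention` itself is always a candidate, so `candidates` is never empty.
    return max(candidates, key=lambda m: (len(m.split()), m))


unique_mentions = sorted(set(mentions))
counts = Counter(canonical_form(m, unique_mentions) for m in mentions)
top_person, top_count = counts.most_common(1)[0]
print(top_person, top_count)  # Barack Obama 4
```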
1
vote
1 answer

Manually set sentence boundaries in spaCy

Suppose I know ahead of time the character-level sentence boundaries in a document:

    text = "The cat chased the mouse. The mouse ran away."
    boundaries = [(0, 25), (26, 45)]
    for start, end in boundaries:
        print(text[start:end])

Is there a way that…
user94154
  • 16,176
  • 20
  • 77
  • 116
1
vote
0 answers

Training spaCy custom entities from an Excel sheet

I'm new to NLP and have come across a problem while training on data in spaCy format. TRAIN_DATA = [ ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}), ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24,…
1
vote
1 answer

Blank lemmatization using spaCy

How do I use lemmatization in spaCy? I tried this code, but the output is blank. My spaCy version is 3.2.0. from spacy.lang.id import Indonesian nlp = Indonesian() def tokenizer(text): return [token.lemma_.lower() for token in nlp(text) if not…
caeruleum
  • 459
  • 1
  • 3
  • 16
1
vote
1 answer

Converting spaCy NER entity format to CoNLL 2003 format

I am working on an NER application where I have data annotated in the following format. [('The F15 aircraft uses a lot of fuel', {'entities': [(4, 7, 'aircraft')]}), ('did you see the F16 landing?', {'entities': [(16, 19, 'aircraft')]}), ('how…
imhans33
  • 133
  • 11
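
A minimal sketch of how the character-offset annotations above could be turned into per-token B-/I-/O tags, under two stated assumptions: tokens are split on whitespace (not with spaCy's tokenizer), and only the word and NER-tag columns are produced, whereas full CoNLL 2003 files also carry POS and chunk columns. `to_conll_rows` is a hypothetical helper name, and only the two complete examples from the excerpt are used. An entity whose offsets do not line up with whitespace token boundaries would be left as "O" by this approach.

```python
import re

# The two complete examples visible in the question excerpt.
TRAIN_DATA = [
    ("The F15 aircraft uses a lot of fuel", {"entities": [(4, 7, "aircraft")]}),
    ("did you see the F16 landing?", {"entities": [(16, 19, "aircraft")]}),
]


def to_conll_rows(text, entities):
    """Return (token, tag) pairs using whitespace tokens and B-/I-/O tags."""
    rows = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in entities:
            # Tag the token only if it lies fully inside the entity span.
            if match.start() >= start and match.end() <= end:
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        rows.append((match.group(), tag))
    return rows


for text, annotation in TRAIN_DATA:
    for token, tag in to_conll_rows(text, annotation["entities"]):
        print(token, tag)
    print()  # blank line between sentences, as in CoNLL files
```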