Questions tagged [spacy]

Industrial-strength Natural Language Processing (NLP) with Python and Cython

spaCy is a library for advanced Natural Language Processing in Python and Cython. Its features include tokenization, part-of-speech tagging, dependency parsing, sentence boundary detection, named entity recognition and training of statistical neural network models.

3742 questions

2 answers

Sentence tokenization without relying on punctuation and capitalization

Is there a possible approach for extracting sentences from paragraphs (sentence tokenization) when the paragraphs have no punctuation and/or are all lowercase? We have a specific need for being able to split paragraphs into sentences while…

asked Dec 26 '21 at 13:23

ZZZZZZZZZ

1 answer

How do I correlate the `int` hash value of a spaCy Token `ent_type` to a string?

Is there some way to decode the hash of spaCy token entity types without accessing the ent_type_ attribute? For example: for token in doc: hash_val = token.ent_type string_repr = some_map[hash_val]

python nlp spacy

asked Dec 21 '21 at 22:25

FountainTree

1 answer

How to export a spaCy model with multiple components

I'm trying to build a spaCy pipeline using multiple components. My current pipeline has only two components at the moment: an entity ruler and a custom component. I build it like this: class EntityLookupComponent: def…

python spacy spacy-3

asked Dec 21 '21 at 09:05

user17730779

1 answer

Corpus to text with NLTK?

Hello, I downloaded a corpus using NLTK: phrase = nltk.corpus.conll2002.iob_sents('esp.testb')[0] That returns: [('La', 'DA', 'B-LOC'), ('Coruña', 'NC', 'I-LOC'), (',', 'Fc', 'O'), ('23', 'Z', 'O'), ('may', 'NC', 'O'), ('(', 'Fpa', 'O'), …

python nltk spacy

asked Dec 18 '21 at 20:56

Tlaloc-ES
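
One way to read the question above is that the IOB triples should be turned back into plain text, with entity character offsets kept alongside. The following is a minimal sketch under that assumption. It uses only the tokens visible in the excerpt, joins them with single spaces (no real detokenization), and the helper names `iob_sent_to_text` and `iob_sent_to_entities` are hypothetical.

```python
# First tokens of the sentence quoted above, as (token, POS, IOB tag) triples.
phrase = [
    ("La", "DA", "B-LOC"),
    ("Coruña", "NC", "I-LOC"),
    (",", "Fc", "O"),
    ("23", "Z", "O"),
    ("may", "NC", "O"),
    ("(", "Fpa", "O"),
]


def iob_sent_to_text(sent):
    """Join the token column with single spaces (naive, no detokenization)."""
    return " ".join(token for token, _pos, _tag in sent)


def iob_sent_to_entities(sent):
    """Return (start_char, end_char, label) spans over the space-joined text."""
    entities = []
    current = None  # [start, end, label] of the entity being built
    offset = 0
    for token, _pos, tag in sent:
        start, end = offset, offset + len(token)
        continues = (
            tag.startswith("I-") and current is not None and current[2] == tag[2:]
        )
        if continues:
            current[1] = end  # extend the open entity
        else:
            if current is not None:
                entities.append(tuple(current))
                current = None
            if tag != "O":
                current = [start, end, tag[2:]]  # open a new entity
        offset = end + 1  # +1 for the joining space
    if current is not None:
        entities.append(tuple(current))
    return entities


text = iob_sent_to_text(phrase)
entities = iob_sent_to_entities(phrase)
```

The offsets are only valid for text produced by the same single-space join; a different detokenization would need its own offset bookkeeping.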

0 answers

NLTK regexp tokenization to spaCy conversion not working

We have code written with NLTK, but we need it in spaCy because we are trying to improve speed with the help of CUDA cores. Here is the NLTK code: import nltk import re import time snt1 = u'''"{'body': 'So as of today's close (8/15/14), I'm…

python regex nltk spacy tokenize

asked Dec 17 '21 at 13:52

PrimozS

0 answers

How can I use thinc.types with spaCy version 2?

I am using spaCy version 2.2.4 for named entity recognition and wish to use the same version for testing a custom spaCy relation extraction pipeline. Unfortunately, I am facing the issue below while running the custom relation extraction model with…

python spacy named-entity-recognition spacy-transformers coreference-resolution

asked Dec 14 '21 at 12:25

Sanpreet

1 answer

How can I write a rule in spaCy with a pattern inside a pattern (nested pattern)?

I would like to write a pattern in spaCy that matches a text and then, optionally, all the parentheses to the right of the text, if there are any. For example, for the following texts, I give as input the left part of the arrow "->" and expect as an…

python nested spacy rules

asked Dec 14 '21 at 11:27

Δημητρης Παππάς

1 answer

Most efficient way to find a third-person singular pronoun, a noun and a verb in a sentence in spaCy

I have this code (I'm a newbie in Python) and it works fine, but I think it is not efficient. It checks whether a sentence contains a third-person singular pronoun (he, she or it), a noun and a verb: def findNounVerbPronoun(): countElements…

python python-3.x spacy spacy-3

asked Dec 08 '21 at 16:21

Laz22434

0 answers

Word embeddings for phrases

I have a dataset which consists of approx. 50k records of job titles. For example: Python Developer Java Developer Accountant Salesman Developer Programmer etc. (Job titles are in German, but for explanation it doesn't matter) I want to cluster…

python machine-learning nlp spacy word-embedding

asked Dec 06 '21 at 16:34

Daniel Yefimov

1 answer

Python - Is there a way to evaluate a named entity recognizer trained on IOB data using spaCy?

I trained my named entity recognizer with spaCy and would like to evaluate it. I looked at the spaCy documentation and came across the scorer function. However, it doesn't seem to work with the IOB format. Do you think there is a way to use…

python spacy named-entity-recognition evaluation

asked Dec 06 '21 at 15:56

Balkhrod
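
Independently of any particular library, IOB-tagged data can be scored by converting each tag sequence to entity spans and counting exact matches. The sketch below takes that approach as an assumption; it is not spaCy's scorer, and the function names and the sample tags are hypothetical.

```python
def iob_to_spans(tags):
    """Convert one sentence's IOB tags to a set of (start, end, label) token spans."""
    spans = set()
    start = label = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))  # close the open span (end is exclusive)
            start = label = None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]  # open a new span
    return spans


def precision_recall_f1(gold_sents, pred_sents):
    """Exact-match entity scores over parallel lists of IOB tag sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sents, pred_sents):
        gold_spans, pred_spans = iob_to_spans(gold), iob_to_spans(pred)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted tags for a single four-token sentence.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
precision, recall, f1 = precision_recall_f1(gold, pred)
```

A span counts as correct only when its boundaries and label both match, which is the usual strict convention for NER evaluation; partial-overlap credit would need a different matching rule.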

1 answer

Merge name forms for the same person found via NER with spaCy

I have a document of text and I want to find out which person the text is "most about". My approximation of "most about" is the person mentioned most. I use spaCy Named Entity Recognition (NER) to get a list of all named entities, then filter…

python nlp nltk spacy

asked Dec 03 '21 at 12:06

MattG
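
A simple heuristic for the merging step described above is to map each short name form to a longer form that contains all of its tokens, then count mentions per merged form. This is only a sketch under that assumption: the mention list is made up, `merge_name_forms` is a hypothetical helper, and the heuristic will wrongly merge different people who share a surname.

```python
from collections import Counter


def merge_name_forms(mentions):
    """Count mentions after mapping each form to a longer form containing its tokens."""
    # Longest forms first; ties broken alphabetically so the result is deterministic.
    forms = sorted(set(mentions), key=lambda name: (-len(name.split()), name))
    canonical = {}
    for name in forms:
        tokens = set(name.lower().split())
        seen = list(dict.fromkeys(canonical.values()))  # canonical forms so far
        # Reuse the first longer form whose tokens include all of this name's tokens.
        canonical[name] = next(
            (form for form in seen if tokens <= set(form.lower().split())), name
        )
    return Counter(canonical[mention] for mention in mentions)


# Hypothetical PERSON mentions, e.g. the texts of entities labeled PERSON.
mentions = ["Angela Merkel", "Merkel", "Olaf Scholz", "Merkel", "Scholz", "Angela Merkel"]
counts = merge_name_forms(mentions)
most_mentioned = counts.most_common(1)[0]
```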

1 answer

Manually set sentence boundaries in spaCy

Suppose I know ahead of time the character-level sentence boundaries in a document: text = "The cat chased the mouse. The mouse ran away." boundaries = [(0, 25), (26, 45)] for start, end in boundaries: print(text[start:end]) Is there a way that…

python spacy sentence

asked Dec 02 '21 at 16:45

user94154
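
The core of the problem above is mapping character-level boundaries onto tokens: a token starts a sentence exactly when its character offset equals the start of a boundary. The sketch below shows that mapping with a stand-in regex tokenizer, which is an assumption and not spaCy's tokenizer.

```python
import re

text = "The cat chased the mouse. The mouse ran away."
boundaries = [(0, 25), (26, 45)]

# Stand-in tokenizer (an assumption): words and single punctuation marks,
# each paired with its starting character offset.
tokens = [(m.group(), m.start()) for m in re.finditer(r"\w+|[^\w\s]", text)]

# A token opens a sentence when its offset matches a boundary start.
sentence_starts = {start for start, _end in boundaries}
flags = [offset in sentence_starts for _token, offset in tokens]
```

In spaCy, the same comparison could plausibly be made against `token.idx`, with the result assigned to `token.is_sent_start` in a custom component placed before the parser; that placement is an assumption about the pipeline, not something the question states.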

0 answers

Training spaCy custom entities from an Excel sheet

I'm new to NLP and have come across a problem while training on spaCy-format data. TRAIN_DATA = [ ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}), ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24,…

python-3.x nlp spacy named-entity-recognition

asked Nov 30 '21 at 11:22

user17551439
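
If the sheet holds one row per entity (sentence, entity text, label), the offsets in the `TRAIN_DATA` format above can be computed rather than typed by hand. This is a minimal sketch under that assumption: the row layout, the `rows_to_train_data` helper, and the label chosen for "Berlin" are hypothetical, and reading the sheet itself (for example with a spreadsheet library) is left out.

```python
# Hypothetical rows as they might come out of a spreadsheet:
# (sentence, entity text, label). The label for "Berlin" is assumed.
rows = [
    ("Who is Shaka Khan?", "Shaka Khan", "PERSON"),
    ("I like London and Berlin.", "London", "LOC"),
    ("I like London and Berlin.", "Berlin", "LOC"),
]


def rows_to_train_data(rows):
    """Group rows by sentence and compute (start, end, label) character spans."""
    grouped = {}
    for text, entity, label in rows:
        start = text.find(entity)  # first occurrence only
        if start == -1:
            continue  # entity text not present in the sentence; skip the row
        grouped.setdefault(text, []).append((start, start + len(entity), label))
    return [(text, {"entities": sorted(spans)}) for text, spans in grouped.items()]


train_data = rows_to_train_data(rows)
```

Because `str.find` returns only the first occurrence, a sentence that repeats the same entity text would need explicit offsets in the sheet instead.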

1 answer

Blank lemmatization using spaCy

How do I use lemmatization in spaCy? I tried this code, but the output is blank. My spaCy version is 3.2.0. from spacy.lang.id import Indonesian nlp = Indonesian() def tokenizer(text): return [token.lemma_.lower() for token in nlp(text) if not…

python nlp spacy

asked Nov 28 '21 at 22:07

caeruleum

1 answer

Converting spaCy NER entity format to CoNLL 2003 format

I am working on an NER application where I have data annotated in the following format: [('The F15 aircraft uses a lot of fuel', {'entities': [(4, 7, 'aircraft')]}), ('did you see the F16 landing?', {'entities': [(16, 19, 'aircraft')]}), ('how…

python spacy johnsnowlabs-spark-nlp conll

asked Nov 23 '21 at 18:18

imhans33
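
The conversion asked about above amounts to tokenizing each text, checking which tokens fall inside an annotated character span, and emitting one token per line with a B-/I-/O tag. The sketch below makes several assumptions: whitespace tokenization, `-X-` placeholders for the part-of-speech and chunk columns (since the source data has neither), and a hypothetical helper name `to_conll`.

```python
import re


def to_conll(examples):
    """Render (text, {"entities": [...]}) pairs as CoNLL-2003-style lines."""
    lines = ["-DOCSTART- -X- -X- O", ""]
    for text, annotations in examples:
        for match in re.finditer(r"\S+", text):  # whitespace tokenization (assumed)
            tag = "O"
            for start, end, label in annotations["entities"]:
                # A token is tagged only if it lies fully inside the span.
                if match.start() >= start and match.end() <= end:
                    prefix = "B-" if match.start() == start else "I-"
                    tag = prefix + label
                    break
            lines.append(f"{match.group()} -X- -X- {tag}")
        lines.append("")  # blank line between sentences
    return "\n".join(lines)


# The first two examples from the excerpt above.
examples = [
    ("The F15 aircraft uses a lot of fuel", {"entities": [(4, 7, "aircraft")]}),
    ("did you see the F16 landing?", {"entities": [(16, 19, "aircraft")]}),
]
conll_text = to_conll(examples)
```

With whitespace tokenization, punctuation stays attached to words (`landing?`), so an entity that ends right before a punctuation mark would be missed; a tokenizer that splits punctuation would avoid that.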
