Questions tagged [spacy]

Industrial-strength Natural Language Processing (NLP) with Python and Cython

spaCy is a library for advanced Natural Language Processing in Python and Cython. Its features include tokenization, part-of-speech tagging, dependency parsing, sentence boundary detection, named entity recognition and training of statistical neural network models.



3742 questions
1 vote, 1 answer

spaCy breaks serialization in ParDo - Apache Beam

I am trying to build a Dataflow pipeline, and it works fine without spaCy. After I introduce spaCy, it starts failing with the error below: return _create_pardo_operation( File…
ankie
1 vote, 1 answer

Training NER using spacy on Google Colab

I am trying to train custom entities using spaCy on Google Colab. But when I try to configure the spaCy config file by using !python3 -m spacy debug data base_config.cfg, I get: ✘ Unknown command: debug Available: download, link, info, train, pretrain,…
Aniiya0978
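The "Available:" list still contains link, a command that was removed in spaCy v3, which suggests the notebook is running spaCy v2; config-based training and the "spacy debug data CONFIG" command belong to v3. A small, hedged check is sketched below; the helper names are hypothetical. If it reports v2, upgrading with !pip install -U spacy and restarting the runtime is the usual next step, and in v3 a base_config.cfg is normally expanded with "python -m spacy init fill-config base_config.cfg config.cfg" before running "python -m spacy debug data config.cfg".

```python
def spacy_major_version(version: str) -> int:
    """Return the major component of a version string such as '2.2.4'."""
    return int(version.split(".")[0])


def supports_debug_data(version: str) -> bool:
    """Hypothetical helper: 'spacy debug data CONFIG' is a spaCy v3 command."""
    return spacy_major_version(version) >= 3


if __name__ == "__main__":
    import spacy

    print("spaCy", spacy.__version__, "supports debug data:", supports_debug_data(spacy.__version__))
```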
1 vote, 1 answer

The length of the vocabulary of Spacy 'en_core_web_sm'

I am using a MacBook and trying to learn NLP from a Udemy course. The length of my spaCy vocabulary is len(doc.vocab)=532; however, in the video the same length is around 57000. I downloaded the larger version as well, but nothing changed.
nlpkind
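In current spaCy versions, len(nlp.vocab) counts the lexemes the Vocab has created so far, and new ones are added as text is tokenized, so the number depends on what has been processed rather than on the model size; a course recorded with an older version may show a pre-populated vocabulary. A minimal sketch with a blank English pipeline (no model download; vocab_sizes is a hypothetical helper name) shows the count growing:

```python
import spacy


def vocab_sizes(nlp, text):
    """Return len(nlp.vocab) before and after processing `text`."""
    before = len(nlp.vocab)
    nlp(text)  # tokenizing creates a lexeme in the vocab for each new word form
    return before, len(nlp.vocab)


nlp = spacy.blank("en")
before, after = vocab_sizes(nlp, "Quixotic zephyrs bamboozle flummoxed wombats")
print(before, after)

# For a pipeline with word vectors (e.g. en_core_web_md or en_core_web_lg),
# the size of the vectors table is a separate figure: nlp.vocab.vectors.shape
```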
1 vote, 1 answer

spaCy How to initialize a Doc with entities in IOB format?

In my spaCy project, I would like to initialize a Doc object with text, labels and whitespaces. spaCy doesn't appreciate the way I provide the labels however, and shows its lack of appreciation in the following error message: doc = Doc(nlp.vocab,…
Dustin
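A minimal sketch, assuming spaCy v3, whose Doc constructor accepts an ents list with one IOB string per token ("B-LABEL", "I-LABEL" or "O"), the same length as words. The sentence and labels are made up for illustration; the question's own data and error message are truncated, so this may not match its exact failure.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # no trained components are needed just to build a Doc

words = ["Barack", "Obama", "visited", "Paris"]
spaces = [True, True, True, False]  # whether each token is followed by a space
iob_tags = ["B-PERSON", "I-PERSON", "O", "B-GPE"]  # one IOB tag per token

doc = Doc(nlp.vocab, words=words, spaces=spaces, ents=iob_tags)
print(doc.text)
print([(ent.text, ent.label_) for ent in doc.ents])
```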
1 vote, 2 answers

Explosion's Spacy ValueError: [E012] Cannot add pattern for zero tokens to matcher

I keep receiving the error message in the title, and I'm at my wits' end. Officially. Google shows no matches for this phrase: "Cannot add pattern for zero tokens to matcher". When I searched for help on Explosion's support page, I could not find any…
user16957706
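E012 is raised by Matcher.add when one of the patterns contains no token dicts at all, i.e. an empty list. One way this can happen, assumed here since the question's code is not shown, is building patterns from data that contains an empty entry. In spaCy v3, matcher.add(key, patterns) also expects a list of patterns, each of which is a list of dicts. A minimal sketch that skips empty patterns (add_nonempty_patterns is a hypothetical helper):

```python
import spacy
from spacy.matcher import Matcher


def add_nonempty_patterns(matcher, key, patterns):
    """Add only patterns with at least one token dict; return how many were skipped."""
    nonempty = [pattern for pattern in patterns if pattern]
    if nonempty:
        matcher.add(key, nonempty)
    return len(patterns) - len(nonempty)


nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
patterns = [
    [{"LOWER": "hello"}, {"LOWER": "world"}],
    [],  # an empty pattern like this one is what triggers E012
]
skipped = add_nonempty_patterns(matcher, "GREETING", patterns)

doc = nlp("Hello world!")
print(skipped, [doc[start:end].text for _, start, end in matcher(doc)])
```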
1 vote, 0 answers

How to find the specific subject that belongs to an object from an article in spacy?

I am trying to find the specific subject that an object belongs to in an article. For example: text = "Masood Azhar was killed by Imran Raza holding number 03213216544. the news was published on the website on 2021" Whose number is it? text = "Masood…
azhar
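One possible approach is to walk up the dependency tree from the number to the nearest proper noun and then add its compound modifiers. The parse below (heads, dependency labels and POS tags) is assigned by hand purely for illustration, so the example runs without a model; in practice a trained pipeline such as en_core_web_sm would predict these, and its actual parse of this sentence may differ. owner_of is a hypothetical helper.

```python
import spacy
from spacy.tokens import Doc


def owner_of(token):
    """Return the nearest proper-noun ancestor of `token`, with its compound modifiers."""
    for ancestor in token.ancestors:  # head, head's head, ... up to the root
        if ancestor.pos_ == "PROPN":
            parts = [t for t in ancestor.lefts if t.dep_ == "compound"]
            return " ".join(t.text for t in parts + [ancestor])
    return None


nlp = spacy.blank("en")
# Hand-assigned parse for illustration only (heads are absolute token indices).
doc = Doc(
    nlp.vocab,
    words=["Masood", "Azhar", "was", "killed", "by", "Imran", "Raza", "holding", "number", "03213216544"],
    pos=["PROPN", "PROPN", "AUX", "VERB", "ADP", "PROPN", "PROPN", "VERB", "NOUN", "NUM"],
    heads=[1, 3, 3, 3, 3, 6, 4, 6, 7, 8],
    deps=["compound", "nsubjpass", "auxpass", "ROOT", "agent", "compound", "pobj", "acl", "dobj", "nummod"],
)
phone = next(t for t in doc if t.is_digit)
print(owner_of(phone))
```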
1 vote, 1 answer

NLP: Finding which sentence is closest in meaning to a list of other sentences

I have two lists of sentences (list A and list B). I want to find which sentence in A is closest in meaning to the entirety of B. This is not the same as the standard cosine similarity check you can do when comparing (in spacy for example) two doc…
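One possible scoring rule, not the only one, is each A sentence's mean cosine similarity to every sentence in B (an alternative is cosine similarity to the mean of B's vectors, which can rank differently). The toy 2-d arrays below stand in for doc.vector from a pipeline that has word vectors, such as en_core_web_md; all-zero vectors (sentences with no known words) would need to be filtered out first to avoid dividing by zero. closest_to_set is a hypothetical name.

```python
import numpy as np


def closest_to_set(a_vectors, b_vectors):
    """Return the index of the row of `a_vectors` with the highest mean cosine
    similarity to all rows of `b_vectors`, plus the per-row scores."""
    a = np.asarray(a_vectors, dtype=float)
    b = np.asarray(b_vectors, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # unit-length rows
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    scores = (a @ b.T).mean(axis=1)  # mean cosine similarity of each A row to all of B
    return int(np.argmax(scores)), scores


# With spaCy: a_vectors = [nlp(s).vector for s in list_a], same for list_b.
best, scores = closest_to_set([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.1, 1.0]])
print(best, scores)
```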
1 vote, 1 answer

How to replace only specific words in a sentence by their pos tag using spacy?

I would like to replace specific tokens with their POS tags using spaCy, but I am encountering this error. Is there a way to overcome it? lemma_token = [sent_doc.replace(w, w.pos_) for w in sent_doc if w.pos_ in list_postag] AttributeError:…
emma
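Assuming the truncated AttributeError says that a Doc has no replace attribute (replace is a str method), one way to get the result is to rebuild the text token by token, keeping each token's trailing whitespace. The POS tags below are set by hand so the sketch runs without a model; replace_by_pos is a hypothetical name.

```python
import spacy
from spacy.tokens import Doc


def replace_by_pos(doc, pos_tags):
    """Rebuild the text, swapping each token whose coarse POS is in `pos_tags` for that tag."""
    return "".join(
        (token.pos_ if token.pos_ in pos_tags else token.text) + token.whitespace_
        for token in doc
    )


nlp = spacy.blank("en")
# Hand-set POS tags for illustration; normally a trained pipeline assigns them.
doc = Doc(
    nlp.vocab,
    words=["I", "love", "Paris", "."],
    spaces=[True, True, False, False],
    pos=["PRON", "VERB", "PROPN", "PUNCT"],
)
print(replace_by_pos(doc, {"VERB", "PROPN"}))
```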
1 vote, 1 answer

Failed to convert iob to spaCy binary format

I am trying to convert my IOB (token-per-line NER) files (train/test) to the spaCy 3 binary format. Example of the input format (tab separator "\t", no blank lines, UTF-8 encoding): Département B-LOCATION des I-LOCATION Bouches-du-Rhône I-LOCATION . …
Lter
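If the CLI converter keeps failing, one workaround is to build the .spacy file directly in Python with DocBin. A minimal sketch, assuming each line is token<TAB>tag with valid IOB tags; because a blank line is treated as a document boundary, data with no blank lines becomes a single Doc. The function name and the output path are hypothetical.

```python
import spacy
from spacy.tokens import Doc, DocBin


def iob_lines_to_docbin(lines, vocab):
    """Convert token-per-line IOB data ("token<TAB>tag") into a DocBin.
    A blank line starts a new Doc."""
    db = DocBin()
    words, tags = [], []
    for line in list(lines) + [""]:  # the trailing "" flushes the last document
        line = line.rstrip("\n")
        if not line.strip():
            if words:
                db.add(Doc(vocab, words=words, ents=tags))
                words, tags = [], []
            continue
        token, tag = line.rsplit("\t", 1)  # the tag is taken from the last column
        words.append(token)
        tags.append(tag)
    return db


nlp = spacy.blank("fr")
sample = ["Département\tB-LOCATION", "des\tI-LOCATION", "Bouches-du-Rhône\tI-LOCATION", ".\tO"]
db = iob_lines_to_docbin(sample, nlp.vocab)
docs = list(db.get_docs(nlp.vocab))
print([(ent.text, ent.label_) for ent in docs[0].ents])
# db.to_disk("train.spacy")  # hypothetical output path
```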
1 vote, 1 answer

How to generate a function from a function

Hi, I am writing a lot of functions of the following form: def is_match_a(doc): pattern = # spaCy patterns matcher = Matcher(nlp.vocab) matcher.add("match_a", [pattern]) matches = matcher(doc) if matches: return True I…
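A closure-based sketch: a factory builds the Matcher once and returns a small function that closes over it, so each is_match_* function is one line to create. make_matcher_fn is a hypothetical name; the returned function also returns False instead of the implicit None when nothing matches.

```python
import spacy
from spacy.matcher import Matcher


def make_matcher_fn(nlp, name, pattern):
    """Return a function doc -> bool reporting whether `pattern` matches.
    The Matcher is built once, when the function is created, not on every call."""
    matcher = Matcher(nlp.vocab)
    matcher.add(name, [pattern])

    def is_match(doc):
        return bool(matcher(doc))

    is_match.__name__ = f"is_{name}"
    return is_match


nlp = spacy.blank("en")
is_match_a = make_matcher_fn(nlp, "match_a", [{"LOWER": "hello"}, {"LOWER": "world"}])
print(is_match_a(nlp("Hello world")), is_match_a(nlp("Goodbye")))
```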
1 vote, 1 answer

Can you set spaCy to only tag GPE (remove ORG's)?

There are a few instances where spaCy tags an ORG instead of the GPE I am looking for. I am not sure how to 'turn off' tagging ORG so that it will only look for GPE, or if there is a way to prioritize GPE first. import spacy from spacy import…
Logan
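A trained NER component predicts all of its labels together, so there is no switch that disables only ORG without retraining. A simple post-processing sketch keeps only the GPE spans; note that this merely hides ORG predictions and does not turn them into GPEs. If specific place names keep being labeled ORG, an entity_ruler added before "ner" with GPE patterns for those names is another option, since the NER respects entities that are already set. The entities below are set by hand so the example needs no model; keep_only is a hypothetical helper.

```python
import spacy
from spacy.tokens import Doc


def keep_only(doc, labels=("GPE",)):
    """Drop every entity whose label is not in `labels` and return the Doc."""
    doc.ents = [ent for ent in doc.ents if ent.label_ in labels]
    return doc


nlp = spacy.blank("en")
# Hand-set entities standing in for the output of a trained NER component.
doc = Doc(nlp.vocab, words=["Paris", "and", "Google"], ents=["B-GPE", "O", "B-ORG"])
print([(ent.text, ent.label_) for ent in keep_only(doc).ents])
```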
1 vote, 1 answer

spaCy PhraseMatcher failing in some cases even though the POS tags are the same

The spaCy PhraseMatcher (using the LEMMA attribute) is only working on some of my sentences, but its failure seems entirely random. I have a minimal working example below, trying to extract the term 'colorful': import spacy nlp =…
O Winn
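With attr="LEMMA", the PhraseMatcher compares token.lemma_ values, not the surface text or the POS tags, so two sentences with identical tags can still behave differently if the lemmatizer returned different lemmas; printing token.lemma_ for a failing sentence is a reasonable first check. The sketch sets lemmas by hand to stand in for a trained pipeline, and the second, "Colorful" lemma is a hypothetical lemmatizer output used only to show that the lemma alone decides the match.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab, attr="LEMMA")

# With a real model, build the pattern with nlp("colorful"), not nlp.make_doc(),
# so the pattern Doc actually has a lemma. Here the lemma is set by hand.
pattern = Doc(nlp.vocab, words=["colorful"], lemmas=["colorful"])
matcher.add("COLORFUL", [pattern])

words = ["The", "shirt", "is", "colorful"]
matched = Doc(nlp.vocab, words=words, lemmas=["the", "shirt", "be", "colorful"])
# Same surface words, but a (hypothetical) different lemma for the last token.
missed = Doc(nlp.vocab, words=words, lemmas=["the", "shirt", "be", "Colorful"])

for doc in (matched, missed):
    print(len(matcher(doc)), [(t.text, t.lemma_) for t in doc])
```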
1 vote, 1 answer

Adding functionality to a Spacy NER visualizer

I am using Spacy to visualize NERs in a notebook as follows: import spacy from spacy import displacy text = "When Sebastian Thrun started working on self-driving cars at Google in 2007, few people outside of the company took him seriously." nlp =…
JFerro
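The question does not say which functionality is wanted, so this sketch only shows two hooks that displaCy's entity renderer does offer: the "ents" option to restrict which labels are drawn and the "colors" option for custom label colours. It uses manual mode (a dict with character offsets) so no model is needed; the entity spans and the colour value are written by hand for illustration.

```python
from spacy import displacy

text = ("When Sebastian Thrun started working on self-driving cars at Google in 2007, "
        "few people outside of the company took him seriously.")


def span(label, phrase):
    """Build a manual-mode entity dict from the first occurrence of `phrase`."""
    start = text.find(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


# Hand-written entities; with a model you would render nlp(text) and drop manual=True.
data = {
    "text": text,
    "ents": [span("PERSON", "Sebastian Thrun"), span("ORG", "Google"), span("DATE", "2007")],
    "title": None,
}
options = {
    "ents": ["PERSON", "ORG"],      # only these labels are highlighted
    "colors": {"ORG": "#ffb347"},   # custom background colour for ORG
}
html = displacy.render(data, style="ent", manual=True, options=options, jupyter=False)
print(html[:200])
```

In a notebook, leaving out jupyter=False renders the same markup inline instead of returning it as a string.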
1 vote, 1 answer

spaCy rule-based matching for person name patterns

I am trying to use spaCy patterns to match the different surface forms of person names in my text, such as: LASTNAME, FIRSTNAME and/or FIRSTNAME, LASTNAME and/or FIRSTNAME LASTNAME (no punctuation). I tried this: import spacy # create a nlp…
Lter
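A minimal Matcher sketch, assuming names are written in title case (IS_TITLE), with an extra, assumed variant for an all-caps LASTNAME. With only shape information, "LASTNAME, FIRSTNAME" and "FIRSTNAME, LASTNAME" look identical, and capitalized sentence-initial words will produce false positives. filter_spans keeps the longest non-overlapping matches. The example sentence is made up.

```python
import spacy
from spacy.matcher import Matcher
from spacy.util import filter_spans

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("PERSON", [
    # LASTNAME, FIRSTNAME (FIRSTNAME, LASTNAME has the same shape)
    [{"IS_TITLE": True}, {"ORTH": ","}, {"IS_TITLE": True}],
    # Assumed variant: DUPONT, Jean
    [{"IS_UPPER": True}, {"ORTH": ","}, {"IS_TITLE": True}],
    # FIRSTNAME LASTNAME, no punctuation
    [{"IS_TITLE": True}, {"IS_TITLE": True}],
])

doc = nlp("Dupont, Jean met Marie Curie.")
spans = filter_spans(matcher(doc, as_spans=True))
print([span.text for span in spans])
```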
1 vote, 1 answer

ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component

#Example: a simple component (1) # Create the nlp object nlp = spacy.load("es_core_news_sm") # Define a custom component def custom_component(doc): # Print the length of the doc to the screen print("Doc length:", len(doc)) #…
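In spaCy v3, nlp.add_pipe takes the string name of a registered component factory, so a plain function first needs the @Language.component decorator. A minimal sketch, using spacy.blank("es") in place of es_core_news_sm so it runs without a download:

```python
import spacy
from spacy.language import Language


# Register the function under a string name so add_pipe can find it.
@Language.component("custom_component")
def custom_component(doc):
    # Print the length of the Doc
    print("Doc length:", len(doc))
    # A pipeline component must return the Doc
    return doc


nlp = spacy.blank("es")  # stands in for spacy.load("es_core_news_sm")
nlp.add_pipe("custom_component")  # pass the registered name, not the function
print(nlp.pipe_names)
doc = nlp("Hola mundo")
```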