Questions tagged [spacy]

Industrial strength Natural Language Processing (NLP) with Python and Cython

spaCy is a library for advanced Natural Language Processing in Python and Cython. Its features include tokenization, part-of-speech tagging, dependency parsing, sentence boundary detection, named entity recognition and training of statistical neural network models.


3742 questions
20
votes
3 answers

Cased VS uncased BERT models in spacy and train data

I want to use spacy's pretrained BERT model for text classification but I'm a little confused about cased/uncased models. I read somewhere that cased models should only be used when there is a chance that letter casing will be helpful for the task.…
Oleg Ivanytskyi
  • 959
  • 2
  • 12
  • 28
20
votes
2 answers

Extract verb phrases using Spacy

I have been using spaCy for noun chunk extraction via the Doc.noun_chunks property. How can I extract verb phrases of the form 'VERB ? ADV * VERB +' from input text using the spaCy library?
Nidhi
  • 313
  • 1
  • 2
  • 4
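
A minimal sketch of the pattern in the question above, assuming the tokens have already been tagged with coarse part-of-speech labels. It does not call spaCy: the `(word, tag)` pairs are hand-written for illustration, and `TAG_CODES`, `PATTERN`, and `verb_phrases` are hypothetical names. Each tag is mapped to a single character so that an ordinary regular expression can express 'VERB ? ADV * VERB +'.

```python
import re

# Assumption: one character per coarse tag; anything else becomes "x".
TAG_CODES = {"VERB": "V", "ADV": "A"}

# 'VERB ? ADV * VERB +' written over the one-character codes.
PATTERN = re.compile(r"V?A*V+")


def verb_phrases(tagged):
    """Return the text of each tag run that matches PATTERN.

    `tagged` is a list of (word, coarse_tag) pairs.
    """
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    phrases = []
    for match in PATTERN.finditer(codes):
        # Character offsets in `codes` equal token offsets in `tagged`.
        words = [word for word, _ in tagged[match.start():match.end()]]
        phrases.append(" ".join(words))
    return phrases


# Hand-tagged example sentence (hypothetical tags, not tagger output).
example = [
    ("She", "PRON"),
    ("had", "VERB"),
    ("quickly", "ADV"),
    ("finished", "VERB"),
    ("it", "PRON"),
]
print(verb_phrases(example))  # ['had quickly finished']
```

With a real tagger the pairs would come from the tagger's output instead, and the tags it assigns may differ from the hand-written ones here (auxiliaries, for instance, may receive a separate tag).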
20
votes
2 answers

How to get spaCy NER probability

I want to combine spaCy's NER engine with a separate NER engine (a BoW model). I'm currently comparing outputs from the two engines, trying to figure out what the optimal combination of the two would be. Both perform decently, but quite often spaCy…
Mede
  • 203
  • 2
  • 7
19
votes
5 answers

I am getting an InvalidArchiveError in anaconda prompt when I am trying to install spacy. How to solve it?

InvalidArchiveError('Error with archive C:\Users\Sahaja Reddy\Anaconda3\pkgs\openssl-1.1.1g-he774522_0.conda. You probably need to delete and re-download or re-create this file. Message from libarchive was:\n\nCould not unlink (errno=22,…
Sahaja Reddy
  • 191
  • 1
  • 1
  • 3
19
votes
3 answers

In spacy, how to use your own word2vec model created in gensim?

I have trained my own word2vec model in gensim and I am trying to load that model in spacy. First, I need to save it to disk and then try to load an init-model in spacy, but I am unable to figure out exactly…
Subigya Upadhyay
  • 266
  • 1
  • 2
  • 11
19
votes
2 answers

How to extract subjects in a sentence and their respective dependent phrases?

I am trying to work on subject extraction in a sentence, so that I can get the sentiments in accordance with the subject. I am using nltk in python2.7 for this purpose. Take the following sentence as an example: Donald Trump is the worst president…
psr
  • 2,619
  • 4
  • 32
  • 57
18
votes
2 answers

How to get all words from spacy vocab?

I need all the words from the spaCy vocab. Suppose I initialize my spaCy model as nlp = spacy.load('en'). How do I get the text of the words from nlp.vocab?
pauli
  • 4,191
  • 2
  • 25
  • 41
18
votes
1 answer

Speed up Spacy Named Entity Recognition

I'm using spacy to recognize street addresses on web pages. My model is initialized basically using spacy's new entity type sample code found here: https://github.com/explosion/spaCy/blob/master/examples/training/train_new_entity_type.py My…
podcastguy
  • 193
  • 1
  • 6
18
votes
1 answer

SpaCy: How to get the spacy model name?

It doesn't show up in pip list zeke$ pip list | grep spacy spacy (1.7.3) How do I get the name of the model? I tried this but it doesn't work echo "spaCy model:" python3 -m sputnik --name spacy find Throws up this error: zeke$ python3 -m sputnik…
Saravanabalagi Ramachandran
  • 8,551
  • 11
  • 53
  • 102
17
votes
2 answers

Tokenizing using Pandas and spaCy

I'm working on my first Python project and have a reasonably large dataset (tens of thousands of rows). I need to do some NLP (clustering, classification) on 5 text columns (multiple sentences of text per 'cell') and have been using pandas to…
LMGagne
  • 1,636
  • 6
  • 24
  • 47
16
votes
3 answers

Python: Spacy and memory consumption

1 - THE PROBLEM I'm using "spacy" in Python to lemmatize text documents. There are 500,000 documents with sizes up to 20 MB of clean text. The problem is the following: spaCy's memory consumption grows over time until all the memory is used. 2…
VictorDDT
  • 583
  • 9
  • 26
16
votes
1 answer

How to create NER pipeline with multiple models in Spacy

I am trying to train new entities for spaCy NER. I tried adding my new entity to the existing spaCy 'en' model. However, this affected the prediction model for both 'en' and my new entity. I therefore created a blank model and trained the entity…
Suvin K S
  • 229
  • 2
  • 8
16
votes
2 answers

Disabling part of the nlp pipeline

I am running spaCy v2.x on a Windows box with Python 3. I do not have admin privileges, so I have to call the pipeline as: nlp = en_core_web_sm.load() When I run the same script on a *nix box, I can load the pipeline as: nlp = spacy.load('en', disable…
Britt
  • 539
  • 1
  • 7
  • 21
16
votes
4 answers

Is there a bi gram or tri gram feature in Spacy?

The code below breaks the sentence into individual tokens, and the output is as follows: "cloud" "computing" "is" "benefiting" "major" "manufacturing" "companies" import en_core_web_sm nlp = en_core_web_sm.load() doc = nlp("Cloud computing is…
venkatttaknev
  • 669
  • 1
  • 7
  • 21
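
One way to approach the question above is to build n-grams from the token list after tokenization. The sketch below is plain Python and does not use any spaCy API: `ngrams` is a hypothetical helper, and the token list is copied from the excerpt rather than produced by a tokenizer.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # When there are fewer than n tokens the range is empty, so the result is [].
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Tokens as listed in the question excerpt (assumed, not tokenizer output).
tokens = ["cloud", "computing", "is", "benefiting",
          "major", "manufacturing", "companies"]

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams[0])   # ('cloud', 'computing')
print(trigrams[0])  # ('cloud', 'computing', 'is')
```

The same helper could be fed the token texts of a processed document; whether punctuation or stop words should be filtered out first depends on the use case.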
16
votes
1 answer

Spacy custom tokenizer to include only hyphen words as tokens using Infix regex

I want to include hyphenated words, for example long-term, self-esteem, etc., as single tokens in spaCy. After looking at some similar posts on Stack Overflow, GitHub, the documentation and elsewhere, I wrote a custom tokenizer as below: import…
Vishal
  • 227
  • 1
  • 2
  • 8