Questions tagged [spacy]

Industrial-strength Natural Language Processing (NLP) with Python and Cython

spaCy is a library for advanced Natural Language Processing in Python and Cython. Its features include tokenization, part-of-speech tagging, dependency parsing, sentence boundary detection, named entity recognition and training of statistical neural network models.


3742 questions
16
votes
1 answer

Understanding Spacy's Scorer Output

I'm evaluating a custom NER model that I built using Spacy. I'm evaluating the training sets using Spacy's Scorer class. def Eval(examples): # test the saved model print("Loading from", './model6/') ner_model =…
Evan Lalo
  • 1,209
  • 1
  • 14
  • 34
16
votes
5 answers

How to do text pre-processing using spaCy?

How can I do preprocessing steps such as stopword removal, punctuation removal, stemming, and lemmatization in spaCy using Python? I have text data in a CSV file, in the form of paragraphs and sentences, and I want to clean it. Kindly give an example by loading csv…
RVK
  • 473
  • 1
  • 5
  • 16
16
votes
6 answers

spaCy token.tag_ full list

The official documentation of token.tag_ in spaCy is as follows: A fine-grained, more detailed tag that represents the word-class and some basic morphological information for the token. These tags are primarily designed to be good features for…
Daniel
  • 1,783
  • 2
  • 15
  • 25
15
votes
3 answers

Replace entity with its label in SpaCy

Is there any way in SpaCy to replace an entity detected by SpaCy NER with its label? For example: I am eating an apple while playing with my Apple Macbook. I have trained an NER model with SpaCy to detect the "FRUITS" entity, and the model successfully detects…
eng2019
  • 953
  • 10
  • 26
15
votes
3 answers

List most similar words in spaCy in pretrained model

With Gensim, after I've trained my own model, I can use model.wv.most_similar('cat', topn=5) and get a list of the 5 words that are closest to cat in the vector space. For example: from gensim.models import Word2Vec model =…
snapcrack
  • 1,761
  • 3
  • 20
  • 40
15
votes
1 answer

removing stop words using spacy

I am cleaning a column in my data frame, Sumcription, and am trying to do 3 things: tokenize, lemmatize, and remove stop words. import spacy nlp = spacy.load('en_core_web_sm', parser=False, entity=False) df['Tokens'] =…
Nelly Yuki
  • 399
  • 1
  • 4
  • 16
15
votes
1 answer

Don't know how to uninstall unwanted Spacy installation, model

I have limited disk space and want to know how to uninstall/remove files for spacy 2.xx under python 2.7 (I use python3 and think I've got spacy installed correctly for it). Ditto for the default model in my python3 install. Here's my terminal…
Gregg Williams
  • 868
  • 1
  • 8
  • 15
15
votes
2 answers

Directly load spacy model from packaged tar.gz file

Is it possible to load a packaged spacy model (i.e. foo.tar.gz) directly from the tar file instead of installing it beforehand? I would imagine something like: import spacy nlp = spacy.load('/some/path/foo.tar.gz')
evermean
  • 1,255
  • 21
  • 49
15
votes
3 answers

How does the spacy lemmatizer work?

For lemmatization, spacy has lists of words (adjectives, adverbs, verbs...) and also lists of exceptions (adverbs_irreg...). For the regular ones there is a set of rules. Let's take as an example the word "wider". As it is an adjective, the rule for…
Luis Ramon Ramirez Rodriguez
  • 9,591
  • 27
  • 102
  • 181
14
votes
1 answer

How to identify abbreviations/acronyms and expand them in spaCy?

I have a large (~50k) term list, and a number of these key phrases/terms have corresponding acronyms/abbreviations. I need a fast way of finding either the abbreviation or the expanded abbreviation (e.g. MS -> Microsoft) and then replacing that…
steve
  • 393
  • 1
  • 4
  • 14
14
votes
1 answer

How to use spaCy to create a new entity and learn only from keyword list

I am trying to use spaCy to create a new entity category, 'Species', from a list of species names; an example can be found here. I found a tutorial for training a new entity type in this spaCy tutorial (GitHub code here). However, the problem is, I…
katie lu
  • 489
  • 1
  • 5
  • 23
14
votes
1 answer

Using PhraseMatcher in SpaCy to find multiple match types

The SpaCy documentation and samples show that the PhraseMatcher class is useful to match sequences of tokens in documents. One must provide a vocabulary of sequences that will be matched. In my application, I have documents that are collections of…
Vladislavs Dovgalecs
  • 1,525
  • 2
  • 16
  • 26
14
votes
1 answer

POS tagging using spaCy

I am trying to do POS tagging using the spaCy module in Python. Here is my code: from spacy.en import English, LOCAL_DATA_DIR import spacy.en import os data_dir = os.environ.get('SPACY_DATA', LOCAL_DATA_DIR) nlp = English(parser=False,…
pd176
  • 821
  • 3
  • 10
  • 20
13
votes
2 answers

How to use LanguageDetector() from spacy_langdetect package?

I'm trying to use the spacy_langdetect package and the only example code I can find is (https://spacy.io/universe/project/spacy-langdetect): import spacy from spacy_langdetect import LanguageDetector nlp =…
user3242036
  • 645
  • 1
  • 7
  • 16
13
votes
2 answers

How to avoid double-extracting of overlapping patterns in SpaCy with Matcher?

I need to extract item combinations from 2 lists using the Python Spacy Matcher. The problem is the following. Let us have 2 lists: colors=['red','bright red','black','brown','dark brown'] animals=['fox','bear','hare','squirrel','wolf'] I match the…
Victoria
  • 395
  • 3
  • 13