Questions tagged [spacy]

Industrial-strength Natural Language Processing (NLP) with Python and Cython

spaCy is a library for advanced Natural Language Processing in Python and Cython. Its features include tokenization, part-of-speech tagging, dependency parsing, sentence boundary detection, named entity recognition and training of statistical neural network models.


3742 questions
1 vote, 1 answer

Information extraction with Spacy with context awareness

I'm trying to extract project-relevant information via web scraping using Python and spaCy, and then build a table of projects with a few attributes. Example phrases that are of interest to me are: The last is the 300-MW Hardin Solar III Energy…
1 vote, 1 answer

Spacy's phrasematcher with reflexive pronoun in french

First, you don't have to know French to help me, as I will explain the grammar rules that I need to apply with spaCy in Python. I have a file (test.txt) with multiple phrases in French (about 5000), each one different from another and a mail…
Laz22434
1 vote, 1 answer

Date Entity Parsing Incorrect Year for Incomplete Dates

I have a dataset (df_test) consisting of several news articles (Text_4). Using spaCy, I've extracted the 'DATE' entities. For those I want to see whether they are in the future or in the past (to identify news articles that reference future events…
AlexanderP
1 vote, 1 answer

Rename spacy's pos tagger labels

I'm looking for something specific and haven't really found an answer: I want to rename spaCy's POS tag labels. E.g. if I have this code: def eng(textstr): nlp = spacy.load("en_core_web_sm") doc = nlp(textstr) for token in…
Laz22434
1 vote, 1 answer

Spacy matcher with regex across tokens

I have the following sentences: phrases = ['children externalize their emotions through outward behavior', 'children externalize hidden emotions.', 'children externalize internalized emotions.', 'a child might externalize…
Akbar Hussein
1 vote, 1 answer

How do I visualize the results of Rule-based Matcher in Spacy as an HTML page?

I am using the rule-based matcher in spaCy to look for some patterns in a text. Here is an example of my pattern: text = "GDP in developing countries such as Vietnam will continue growing at a high rate." pattern = [{'DEP':'amod', 'OP':"?"}, …
ary
1 vote, 1 answer

processing text with nlp.pipe taking hours

I have 300,000 news articles in my dataset and I'm using en_core_web_sm to do POS tagging, parsing, and NER extraction. However, this is taking hours and hours and never seems to finish. The code works, but it is very slow. When I take a sample of my…
user17143533
1 vote, 0 answers

Is there a way to limit per_process memory on a shared vgpu with Spacy, Torch, etc?

I'm looking to find the GPU memory fraction limiting parameters for spaCy, Torch, and other common models. I know that for TensorFlow models we can set per_process_gpu_memory_fraction, but is there an equivalent in PyTorch or spaCy? Bonus points if…
Stephan
1 vote, 0 answers

Modifying displacy NER annotation tool to support overlapping entities

I am working on a tool to visualise NER matches and I am trying to resolve overlaps. I am using SpaCy's displacy as a template, and my idea was to add a function after displacy.render() that modifies the HTML string to match my desired output, but I…
GSwart
1 vote, 1 answer

Converting .tsv format to spacy for NER

I am facing a problem and am not that good at coding. I have a TSV file where the data looks like this: lines are separated by a blank line. I have tried using this: def load_data_spacy(file_path): ''' Converts data from: word \t label \n word \t label \n \n…
1 vote, 0 answers

Calculating number of T-units in a given sentence using Python

I've been working on a second language development project. I need to calculate the t-unit of a given sentence using Python. For example, for the following sentences: The man did not like water. 1 t-unit (The man did not like water) The man did not…
user3288051
1 vote, 1 answer

Confidence Score of Predicted NER entities using Spacy

I am trying to predict entities using a custom-trained NER model with spaCy. I read in https://github.com/explosion/spaCy/pull/8855 that confidence scores for each entity can be obtained using spancat. But I am a little confused regarding how to make…
imhans33
1 vote, 2 answers

Deleting the sentence and updating the index

I am working on a data format like this. data = [{"content":'''Hello I am Aniyya. I enjoy playing Football. I love eating grapes''',"annotations":[{"id":1,"start":11,"end":17,"tag":"name"}, …
user17179901
1 vote, 1 answer

Confidence Score of Spacy NER custom trained and pretrained model

I have seen in the spaCy documentation that confidence scores for NER entities were rolled out in a recent version. I am using spacy==3.1.2. I tried the following code to find the confidence score, but I am getting an error. Also, is it possible to find the…
user17179901
1 vote, 1 answer

Feed large text to PyTextRank

I would like to use PyTextRank for keyphrase extraction. How can I feed 5 million documents (each document consisting of a few paragraphs) to the package? This is the example I see in the official tutorial. text = "Compatibility of systems of…
E.K.