Questions tagged [textacy]

Reference Site: https://textacy.readthedocs.io/en/stable/

Features

  • Stream text, json, csv, and spaCy binary data to and from disk
  • Clean and normalize raw text before analyzing it
  • Explore a variety of included datasets, with both text data and metadata, from Congressional speeches to historical literature to Reddit comments
  • Access and filter basic linguistic elements, such as words and ngrams, noun chunks and sentences
  • Extract named entities, acronyms and their definitions, direct quotations, key terms, and more from documents
  • Compare strings, sets, and documents by a variety of similarity metrics
  • Transform documents and corpora into vectorized and semantic network representations
  • Train, interpret, visualize, and save sklearn-style topic models using LSA, LDA, or NMF methods
40 questions
1
vote
2 answers

How to improve textacy.extract.semistructured_statements() results

For this project, I am using the wikipedia, spaCy, and textacy.extract modules. I used the wikipedia module to grab the page I set my subject to. It returns a string of its contents. Then, I use the textacy.extract.semistructured_statements()…
exe
  • 354
  • 3
  • 23
1
vote
2 answers

Python Textacy pos_regex_matches vs matches

I'm trying to find verbs in a sentence with Python for an NLP problem. I found an old answer here on Stack Overflow and it works with the deprecated pos_regex_matches. Using the new matches function I have a pretty boring problem. The new function…
loricelli
  • 33
  • 7
1
vote
3 answers

pip install textacy fails

I am trying to install textacy to perform NLP tasks, but I am getting an error when running: python -m pip install textacy --user. The code starts running, but after a while it fails and shows this output: ERROR: Command errored out with exit status…
Javier C
  • 25
  • 1
  • 10
1
vote
2 answers

ImportError: cannot import name 'constants'

I need to import the constants library but it is not working... import spacy import pandas import textacy import pandas as pd from pandas import Series from . import constants Error: ImportError Traceback (most recent…
marin
  • 923
  • 2
  • 18
  • 26
1
vote
1 answer

Textacy unable to create corpus from a textacy.doc.Doc class

I'm just working through the text tutorials with data outside the datasets module for work. I get some text data from a dataframe and I have this stored as a string variable for work. def mergeText(df): content = '' for i in…
1
vote
0 answers

How to match a SVO pattern with Textacy

How do you use Textacy's pos_regex_match() method to find subject-verb-object triples using their pseudo-regular-expression syntax? And yes, I'm aware of textacy.extract.subject_verb_object_triples(), but this function is very inaccurate and finds…
Cerin
  • 60,957
  • 96
  • 316
  • 522
0
votes
0 answers

How does textacy 0.12 work now in Jupyter Notebook and the latest version of Python

from textacy import extract, text_stats from spacy.matcher import Matcher patterns = [{"POS": "ADV"}, {"POS": "VERB"}] verb_phrases = textacy.extract.matches(doc, patterns=patterns) TypeError Traceback (most recent…
0
votes
1 answer

Different behaviors of a generator

I created a generator as follows: from textacy.extract.kwic import keyword_in_context test = keyword_in_context('this is a test. another test to see how', keyword='test', window_width=5) print(test) # Out:
Nemo
  • 1,124
  • 2
  • 16
  • 39
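
A minimal sketch of the generator behavior this question seems to be about, using only the standard library. `kwic_sketch` is a hypothetical stand-in written for illustration, not textacy's `keyword_in_context`; it only assumes that the real function is also lazy. Printing a generator shows the generator object rather than its items, and a generator can be iterated only once.

```python
import re
from typing import Iterator, Tuple


def kwic_sketch(text: str, keyword: str, window_width: int = 5) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical stand-in: lazily yield (left context, match, right context)."""
    for m in re.finditer(re.escape(keyword), text):
        left = text[max(0, m.start() - window_width):m.start()]
        right = text[m.end():m.end() + window_width]
        yield left, m.group(), right


gen = kwic_sketch("this is a test. another test to see how", keyword="test", window_width=5)

# Printing the generator shows the object itself; nothing has been computed yet.
print(gen)

# Iterating (here via list) runs the function body and collects the items.
first_pass = list(gen)

# The generator is now exhausted, so a second pass yields nothing.
second_pass = list(gen)

print(first_pass)
print(second_pass)
```

If the results are needed more than once, one option is to store them in a list on the first pass, or to call the function again to get a fresh generator.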
0
votes
1 answer

Output to a pandas dataframe

I am extracting quotes from text in the following manner and with the following output: data = [ ("\"Hello, nice to meet you,\" said John. Jane said, \"It is nice to meet you as well.\"", {"url": "example1.com", "date": "Jan 1"}), …
jedmund
  • 55
  • 4
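
One possible way to shape this kind of output for a dataframe, sketched with the standard library only. `find_quotes` is a hypothetical regex stand-in for the quotation extraction step, not a textacy function, and the single record is the one visible in the excerpt. The idea is to build one flat dict per quote, carrying the record's metadata along.

```python
import re
from typing import Dict, List

# The one record visible in the question excerpt: (text, metadata).
data = [
    (
        "\"Hello, nice to meet you,\" said John. Jane said, \"It is nice to meet you as well.\"",
        {"url": "example1.com", "date": "Jan 1"},
    ),
]


def find_quotes(text: str) -> List[str]:
    """Hypothetical stand-in for the extraction step: text between double quotes."""
    return re.findall(r'"([^"]+)"', text)


# One row per quote, with the record's metadata repeated on each row.
rows: List[Dict[str, str]] = [
    {"quote": quote, **meta}
    for text, meta in data
    for quote in find_quotes(text)
]

for row in rows:
    print(row)

# A list of dicts like this can be passed to pandas.DataFrame(rows)
# to get one column per key (pandas is not imported here).
```

Keeping the rows as plain dicts means the extraction step can be swapped out without touching the dataframe construction.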
0
votes
2 answers

Extract quotations using textacy

I am attempting to extract quotations and quotation attributions (i.e., the speaker) from text, but I am getting errors. Here is the setup: import textacy import pandas as pd import spacy data = [ ("\"Hello, nice to meet you,\" said world…
jedmund
  • 55
  • 4
0
votes
1 answer

Perform function on multiple records using textacy

I am attempting to extract quotations and quotation attributions from text across multiple records using a function from textacy. So far, I have successfully executed the function on a single record, as such: import textacy data = ("\"Hello, nice…
jedmund
  • 55
  • 4
0
votes
1 answer

How to return an empty value or None on pandas dataframe?

SAMPLE DATA: https://docs.google.com/spreadsheets/d/1s6MzBu5lFcc-uUZ9B6CI1YR7P1fDSm4cByFwKt3ckgc/edit?usp=sharing I have this function that uses textacy to extract the source attribution. This automatically returns the speaker, cue and content of…
Mtrinidad
  • 157
  • 1
  • 11
0
votes
0 answers

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 49: for textacy

I am using the textacy method to get synonyms. import textacy.resources rs = textacy.resources.ConceptNet() syn=rs.get_synonyms('happy') I get the below error Traceback (most recent call last): File "", line 1, in File…
Dhiraj Tayade
  • 407
  • 3
  • 10
  • 22
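
A small reproduction of this class of error, as a sketch. It assumes (this is not confirmed by the excerpt) that the file being read is UTF-8 encoded and that Python is decoding it with a Windows code page such as cp1252, in which byte 0x81 has no mapping.

```python
# "Á" (U+00C1) is encoded in UTF-8 as the two bytes 0xC3 0x81.
raw = "Á".encode("utf-8")

message = ""
try:
    # cp1252 has no character assigned to byte 0x81, so decoding fails.
    raw.decode("cp1252")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)

print(failed)
print(message)

# Decoding the same bytes with the encoding they were written in works.
decoded = raw.decode("utf-8")
print(decoded)
```

In your own code, passing `encoding="utf-8"` to `open()` avoids depending on the platform default. When the read happens inside a third-party library, enabling Python's UTF-8 mode (for example with the `PYTHONUTF8=1` environment variable on Python 3.7+) may help, though whether it resolves this particular case is an assumption.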
0
votes
1 answer

NLP summarization using textacy/spaCy

I want to generate a summary maybe in one sentence from this text. I am using textacy.py. Here is my code: import textacy import textacy.keyterms import textacy.extract import spacy nlp = spacy.load('en_core_web_sm') text = '''Sauti said, 'O thou…
BB23850
  • 109
  • 1
  • 11
0
votes
1 answer

Textacy has no module preprocess or normalize whitespace

Sudden problems with textacy: text3 = textacy.normalize_whitespace(text2) AttributeError: module 'textacy' has no attribute 'normalize_whitespace' This happens in Python 3.7. The script worked perfectly for the past year. The other day this error…
Ethe99
  • 49
  • 8