Questions tagged [textacy]

Reference Site: https://textacy.readthedocs.io/en/stable/

Features

  • Stream text, json, csv, and spaCy binary data to and from disk
  • Clean and normalize raw text, before analyzing it
  • Explore a variety of included datasets, with both text data and metadata, from Congressional speeches to historical literature to Reddit comments
  • Access and filter basic linguistic elements, such as words and ngrams, noun chunks and sentences
  • Extract named entities, acronyms and their definitions, direct quotations, key terms, and more from documents
  • Compare strings, sets, and documents by a variety of similarity metrics
  • Transform documents and corpora into vectorized and semantic network representations
  • Train, interpret, visualize, and save sklearn-style topic models using LSA, LDA, or NMF methods

40 questions

0 votes, 1 answer

Create empty Corpus in textacy

I want to create an empty corpus in textacy and later fill it with data via corpus.add(doc). But every time I try to create an empty corpus, I am not able to save it and instead get this error: IndexError: list index out of range. I tried…
aAnnAa

0 votes, 2 answers

Textacy keyterms returning empty list

I would like to use textacy for key term extraction, but the function I am using, keyterms.key_terms.pagerank(doc), just returns an empty list. I have tried related functions, including the longer keyterms.key_terms_from_semantic_network(doc), with…
liamt12three

0 votes, 1 answer

textacy installation Killed for no reason

I am trying to install textacy on a Python 3.6 Docker image. For no apparent reason, the process crashes with a "Killed" message at the end. The command is pip install textacy, and the log is: Collecting textacy Downloading…
guillim

0 votes, 1 answer

cannot install spaCy and textacy packages

I cannot install spaCy and textacy with pip in Python 3.7 on Windows 10. I tried to install the spaCy and textacy packages but received an error. I searched for the error and found that I needed to install the Visual C++ toolkit 2017, so I did…
Arghavan

0 votes, 1 answer

How to apply a function to a pandas DataFrame column

I'm trying to apply the textacy.extract.subject_verb_object_triples function to a pandas DataFrame column. The function returns empty generator objects instead of subject-verb-object triples when applied like so: sp500news3['title'].apply(lambda x:…
W.R
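A function that uses yield returns a lazy generator object, and pandas stores that object in the cell as-is instead of expanding it. A minimal sketch of the usual fix is to materialize each generator with list() inside the function being applied. fake_svo_triples below is a hypothetical stand-in for the textacy extractor (the real one works on a parsed spaCy Doc, not a raw string), so the example runs without textacy or spaCy installed.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def fake_svo_triples(text: str) -> Iterator[Triple]:
    """Hypothetical stand-in for a textacy extractor.

    Yields the first three words as a (subject, verb, object) triple;
    the real textacy function works on a parsed spaCy Doc instead.
    """
    words = text.split()
    if len(words) >= 3:
        yield (words[0], words[1], words[2])


titles = ["Apple buys startup", "Markets rally"]

# Calling a generator function only creates a lazy generator object;
# nothing is extracted until something iterates over it.
lazy = [fake_svo_triples(t) for t in titles]
print(type(lazy[0]).__name__)  # generator

# Wrapping the call in list() runs the generator and stores its results.
# The pandas analogue is df["title"].apply(lambda x: list(extract(x))),
# where `extract` stands for whichever generator function is applied.
materialized: List[List[Triple]] = [list(fake_svo_triples(t)) for t in titles]
print(materialized)  # [[('Apple', 'buys', 'startup')], []]
```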

0 votes, 1 answer

How to apply the list function to textacy generator objects in a pandas DataFrame

I'm applying the 'list' function to a pandas column that contains generator objects, in an attempt to show all the generator objects in the column. When I apply it, the column returns empty lists. 'subject_verb_object_triples' is a textacy function…
W.R
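One common reason list() returns empty lists is that a generator can be iterated only once: if anything already iterated it (a loop, an earlier list() call, printing its items), later iterations yield nothing. A minimal sketch in plain Python, with a hypothetical numbers generator in place of the textacy extractor:

```python
from typing import Iterator


def numbers(n: int) -> Iterator[int]:
    """Hypothetical generator standing in for an extractor's output."""
    for i in range(n):
        yield i


gen = numbers(3)
first = list(gen)   # consumes the generator
second = list(gen)  # the generator is now exhausted, so this list is empty
print(first, second)  # [0, 1, 2] []

# Fix: materialize the results once and keep the list, which can be
# iterated as many times as needed.
stored = list(numbers(3))
print(list(stored), list(stored))  # [0, 1, 2] [0, 1, 2]
```

In a DataFrame, that means storing the list() result in the column when the column is built, rather than storing generators and listing them later.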

0 votes, 1 answer

Iterate through a Python 3 list of strings, match every item against the others, and return the largest match

I have a Python list in which I need to compare every item against the others and replace the shorter strings with the longest ones. EDIT: I have a list of people's names that I get using the spaCy module and its entity extraction. I get back…
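The question does not define what counts as a match. A minimal sketch, assuming a shorter name matches a longer one when it appears inside it as a case-insensitive substring, replaces each name with the longest name in the list that contains it. The comparison is quadratic, which is acceptable for short lists such as the entity names extracted from one document.

```python
from typing import List


def expand_to_longest(names: List[str]) -> List[str]:
    """Replace each name with the longest name in `names` that contains it.

    Assumption: "match" means case-insensitive substring containment,
    e.g. "Obama" matches "Barack Obama".
    """
    result = []
    for name in names:
        needle = name.lower()
        # Every name contains itself, so this list is never empty.
        candidates = [other for other in names if needle in other.lower()]
        # max() keeps the first candidate if several tie in length.
        result.append(max(candidates, key=len))
    return result


people = ["Obama", "Barack Obama", "Clinton", "Hillary Clinton", "Sanders"]
print(expand_to_longest(people))
# ['Barack Obama', 'Barack Obama', 'Hillary Clinton', 'Hillary Clinton', 'Sanders']
```

Substring matching can over-match short names (for example "Ann" inside "Joanne"); comparing whole tokens instead would be stricter.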

0 votes, 0 answers

Computing TTR on corpus

I'm trying to compute the type-token ratio (TTR) of the Capitol Words corpus using lemmas over the entire vocabulary of each speaker. I'm also trying to have a defaultdict go through each entry and then give a TTR percentage for each speaker. So far I have the code…
Gerold
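Type-token ratio is the number of distinct types divided by the total number of tokens. A minimal sketch of the per-speaker computation with collections.defaultdict, assuming the corpus has already been reduced to hypothetical (speaker, lemmas) records; with spaCy, the lemmas could come from each token's lemma_ attribute.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ttr_by_speaker(records: Iterable[Tuple[str, List[str]]]) -> Dict[str, float]:
    """Return each speaker's type-token ratio as a percentage.

    `records` is a hypothetical iterable of (speaker, lemmas) pairs.
    """
    tokens_by_speaker: Dict[str, List[str]] = defaultdict(list)
    # Pool every lemma a speaker used across all of their speeches.
    for speaker, lemmas in records:
        tokens_by_speaker[speaker].extend(lemmas)

    ratios: Dict[str, float] = {}
    for speaker, tokens in tokens_by_speaker.items():
        if tokens:  # skip speakers with no tokens to avoid dividing by zero
            ratios[speaker] = 100 * len(set(tokens)) / len(tokens)
    return ratios


speeches = [
    ("Speaker A", ["tax", "cut", "be", "good"]),
    ("Speaker A", ["tax", "be", "high"]),
    ("Speaker B", ["vote", "vote", "vote", "no"]),
]
result = ttr_by_speaker(speeches)
print(result)  # Speaker A: 5 types / 7 tokens; Speaker B: 2 types / 4 tokens
```

Raw TTR falls as texts get longer, so speakers with more speech will tend to score lower; comparing equal-sized samples per speaker is one way to reduce that bias.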

0 votes, 1 answer

Textacy - Vectorizer Weighting Error

I've recently found textacy, and as I go through the API reference guide I'm running into an error with the Vectorizer. If I add any options from the API reference, I get a TypeError: unexpected keyword argument. I get this error for other options in…
RKB
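A TypeError about an unexpected keyword argument often means the installed version of a library does not accept a parameter described in the documentation being read, for example when the "stable" docs are newer than the installed package. A minimal sketch of checking both with the standard library: importlib.metadata.version (Python 3.8+) reports the installed version, and inspect.signature lists the parameters a callable accepts. make_vectorizer is a hypothetical function so the example runs without textacy; with textacy installed, the Vectorizer class itself could be passed to keyword_params.

```python
import inspect
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, List, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def keyword_params(func: Callable) -> List[str]:
    """List the parameter names that `func` accepts as keyword arguments.

    A **kwargs parameter (VAR_KEYWORD) would accept any name and is not listed.
    """
    kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    params = inspect.signature(func).parameters.values()
    return [p.name for p in params if p.kind in kinds]


def make_vectorizer(weighting="tf", min_df=1, *, normalize=True):
    """Hypothetical stand-in for a vectorizer constructor."""
    return weighting, min_df, normalize


# Compare this with the version named on the documentation page being read.
print(installed_version("textacy"))  # None if textacy is not installed
# Only names in this list can be passed as keyword arguments.
print(keyword_params(make_vectorizer))  # ['weighting', 'min_df', 'normalize']
```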

0 votes, 1 answer

Python: how to match dictionary value to file name?

I am relatively new to Python and struggling with the following: I have a list of about 52,000 dictionaries containing metadata on PDFs (that are stored separately). Now, I want to match 5,000 of these PDFs to their corresponding metadata…
NynkeLys
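A minimal sketch, assuming each metadata dictionary stores the PDF's file name under a hypothetical "filename" key: build a lookup dictionary from the 52,000 records once, then resolve each of the 5,000 PDFs with a single dictionary lookup instead of scanning the whole list per file. If the metadata uses a different identifier, only the key computation changes.

```python
from pathlib import Path
from typing import Dict, List, Optional


def index_by_filename(records: List[dict]) -> Dict[str, dict]:
    """Map a normalized file name (lowercased, no extension) to its metadata.

    Assumption: each record has a hypothetical "filename" key. Later records
    with the same normalized name overwrite earlier ones.
    """
    return {Path(r["filename"]).stem.lower(): r for r in records}


def metadata_for(pdf_path: str, index: Dict[str, dict]) -> Optional[dict]:
    """Look up the metadata for one PDF, or return None if nothing matches."""
    return index.get(Path(pdf_path).stem.lower())


metadata = [
    {"filename": "Report_2017.pdf", "title": "Annual report"},
    {"filename": "memo-42.pdf", "title": "Budget memo"},
]
index = index_by_filename(metadata)

for pdf in ["archive/report_2017.PDF", "archive/unknown.pdf"]:
    print(pdf, "->", metadata_for(pdf, index))
```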