Questions tagged [keyword-extraction]

Tag related to the Natural Language Processing (NLP) task that consists of automatically identifying the terms that best describe the subject of a document.

Keyword extraction is the automatic identification of the terms that best describe the subject of a document.

Key phrases, key terms, key segments, or simply keywords are the names used for the terms that carry the most relevant information in a document. Although the terminology differs, the function is the same: characterizing the topic discussed in a document. Keyword extraction is an important problem in Text Mining, Information Retrieval and Natural Language Processing.
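As a rough illustration of the task, the sketch below ranks the terms of one document by a TF-IDF score computed against a small collection. It is only one of many possible approaches, not a reference method for this tag; the function names (`tokenize`, `top_keywords`), the tiny stop-word list, and the smoothed IDF formula are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on", "for", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords(docs, doc_index, k=3):
    """Return the k highest-scoring terms of docs[doc_index] by TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())
    if total == 0:
        # Empty document (or only stop words): nothing to rank, and
        # returning early avoids dividing by zero below.
        return []

    # Smoothed IDF (an assumption of this sketch): log((1 + N) / (1 + df)) + 1.
    scores = {
        term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [
        "solar system mars solar system venus solar system mars",
        "planet moon milky way planet",
        "mars rover landing",
    ]
    print(top_keywords(docs, 0, k=2))
```

Terms that are frequent in the chosen document but rare in the rest of the collection score highest. Graph-based or embedding-based methods would replace the scoring step while keeping the same overall shape.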

18 questions
4
votes
1 answer

Extracting and ranking keywords from short text

I am working on a project to extract keywords from short texts (3-4 sentences). Using the spaCy library, I extract noun phrases and named entities and use them as keywords. However, I would like to sort them by their importance with respect to the original text. I…
Marian H
  • 41
  • 2
2
votes
1 answer

How to define pos_pattern for extracting nouns followed by zero or more sequence of nouns or adjectives for KeyphraseCountVectorizer?

I'm trying to extract Arabic keywords from tweets. I'm using KeyBERT with KeyphraseCountVectorizer: vectorizer = KeyphraseCountVectorizer(pos_pattern='< N.*>*'). I'm trying to write more custom pos_pattern regular expressions to select nouns followed by zero or…
1
vote
1 answer

Get topN keywords with PySpark CountVectorizer

I want to extract keywords using pyspark.ml.feature.CountVectorizer. My input Spark dataframe looks as follows: id text 1 sun, mars, solar system, solar system, mars, solar system, venus, solar system, mars 2 planet, moon, milky way,…
1
vote
1 answer

Feed large text to PyTextRank

I would like to use PyTextRank for keyphrase extraction. How can I feed 5 million documents (each document consisting of a few paragraphs) to the package? This is the example I see in the official tutorial. text = "Compatibility of systems of…
E.K.
  • 4,179
  • 8
  • 30
  • 50
1
vote
4 answers

How to extract words from repeating strings

Here I have a string in a list: ['aaaaaaappppppprrrrrriiiiiilll'] I want to get the word 'april' from the list, but not just once; instead, I want as many copies as the number of times the word 'april' actually occurs in the string. The output should be something…
1
vote
1 answer

KeyBERT package is not working on Google Colab

I'm using KeyBERT on Google Colab to extract keywords from the text. from keybert import KeyBERT model = KeyBERT('distilbert-base-nli-mean-tokens') text_keywords = model.extract_keywords(my_long_text) But I get the following error: OSError: Model…
Zia
  • 389
  • 1
  • 3
  • 17
1
vote
1 answer

Find if a phrase is 'generally rare' in English

I want to extract rare words from text: not rare in that text, but generally rare in English. Is there an NLTK module that uses a large corpus and can answer such a query?
kambi
  • 3,291
  • 10
  • 37
  • 58
0
votes
0 answers

Calculate similarity between sets of keywords in Python

For my project I want to compare two sets of keywords that are stored in lists and obtain a similarity index. An example would look like the following: db_1: list of 5 keywords db_2: list of 10 keywords The data was obtained mostly through web…
0
votes
0 answers

How to implement keyword based text clustering?

I have 4 topics and 10 keywords representing each of those 4 topics. I now want to classify every document in my dataset into one of these 4 topics using the keywords extracted for each topic. topic0 =…
0
votes
1 answer

Rearrange row upon column value

I have a DataFrame where I would like to rearrange the data of a given column. What I have: text KEYWORD 0 Fetch.ai will transform economies, healthcare,... supplies chain issues 1 …
Zion
  • 47
  • 9
0
votes
0 answers

division by zero in calculating TF-IDF algorithm for keyword-extraction

I wrote code based on the TF-IDF algorithm to extract keywords from a very large text. The problem is that I keep getting a division by zero error. When I debug my code, everything works perfectly. As soon as I make the text shorter to…
Sohi.A
  • 1
  • 3
0
votes
0 answers

Receive "TypeError: 'DistilBertTokenizer' object is not callable" when using KeyBERT on Colab

I am running KeyBERT to extract keywords on Google Colab with the following code: from keybert import KeyBERT model = KeyBERT('distilbert-base-nli-mean-tokens') keywords = model.extract_keywords(doc, keyphrase_ngram_range=(1, 1), stop_words…
Zia
  • 389
  • 1
  • 3
  • 17
0
votes
1 answer

Can you retrain RAKE?

Is it possible to retrain RAKE (Rapid automatic keyword extractor)? If so, how? Thank you!
priegueee
  • 57
  • 6
0
votes
3 answers

How to extract keywords using TFIDF for each row in python?

I have a column which has text only. I need to extract top keywords from each row using TFIDF. Example Input: df['Text'] 'I live in India', 'My favourite colour is Red', 'I Love Programming' Expected output: df[Text] …
0
votes
1 answer

python key phrase extraction using pke module

I was trying to extract key phrases using the https://github.com/boudinfl/pke module. When I run it once, it works perfectly. But when I run it several times, it emits the following error: ZeroDivisionError: float division by zero. My code is…
Gihan Gamage
  • 2,944
  • 19
  • 27