Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If your question does not directly concern implementation, consider posting it on Data Science or Artificial Intelligence instead; such questions are probably off-topic here. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

Beginner books on Natural Language Processing

Popular software packages

20,185 questions
34 votes, 3 answers

Python Gensim: how to calculate document similarity using the LDA model?

I've got a trained LDA model and I want to calculate the similarity score between two documents from the corpus I trained my model on. After studying all the Gensim tutorials and functions, I still can't get my head around it. Can somebody give me a…
still_st • 363 • 1 • 3 • 7
33 votes, 8 answers

Word frequency algorithm for natural language processing

Without getting a degree in information retrieval, I'd like to know whether any algorithms exist for counting how frequently words occur in a given body of text. The goal is to get a "general feel" of what people are saying over a set of…
Mark McDonald • 7,571 • 6 • 46 • 53
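For the word-frequency question above, a minimal sketch using only the Python standard library might look like this. The function name `word_frequencies`, the regex-based tokenization, and the optional `stopwords` parameter are assumptions made for illustration, not a reference to any particular library's method.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5, stopwords=frozenset()):
    """Return the top_n most common lowercase words in text."""
    # Naive tokenization (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Drop any words the caller considers uninformative.
    kept = [w for w in words if w not in stopwords]
    return Counter(kept).most_common(top_n)

sample = "The cat sat on the mat. The mat was flat."
print(word_frequencies(sample, top_n=2))
# [('the', 3), ('mat', 2)]
```

Passing a small stop-word set (for example `{"the", "on", "was"}`) usually gives a better "general feel", since very common function words otherwise dominate the counts.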
33 votes, 12 answers

Does an algorithm exist to help detect the "primary topic" of an English sentence?

I'm trying to find out if there is a known algorithm that can detect the "key concept" of a sentence. The use case is as follows: the user enters a sentence as a query ("Does chicken taste like turkey?"); our system identifies the concepts of the sentence…
rockit • 339 • 1 • 3 • 4
33 votes, 7 answers

NLTK vs Stanford NLP

I have recently started using the NLTK toolkit to build a few solutions in Python. I hear a lot of community activity around Stanford NLP. Can anyone tell me the difference between NLTK and Stanford NLP? Are they two different libraries?…
RData • 959 • 1 • 13 • 33
33 votes, 4 answers

What does NN VBD IN DT NNS RB mean in NLTK?

When I chunk text, I get lots of codes in the output like NN, VBD, IN, DT, NNS, RB. Is there a list documented somewhere that tells me the meaning of these? I have tried googling "nltk chunk code", "nltk chunk grammar", and "nltk chunk tokens", but I am not…
Knows Not Much • 30,395 • 60 • 197 • 373
33 votes, 3 answers

Java library for keyword extraction from input text

I'm looking for a Java library to extract keywords from a block of text. The process should be as follows: stop word cleaning -> stemming -> searching for keywords based on English linguistics statistical information - meaning if a word appears more…
Shay • 497 • 1 • 4 • 10
33 votes, 8 answers

Computing N Grams using Python

I needed to compute the unigrams, bigrams and trigrams for a text file containing text like: "Cystic fibrosis affects 30,000 children and young adults in the US alone Inhaling the mists of salt water can reduce the pus and infection that fills the…
gran_profaci • 8,087 • 15 • 66 • 99
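As a rough illustration of the n-gram question above, unigrams, bigrams and trigrams can be built with a sliding window over a token list. This is only a sketch: the helper name `ngrams` and the whitespace tokenization are assumptions, and real text would normally need proper tokenization and sentence splitting first.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) over a token sequence."""
    # A window of width n slides one token at a time; if n exceeds the
    # number of tokens the range is empty and so is the result.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = "Cystic fibrosis affects 30,000 children and young adults in the US alone"
tokens = text.split()  # naive whitespace tokenization (an assumption)

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams[:2])
# [('Cystic', 'fibrosis'), ('fibrosis', 'affects')]
```

Wrapping the results in `collections.Counter` would give n-gram frequencies if counts rather than the raw lists are needed.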
32 votes, 7 answers

What is NLTK POS tagger asking me to download?

I just started using a part-of-speech tagger, and I am facing many problems. I started POS tagging with the following: import nltk; text = nltk.word_tokenize("We are going out.Just you and me."). When I want to print 'text', the following…
Pearl • 759 • 1 • 6 • 7
32 votes, 7 answers

Java API for plural forms of English words

Are there any Java APIs that provide the plural forms of English words (e.g. cacti for cactus)?
Joe • 14,513 • 28 • 82 • 144
32 votes, 5 answers

Difference between Rasa Core and Rasa NLU

I tried to understand the difference between Rasa Core and Rasa NLU from the official documentation, but I don't understand much. What I understood is that Rasa Core is used to guide the flow of the conversation, while Rasa NLU is used to process…
Henu • 1,622 • 2 • 22 • 27
32 votes, 5 answers

How is the TfidfVectorizer in scikit-learn supposed to work?

I'm trying to get words that are distinctive of certain documents using the TfidfVectorizer class in scikit-learn. It creates a tf-idf matrix with all the words and their scores in all the documents, but then it seems to count common words as well.…
Jonathan • 10,571 • 13 • 67 • 103
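To make the tf-idf idea in the question above concrete, here is a small pure-Python sketch of the textbook weighting, tf × log(N / df). It is not scikit-learn's implementation: with its default settings, TfidfVectorizer smooths the idf term and L2-normalizes each row, so its scores differ and words found in every document do not drop to exactly zero. The function name `tf_idf` and the toy documents are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

docs = [
    "the cat sat".split(),
    "the dog barked".split(),
]
scores = tf_idf(docs)
print(scores[0]["the"])  # 0.0, because "the" occurs in every document
```

In this unsmoothed variant, a term that appears in every document scores zero, while terms unique to one document score highest, which is the "distinctive words" behaviour the question is after.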
32 votes, 7 answers

N-gram generation from a sentence

How can I generate n-grams from a string like String Input = "This is my car."? With an n-gram size of 3, the output should be: This; is; my; car; This is; is my; my car; This is my; is my car. Give some idea in Java, how to…
Preetam Purbia • 5,736 • 3 • 24 • 26
31 votes, 4 answers

Transformers v4.x: Convert slow tokenizer to fast tokenizer

I'm following the Transformers example for the pretrained model xlm-roberta-large-xnli: from transformers import pipeline; classifier = pipeline("zero-shot-classification", model="joeddav/xlm-roberta-large-xnli"), and I get the…
31 votes, 4 answers

Unable to load the spaCy model 'en_core_web_lg' on Google Colab

I am using spaCy in Google Colab to build an NER model, for which I downloaded the spaCy 'en_core_web_lg' model using import spacy.cli; spacy.cli.download("en_core_web_lg"). I get a message saying "✔ Download and installation successful". You…
Jithin P James • 752 • 1 • 7 • 23
31 votes, 8 answers

How to verify the installed spaCy version?

I installed spaCy for my Python NLP project using pip (pip install -U spacy). What is the command to verify the installed spaCy version?
Pramod S. Nikam • 4,271 • 4 • 38 • 62
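One way to check an installed package version from Python, sketched here with the standard library's `importlib.metadata` (Python 3.8+), is shown below. The helper name `installed_version` is hypothetical; it returns None instead of raising when the package is missing. When spaCy is importable, `spacy.__version__` reports the same information, and `pip show spacy` or `python -m spacy info` do so from the command line.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None

# Prints a version string if spaCy is installed, otherwise None.
print(installed_version("spacy"))
```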