Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Data Science or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

20,185 questions
37 votes, 4 answers

Can an algorithm detect sarcasm

I was asked to write an algorithm to detect sarcasm but I came across a flaw (or what seems like one) in the logic. For example, if a person says: A: I love Justin Bieber. Do you like him too? B: Yeah. Sure. I absolutely love him. Now this may be…
cjds

36 votes, 2 answers

How to extract numbers (along with comparison adjectives or ranges)

I am working on two NLP projects in Python, and both have a similar task to extract numerical values and comparison operators from sentences, like the following: "... greater than $10 ... ", "... weight not more than 200lbs ...", "... height in 5-7…
svfat

36 votes, 5 answers

What is the difference between Luong attention and Bahdanau attention?

These two attentions are used in seq2seq modules. The two different attentions are introduced as multiplicative and additive attentions in this TensorFlow documentation. What is the difference?
Shamane Siriwardhana

36 votes, 3 answers

Python NLTK pos_tag not returning the correct part-of-speech tag

Having this: text = word_tokenize("The quick brown fox jumps over the lazy dog") And running: nltk.pos_tag(text) I get: [('The', 'DT'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN'), ('the', 'DT'), ('lazy',…
faceoff

36 votes, 3 answers

Stemming algorithm that produces real words

I need to take a paragraph of text and extract from it a list of "tags". Most of this is quite straightforward. However, I need some help now stemming the resulting word list to avoid duplicates. Example: Community / Communities I've used an…
Dave

36 votes, 5 answers

Convert words between verb/noun/adjective forms

I would like a Python library function that translates/converts across different parts of speech. Sometimes it should output multiple words (e.g. "coder" and "code" are both nouns from the verb "to code"; one's the subject, the other's the object) #…
sam boosalis

35 votes, 7 answers

What are the available tools to summarize or simplify text?

Is there any library, preferably in python but at least open source, that can summarize and or simplify natural-language text?
captainandcoke

35 votes, 3 answers

How to interpret scikit's learn confusion matrix and classification report?

I have a sentiment analysis task. For this I'm using this corpus; the opinions have 5 classes (very neg, neg, neu, pos, very pos), from 1 to 5. So I do the classification as follows: from sklearn.feature_extraction.text import TfidfVectorizer import…
john doe

35 votes, 7 answers

Computing precision and recall in Named Entity Recognition

Now I am about to report the results from Named Entity Recognition. One thing that I find a bit confusing is that my understanding of precision and recall was that one simply sums up true positives, true negatives, false positives and false…
Nick

34 votes, 6 answers

FreqDist with NLTK

The Python package nltk has the FreqDist function which gives you the frequency of words within a text. I am trying to pass my text as an argument but the result is of the form: [' ', 'e', 'a', 'o', 'n', 'i', 't', 'r', 's', 'l', 'd', 'h', 'c', 'y',…
afg102

34 votes, 1 answer

Applying Spacy Parser to Pandas DataFrame w/ Multiprocessing

Say I have a dataset, like iris = pd.DataFrame(sns.load_dataset('iris')) I can use Spacy and .apply to parse a string column into tokens (my real dataset has >1 word/token per entry of course) import spacy # (I have version 1.8.2) nlp =…
Max Power

34 votes, 4 answers

Python - How to intuit word from abbreviated text using NLP?

I was recently working on a data set that used abbreviations for various words. For example, wtrbtl = water bottle bwlingbl = bowling ball bsktball = basketball There did not seem to be any consistency in terms of the convention used, i.e.…
Dan Temkin

34 votes, 2 answers

How is the Vader 'compound' polarity score calculated in Python NLTK?

I'm using the Vader SentimentAnalyzer to obtain the polarity scores. I used the probability scores for positive/negative/neutral before, but I just realized the "compound" score, ranging from -1 (most neg) to 1 (most pos) would provide a single…
alicecongcong

34 votes, 6 answers

What’s a good Python profanity filter library?

Like https://stackoverflow.com/questions/1521646/best-profanity-filter, but for Python — and I’m looking for libraries I can run and control myself locally, as opposed to web services. (And whilst it’s always great to hear your fundamental…
Paul D. Waite

34 votes, 3 answers

Classifying Documents into Categories

I've got about 300k documents stored in a Postgres database that are tagged with topic categories (there are about 150 categories in total). I have another 150k documents that don't yet have categories. I'm trying to find the best way to…
erikcw