Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches.

NOTE: If your question does not directly concern implementation, consider posting it on Data Science or Artificial Intelligence instead; here it is probably off-topic. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

- Text pre-processing
- Coreference resolution
- Dependency parsing [parse-tree]
- Document summarization [summarization]
- Named entity recognition (NER) [named-entity-recognition]
- Information extraction (IE) [information-retrieval] [information-extraction]
- Language modeling
- Part-of-speech (POS) tagging [part-of-speech]
- Morphological analysis and wordform generation
- Phrase-structure (constituency) parsing [parse-tree]
- Machine translation (MT) [machine-translation]
- Question answering (QA) [nlp-question-answering]
- Sentiment analysis [sentiment-analysis]
- Semantic parsing [semantic-analysis]
- Text categorization [text-classification] [document-classification]
- Textual entailment detection
- Topic modeling [topic-modeling]
- Word sense disambiguation (WSD) [word-sense-disambiguation]

Beginner books on Natural Language Processing

Popular software packages

General purpose toolkits
- Natural Language Toolkit (NLTK) (Python) [nltk]
- OpenNLP (Java) [opennlp]
- SharpNLP (.NET) [sharpnlp]
- ClearNLP (Java) [clearnlp]
- Mate (Java)
- Stanford CoreNLP (Java) [stanford-nlp]
- Treat (Ruby)
- Mallet (Java) [mallet]
- spaCy (Python) [spacy]
- Pattern (Python) [python-pattern]

Phrase structure parsers
- Stanford Parser (Java) [stanford-nlp]
- Berkeley Parser (Java)
- BLLIP (Charniak-Johnson) Parser (C++, Python) [charniak-parser]

Dependency parsers
- Stanford Dependencies (packaged with Stanford Parser) (Java) [stanford-nlp]
- MaltParser (Java)
- MSTParser (Java)
- UDPipe

Proofreading software
- LanguageTool (Java) [languagetool]

20185 questions

4 answers

Can an algorithm detect sarcasm

I was asked to write an algorithm to detect sarcasm, but I came across a flaw (or what seems like one) in the logic. For example, if a person says A: I love Justin Bieber. Do you like him too? B: Yeah. Sure. I absolutely love him. Now this may be…

algorithm nlp

asked Dec 31 '12 at 04:21

cjds

2 answers

How to extract numbers (along with comparison adjectives or ranges)

I am working on two NLP projects in Python, and both have a similar task to extract numerical values and comparison operators from sentences, like the following: "... greater than $10 ... ", "... weight not more than 200lbs ...", "... height in 5-7…

python regex nlp nltk spacy

asked Jul 16 '17 at 07:19

svfat
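
The task described in this question can be approached, at least for simple cases, with a regular expression before reaching for NLTK or spaCy. The following is a minimal sketch, not a complete solution: the phrase table `COMPARATORS` and the function name `extract_constraints` are hypothetical, and real text will need more phrases, units, and negation handling.

```python
import re

# Hypothetical phrase-to-operator table; extend it for your own data.
COMPARATORS = {
    "not more than": "<=",
    "no more than": "<=",
    "at most": "<=",
    "not less than": ">=",
    "at least": ">=",
    "greater than": ">",
    "more than": ">",
    "less than": "<",
}

# Try longer phrases first so "not more than" wins over "more than".
_PHRASES = "|".join(
    re.escape(p) for p in sorted(COMPARATORS, key=len, reverse=True)
)
_NUMBER = r"\$?\d+(?:\.\d+)?"  # optional dollar sign, optional decimals

PATTERN = re.compile(
    rf"(?P<cmp>{_PHRASES})\s+(?P<value>{_NUMBER})"       # "greater than $10"
    rf"|(?P<low>{_NUMBER})\s*-\s*(?P<high>{_NUMBER})",   # "5-7"
    re.IGNORECASE,
)


def _to_float(token):
    """Drop a leading dollar sign and convert to float."""
    return float(token.lstrip("$"))


def extract_constraints(text):
    """Return (operator, value) or ("range", low, high) tuples found in text."""
    results = []
    for m in PATTERN.finditer(text):
        if m.group("cmp"):
            op = COMPARATORS[m.group("cmp").lower()]
            results.append((op, _to_float(m.group("value"))))
        else:
            results.append(
                ("range", _to_float(m.group("low")), _to_float(m.group("high")))
            )
    return results
```

Units such as "lbs" are ignored here; capturing them would need an extra group after the number.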

5 answers

What is the difference between Luong attention and Bahdanau attention?

Both attention mechanisms are used in seq2seq modules. The TensorFlow documentation introduces them as multiplicative and additive attention. What is the difference?

tensorflow deep-learning nlp attention-model

asked May 29 '17 at 08:43

Shamane Siriwardhana
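
For orientation, the two scoring functions the question refers to are usually written as follows. This is a sketch in my own notation, not taken from the question: $h_t$ is the current decoder state, $s_{t-1}$ the previous decoder state, $\bar{h}_s$ an encoder state, and $W_a$, $W_1$, $W_2$, $v_a$ are learned parameters.

```latex
% Luong ("general" multiplicative) score, computed from the current decoder state
\mathrm{score}_{\text{Luong}}(h_t, \bar{h}_s) = h_t^{\top} W_a \, \bar{h}_s

% Bahdanau (additive) score, computed from the previous decoder state
\mathrm{score}_{\text{Bahdanau}}(s_{t-1}, \bar{h}_s) = v_a^{\top} \tanh\!\left(W_1 s_{t-1} + W_2 \bar{h}_s\right)

% In both cases the attention weights are a softmax over source positions
\alpha_{ts} = \frac{\exp(\mathrm{score}_{ts})}{\sum_{s'} \exp(\mathrm{score}_{ts'})}
```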

3 answers

Python NLTK pos_tag not returning the correct part-of-speech tag

Having this: text = word_tokenize("The quick brown fox jumps over the lazy dog") And running: nltk.pos_tag(text) I get: [('The', 'DT'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN'), ('the', 'DT'), ('lazy',…

python machine-learning nlp nltk pos-tagger

asked Jun 13 '15 at 16:52

faceoff

3 answers

Stemming algorithm that produces real words

I need to take a paragraph of text and extract from it a list of "tags". Most of this is quite straightforward. However, I need some help now stemming the resulting word list to avoid duplicates. Example: Community / Communities I've used an…

php nlp stemming snowball porter-stemmer

asked Oct 10 '08 at 10:43

Dave

5 answers

Convert words between verb/noun/adjective forms

I would like a Python library function that translates/converts across different parts of speech. Sometimes it should output multiple words (e.g. "coder" and "code" are both nouns from the verb "to code"; one is the subject, the other the object) #…

python nlp nltk wordnet

asked Jan 23 '13 at 21:01

sam boosalis

7 answers

What are the available tools to summarize or simplify text?

Is there any library, preferably in Python but at least open source, that can summarize and/or simplify natural-language text?

python nlp text-processing

asked Mar 29 '11 at 21:46

captainandcoke

3 answers

How to interpret scikit-learn's confusion matrix and classification report?

I have a sentiment analysis task. For this I'm using this corpus; the opinions have 5 classes (very neg, neg, neu, pos, very pos), from 1 to 5. So I do the classification as follows: from sklearn.feature_extraction.text import TfidfVectorizer import…

machine-learning nlp scikit-learn svm confusion-matrix

asked Jun 10 '15 at 03:12

john doe

7 answers

Computing precision and recall in Named Entity Recognition

Now I am about to report the results from Named Entity Recognition. One thing that I find a bit confusing is that my understanding of precision and recall was that one simply sums up true positives, true negatives, false positives and false…

nlp precision-recall

asked Nov 23 '09 at 15:00

Nick
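
One common convention for this (exact-match, entity-level scoring) can be sketched as below. It is an assumption that this is the scheme the asker needs; other schemes give partial credit for overlapping spans. Note that true negatives do not appear in either formula. The function name `entity_prf` and the `(start, end, label)` tuple layout are hypothetical.

```python
def entity_prf(gold, predicted):
    """Exact-match entity-level precision, recall and F1.

    Each entity is a hashable tuple such as (start, end, label); a
    prediction counts as correct only if span and label both match.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: one exact match, one wrong label, one spurious span.
gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")}
pred = {(0, 2, "PER"), (5, 6, "ORG"), (12, 13, "LOC")}
precision, recall, f1 = entity_prf(gold, pred)
```

With this toy data only one of three predictions and one of three gold entities match, so precision and recall are both one third.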

6 answers

FreqDist with NLTK

The Python package nltk has the FreqDist function which gives you the frequency of words within a text. I am trying to pass my text as an argument but the result is of the form: [' ', 'e', 'a', 'o', 'n', 'i', 't', 'r', 's', 'l', 'd', 'h', 'c', 'y',…

python nlp nltk

asked Jan 08 '11 at 16:12

afg102
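
The single-character output in this question is what a frequency counter produces when it is handed a raw string, because iterating over a string yields characters. The sketch below uses `collections.Counter` from the standard library as a stand-in, on the assumption that `FreqDist` counts whatever iterable it receives in the same way; the whitespace split is a deliberately naive tokenizer.

```python
from collections import Counter

text = "the cat sat on the mat"

# Passing the raw string counts characters, including spaces.
char_counts = Counter(text)

# Tokenizing first counts words instead.
word_counts = Counter(text.split())

print(char_counts.most_common(3))
print(word_counts.most_common(3))
```

In NLTK itself, a tokenizer such as `word_tokenize` would normally replace `text.split()`.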

1 answer

Applying Spacy Parser to Pandas DataFrame w/ Multiprocessing

Say I have a dataset, like iris = pd.DataFrame(sns.load_dataset('iris')) I can use Spacy and .apply to parse a string column into tokens (my real dataset has >1 word/token per entry of course) import spacy # (I have version 1.8.2) nlp =…

python nlp multiprocessing spacy

asked Jun 06 '17 at 16:50

Max Power

4 answers

Python - How to intuit word from abbreviated text using NLP?

I was recently working on a data set that used abbreviations for various words. For example, wtrbtl = water bottle bwlingbl = bowling ball bsktball = basketball There did not seem to be any consistency in terms of the convention used, i.e.…

python machine-learning nlp abbreviation

asked Apr 20 '17 at 05:17

Dan Temkin

2 answers

How is the Vader 'compound' polarity score calculated in Python NLTK?

I'm using the Vader SentimentAnalyzer to obtain the polarity scores. I used the probability scores for positive/negative/neutral before, but I just realized the "compound" score, ranging from -1 (most neg) to 1 (most pos) would provide a single…

python nlp nltk sentiment-analysis vader

asked Oct 30 '16 at 04:15

alicecongcong

6 answers

What’s a good Python profanity filter library?

Like https://stackoverflow.com/questions/1521646/best-profanity-filter, but for Python — and I’m looking for libraries I can run and control myself locally, as opposed to web services. (And whilst it’s always great to hear your fundamental…

python nlp profanity

asked Aug 20 '10 at 14:20

Paul D. Waite

3 answers

Classifying Documents into Categories

I've got about 300k documents stored in a Postgres database that are tagged with topic categories (there are about 150 categories in total). I have another 150k documents that don't yet have categories. I'm trying to find the best way to…

python machine-learning nlp nltk naivebayes

asked Jun 24 '10 at 19:56

erikcw
