Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches.

NOTE: If your question does not directly concern implementation, consider posting it on Data Science or Artificial Intelligence instead; otherwise it is probably off-topic here. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

- Text pre-processing
- Coreference resolution
- Dependency parsing [parse-tree]
- Document summarization [summarization]
- Named entity recognition (NER) [named-entity-recognition]
- Information extraction (IE) [information-retrieval] [information-extraction]
- Language modeling
- Part-of-speech (POS) tagging [part-of-speech]
- Morphological analysis and wordform generation
- Phrase-structure (constituency) parsing [parse-tree]
- Machine translation (MT) [machine-translation]
- Question answering (QA) [nlp-question-answering]
- Sentiment analysis [sentiment-analysis]
- Semantic parsing [semantic-analysis]
- Text categorization [text-classification] [document-classification]
- Textual entailment detection
- Topic modeling [topic-modeling]
- Word sense disambiguation (WSD) [word-sense-disambiguation]

Beginner books on Natural Language Processing

Popular software packages

General purpose toolkits
- Natural Language Toolkit (NLTK) (Python) [nltk]
- OpenNLP (Java) [opennlp]
- Sharp NLP (.NET) [sharpnlp]
- ClearNLP (Java) [clearnlp]
- Mate (Java)
- Stanford CoreNLP (Java) [stanford-nlp]
- Treat (Ruby)
- Mallet (Java) [mallet]
- spaCy (Python) [spacy]
- Pattern (Python) [python-pattern]

Phrase structure parsers
- Stanford Parser (Java) [stanford-nlp]
- Berkeley Parser (Java)
- BLLIP (Charniak-Johnson) Parser (C++, Python) [charniak-parser]

Dependency parsers
- Stanford Dependencies (packaged with Stanford Parser) (Java) [stanford-nlp]
- MaltParser (Java)
- MSTParser (Java)
- UDPipe

Proofreading software
- LanguageTool (Java) [languagetool]

20185 questions

1 answer

How to detect if two news articles have the same topic? (Python semantic similarity)

I'm trying to scrape headlines and body text from articles on a few specific sites, similar to what Google does with Google News. The problem is that across different sites, they may have articles on the same subject worded slightly differently. Can…

python nlp comparison similarity

asked Apr 05 '10 at 18:52

resopollution

1 answer

Diminutive words stemming / lemmatization

Currently I use 'lucene' and 'elasticsearch' and have the following problem. I need to get the stemmed form or lemma for a diminutive word. For instance: doggy -> dog, kitty -> cat, etc. But I get the following results: doggy -> doggi, kitty -> kitti. Is there any way…

java lucene elasticsearch nlp morphological-analysis

asked Sep 09 '14 at 09:33

Ivan Kurchenko

7 answers

Constructing human readable sentences based on a survey

The following is a survey given to course attendees to assess an instructor at the end of the course. Communication Skills 1. The instructor communicated course material clearly and accurately. Yes No 2. The instructor explained course objectives…

java parsing nlp semantics

asked Mar 27 '10 at 05:46

Joe

1 answer

Python NLP: TypeError: not all arguments converted during string formatting

I tried the code from "Natural Language Processing with Python", but a type error occurred. import nltk from nltk.corpus import brown suffix_fdist = nltk.FreqDist() for word in brown.words(): word = word.lower() suffix_fdist.inc(word[-1:]) …

python nlp typeerror

asked Aug 12 '14 at 04:30

allenwang

2 answers

Updating the feature names into scikit TFIdfVectorizer

I am trying out this code from sklearn.feature_extraction.text import TfidfVectorizer import numpy as np train_data = ["football is the sport","gravity is the movie", "education is imporatant"] vectorizer = TfidfVectorizer(sublinear_tf=True,…

python machine-learning nlp scikit-learn

asked Aug 06 '14 at 07:07

Gunjan

4 answers

Defining the context of a word - Python

I think this is an interesting question, at least for me. I have a list of words, let's say: photo, free, search, image, css3, css, tutorials, webdesign, tutorial, google, china, censorship, politics, internet and I have a list of…

python django dictionary nlp

asked Mar 23 '10 at 14:37

RadiantHex

1 answer

Can I control the way the CountVectorizer vectorizes the corpus in scikit learn?

I am working with a CountVectorizer from scikit learn, and I'm possibly attempting to do some things that the object was not made for...but I'm not sure. In terms of getting counts for occurrence: vocabulary = ['hi', 'bye', 'run away!'] corpus =…

python nlp scikit-learn text-parsing corpus

asked Jun 03 '14 at 05:36

tumultous_rooster

5 answers

How to install and invoke Stanford NERTagger?

I am trying to use the NLTK interface for Stanford NER in the Python environment, nltk.tag.stanford.NERTagger. from nltk.tag.stanford import NERTagger st = NERTagger('/usr/share/stanford-ner/classifiers/all.3class.distsim.crf.ser.gz', …

python nlp nltk stanford-nlp

asked May 26 '14 at 00:51

Hans

3 answers

How to create the negative of a sentence in nltk

I am new to NLTK. I would like to create the negative of a sentence (which will usually be in the present tense). For example, is there a function to allow me to convert: 'I run' to 'I do not run' or 'She runs' to 'She does not run'. I suppose I…

python nlp nltk

asked May 13 '14 at 14:56

Sebastian Zeki

1 answer

Understanding LDA Transformed Corpus in Gensim

I tried to examine the contents of the BOW corpus vs. the LDA[BOW Corpus] (transformed by LDA model trained on that corpus with, say, 35 topics) I found the following output: DOC 1 : [(1522, 1), (2028, 1), (2082, 1), (6202, 1)] LDA 1 : [(29,…

python nlp lda gensim

asked May 07 '14 at 05:48

Ravi Karan

1 answer

Text mining, fact extraction, semantic analysis using .Net

I'm looking for any free tools/components/libraries that allow me to take advantage of text mining, fact extraction and semantic analysis in my .NET application. The GATE project is what I need but it is written in Java. Is there something like…

.net nlp text-mining semantic-analysis

asked Feb 26 '10 at 21:55

Freak Wild Cowhunter

1 answer

Stanford CoreNLP sentiment

I'm trying to implement the coreNLP sentiment analyzer in eclipse. Getting the error: Unable to resolve "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" As either class path, filename or URL. I installed all of the NLP files using maven so I…

java dependencies nlp stanford-nlp

asked Apr 14 '14 at 15:24

English Grad

2 answers

Exactly replicating R text preprocessing in python

I would like to preprocess a corpus of documents using Python in the same way that I can in R. For example, given an initial corpus, corpus, I would like to end up with a preprocessed corpus that corresponds to the one produced using the following R…

python r nlp analytics scikit-learn

asked Apr 01 '14 at 21:38

orome

2 answers

Extracting information from unstructured text

I have a collection of "articles", each 1 to 10 sentences long, written in noisy, informal English (i.e. social media style). I need to extract some information from each article, where available, like date and time. I also need to understand what…

nlp nltk

asked Mar 25 '14 at 17:59

Trasplazio Garzuglio

1 answer

How to extract character ngram from sentences? - python

The following word2ngrams function extracts character 3grams from a word: >>> x = 'foobar' >>> n = 3 >>> [x[i:i+n] for i in range(len(x)-n+1)] ['foo', 'oob', 'oba', 'bar'] This post shows the character ngrams extraction for a single word, Quick…

python regex string nlp n-gram

asked Mar 15 '14 at 18:32

alvas
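
For readers skimming this listing, here is a minimal sketch of how the single-word comprehension quoted in the excerpt above might be extended to a whole sentence. It is not taken from the question or its answer. The function names `char_ngrams` and `sentence_char_ngrams` and the `per_word` switch are hypothetical, introduced only for illustration, and splitting on whitespace is an assumption about what counts as a word.

```python
def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`, in order.

    Same list comprehension as in the excerpt, wrapped in a function.
    Strings shorter than n yield an empty list.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def sentence_char_ngrams(sentence, n=3, per_word=True):
    """Character n-grams for a sentence (hypothetical helper).

    per_word=True:  split on whitespace (an assumption) and extract
                    n-grams inside each word, so none span a space.
    per_word=False: slide the window over the raw string, spaces included.
    """
    if per_word:
        return [gram
                for word in sentence.split()
                for gram in char_ngrams(word, n)]
    return char_ngrams(sentence, n)


if __name__ == "__main__":
    print(char_ngrams("foobar", 3))
    print(sentence_char_ngrams("foo bar baz", 3))
    print(sentence_char_ngrams("ab cd", 3, per_word=False))
```

Whether n-grams should cross word boundaries depends on the downstream task, which is why the sketch leaves that choice to the caller.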

