Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It runs on Python 2.7 and 3.2+.

NLTK provides a wide range of common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK ships with many common corpora, including the Brown Corpus, Reuters, and WordNet. The corpora collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.

The book Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 US license. A citable paper, "NLTK: The Natural Language Toolkit", was published in 2003 and again in 2006 so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is distributed under the Apache License, version 2.0.

7139 questions
35 votes, 1 answer

Create a custom Transformer in PySpark ML

I am new to Spark SQL DataFrames and ML on them (PySpark). How can I create a custom tokenizer, which for example removes stop words and uses some libraries from nltk? Can I extend the default one?
Niko • 385 • 1 • 3 • 8
34 votes, 3 answers

Large scale machine learning - Python or Java?

I am currently embarking on a project that will involve crawling and processing huge amounts of data (hundreds of gigs), and also mining them for extracting structured data, named entity recognition, deduplication, classification etc. I'm familiar…
jeffreyveon • 13,400 • 18 • 79 • 129
34 votes, 6 answers

FreqDist with NLTK

The Python package nltk has the FreqDist function which gives you the frequency of words within a text. I am trying to pass my text as an argument but the result is of the form: [' ', 'e', 'a', 'o', 'n', 'i', 't', 'r', 's', 'l', 'd', 'h', 'c', 'y',…
afg102 • 361 • 2 • 4 • 4
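The single-character output in this question has a simple cause: FreqDist counts the elements of whatever sequence it is given, and iterating a plain string yields characters. Tokenizing into words first fixes it. A minimal sketch (str.split() is used in place of nltk.word_tokenize so that no 'punkt' tokenizer data needs downloading):

```python
from nltk import FreqDist

text = "the cat sat on the mat the cat"

# Passing the raw string counts characters, because iterating a
# string yields characters -- hence output like ' ', 'e', 'a', ...
char_fd = FreqDist(text)

# Split into word tokens first to get word frequencies.
word_fd = FreqDist(text.split())

print(word_fd.most_common(3))  # e.g. [('the', 3), ('cat', 2), ...]
```

FreqDist subclasses collections.Counter, so most_common(), indexing, and arithmetic work the same way.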
34 votes, 2 answers

How is the Vader 'compound' polarity score calculated in Python NLTK?

I'm using the Vader SentimentAnalyzer to obtain the polarity scores. I used the probability scores for positive/negative/neutral before, but I just realized the "compound" score, ranging from -1 (most neg) to 1 (most pos) would provide a single…
alicecongcong • 379 • 2 • 4 • 4
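In short: VADER sums the valence scores of lexicon-matched words (adjusted by rules for negation, punctuation, intensifiers, etc.) and then squashes that sum into (-1, 1). The normalization step can be sketched in pure Python; the formula x / sqrt(x² + α) with α = 15 mirrors what VADER applies to produce the 'compound' score:

```python
import math

def vader_normalize(score, alpha=15):
    """Map an unbounded valence sum into (-1, 1).

    Mirrors the normalization VADER applies to produce the
    'compound' score: x / sqrt(x^2 + alpha), with alpha = 15.
    """
    return score / math.sqrt(score * score + alpha)

# A summed valence of 0 stays neutral; large sums saturate toward +/-1.
print(vader_normalize(0))     # 0.0
print(vader_normalize(4))     # ~0.718
print(vader_normalize(-100))  # close to -1
```

Note the compound score is computed from the raw valence sum, not from the pos/neg/neu proportions, which is why it is reported separately.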
34 votes, 3 answers

Classifying Documents into Categories

I've got about 300k documents stored in a Postgres database that are tagged with topic categories (there are about 150 categories in total). I have another 150k documents that don't yet have categories. I'm trying to find the best way to…
erikcw • 10,787 • 15 • 58 • 75
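One common baseline for this kind of task is NLTK's Naive Bayes classifier, which trains on (feature-dict, label) pairs. A toy sketch with hypothetical category names and a bag-of-words feature extractor (real features would be extracted from the documents in the database):

```python
from nltk.classify import NaiveBayesClassifier

# Toy feature extractor: bag-of-words presence features.
def features(text):
    return {word: True for word in text.lower().split()}

# Tiny stand-in for the tagged documents; labels are hypothetical.
train = [
    (features("stock market shares fell"), "finance"),
    (features("quarterly earnings and shares"), "finance"),
    (features("team wins the championship game"), "sports"),
    (features("player scores in the final game"), "sports"),
]

classifier = NaiveBayesClassifier.train(train)
print(classifier.classify(features("shares and earnings rose")))  # finance
```

At 300k documents and 150 categories, a scalable vectorizer plus a linear model is the more usual route, but this shows the NLTK API shape.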
33 votes, 7 answers

NLTK vs Stanford NLP

I have recently started to use the NLTK toolkit for creating a few solutions using Python. I hear a lot of community activity regarding using Stanford NLP. Can anyone tell me the difference between NLTK and Stanford NLP? Are they two different libraries?…
RData • 959 • 1 • 13 • 33
33 votes, 4 answers

What do NN, VBD, IN, DT, NNS, RB mean in NLTK?

When I chunk text, I get lots of codes in the output like NN, VBD, IN, DT, NNS, RB. Is there a list documented somewhere that tells me the meaning of these? I have tried googling "nltk chunk code", "nltk chunk grammar", and "nltk chunk tokens". But I am not…
Knows Not Much • 30,395 • 60 • 197 • 373
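These codes are Penn Treebank part-of-speech tags, which NLTK's default tagger emits. The documented list ships with NLTK itself via nltk.help.upenn_tagset() (after downloading the 'tagsets' resource with nltk.download('tagsets')); the tags from the question decode as follows:

```python
# Meanings of the Penn Treebank tags mentioned in the question.
PENN_TAGS = {
    "NN":  "noun, singular or mass",
    "VBD": "verb, past tense",
    "IN":  "preposition or subordinating conjunction",
    "DT":  "determiner",
    "NNS": "noun, plural",
    "RB":  "adverb",
}

for tag, meaning in PENN_TAGS.items():
    print(f"{tag}: {meaning}")
```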
33 votes, 8 answers

Python can't find module NLTK

I followed these instructions http://www.nltk.org/install.html to install the nltk module on my Mac (10.6). I have installed Python 2.7, but when I open IDLE and type import nltk it gives me this error: Traceback (most recent call last): File…
Foxsquirrel • 373 • 1 • 3 • 8
33 votes, 10 answers

Forming Bigrams of words in list of sentences with Python

I have a list of sentences: text = ['cant railway station','citadel hotel',' police stn']. I need to form bigram pairs and store them in a variable. The problem is that when I do that, I get a pair of sentences instead of words. Here is what I…
Hypothetical Ninja • 3,920 • 13 • 49 • 75
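The "pair of sentences" symptom comes from calling bigrams on the sentence list itself: its elements are whole strings, so they get paired. Splitting each sentence into words first, and collecting bigrams per sentence so no pair spans two sentences, gives the intended result:

```python
from nltk import bigrams

text = ['cant railway station', 'citadel hotel', 'police stn']

# bigrams(text) would pair whole sentences. Split each sentence
# into words first and collect bigrams sentence by sentence.
result = []
for sentence in text:
    result.extend(bigrams(sentence.split()))

print(result)
# [('cant', 'railway'), ('railway', 'station'),
#  ('citadel', 'hotel'), ('police', 'stn')]
```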
33 votes, 8 answers

Computing N Grams using Python

I needed to compute the Unigrams, BiGrams and Trigrams for a text file containing text like: "Cystic fibrosis affects 30,000 children and young adults in the US alone Inhaling the mists of salt water can reduce the pus and infection that fills the…
gran_profaci • 8,087 • 15 • 66 • 99
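For this, nltk.ngrams generalizes bigrams and trigrams: pass a token sequence and the order n. A minimal sketch on a snippet of the question's text (whitespace split is used for brevity; a real pipeline would tokenize properly):

```python
from nltk import ngrams

tokens = "Cystic fibrosis affects 30,000 children".split()

# ngrams() yields a generator of n-tuples; materialize with list().
unigrams = list(ngrams(tokens, 1))
bigram_list = list(ngrams(tokens, 2))
trigram_list = list(ngrams(tokens, 3))

print(trigram_list[0])  # ('Cystic', 'fibrosis', 'affects')
```

A sequence of N tokens yields N - n + 1 n-grams, so here: 5 unigrams, 4 bigrams, 3 trigrams.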
32 votes, 7 answers

What is NLTK POS tagger asking me to download?

I just started using a part-of-speech tagger, and I am facing many problems. I started POS tagging with the following: import nltk text=nltk.word_tokenize("We are going out.Just you and me.") When I want to print 'text', the following…
Pearl • 759 • 1 • 6 • 7
32 votes, 7 answers

Change the nltk.download() directory from the default ~/nltk_data

I was trying to download/update Python nltk packages on a computing server and it returned this error: [Errno 122] Disk quota exceeded. Specifically: [nltk_data] Downloading package stop words to /home/sh2264/nltk_data... [nltk_data] Error…
shenglih • 879 • 2 • 8 • 18
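Two knobs control this: nltk.download() accepts a download_dir argument, and loaders search the directories in nltk.data.path (which is seeded from the NLTK_DATA environment variable). A sketch, with a hypothetical target directory and the actual download call left commented out to avoid a network fetch:

```python
import os
import nltk

# Hypothetical directory with free quota.
target = os.path.expanduser("~/scratch/nltk_data")

# Option 1: download into the alternate directory explicitly:
# nltk.download('stopwords', download_dir=target)

# Option 2: make NLTK's loaders search that directory too.
nltk.data.path.append(target)

# Exporting NLTK_DATA before Python starts has the same effect
# as appending to nltk.data.path here.
print(target in nltk.data.path)  # True
```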
31 votes, 2 answers

object of type 'generator' has no len()

I have just started to learn python. I want to write a program in NLTK that breaks a text into unigrams, bigrams. For example if the input text is... "I am feeling sad and disappointed due to errors" ... my function should generate text like: I…
Vishal Kharde • 1,553 • 3 • 16 • 34
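The error in this question's title arises because NLTK's n-gram helpers return generators, which support iteration but not len() or indexing. Materializing the generator once with list() resolves it:

```python
from nltk import bigrams

tokens = "I am feeling sad and disappointed due to errors".split()

pairs = bigrams(tokens)  # a generator: len(pairs) raises TypeError
pairs = list(pairs)      # materialize once to use len() or indexing

print(len(pairs))  # 8
print(pairs[0])    # ('I', 'am')
```

Note a generator is exhausted after one pass, so convert it to a list before reusing the results.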
31 votes, 2 answers

How do I test whether an nltk resource is already installed on the machine running my code?

I just started my first NLTK project and am confused about the proper setup. I need several resources like the Punkt Tokenizer and the maxent pos tagger. I myself downloaded them using the GUI nltk.download(). For my collaborators I of course want…
Zakum • 2,157 • 2 • 22 • 30
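The usual answer here is nltk.data.find(), which raises LookupError when a resource is not installed, so a setup script can download only what is missing. A sketch of that check:

```python
import nltk

def ensure_resource(path, package):
    """Download an NLTK resource only if it is not already installed.

    `path` is what nltk.data.find() expects (e.g. 'tokenizers/punkt');
    `package` is the name passed to nltk.download().
    """
    try:
        nltk.data.find(path)
        return True   # already present
    except LookupError:
        nltk.download(package, quiet=True)
        return False  # had to fetch it

# Demonstration without a network call: a resource that certainly
# does not exist raises LookupError.
try:
    nltk.data.find('tokenizers/no_such_resource')
    found = True
except LookupError:
    found = False
print(found)  # False
```

Collaborators can then run ensure_resource('tokenizers/punkt', 'punkt') at startup instead of using the GUI downloader.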
31 votes, 3 answers

Topic distribution: how do we see which document belongs to which topic after doing LDA in Python?

I am able to run the LDA code from gensim and got the top 10 topics with their respective keywords. Now I would like to go a step further to see how accurate the LDA algo is by seeing which document they cluster into each topic. Is this possible in…
jxn • 7,685 • 28 • 90 • 172