Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python versions 2.7 or 3.2+.

NLTK includes many common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK provides many common corpora, including the Brown Corpus, Reuters, and WordNet. The NLTK corpora collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.
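
As a rough illustration of two of the tools listed above, here is a minimal sketch that tokenizes a sentence and stems each token. It assumes NLTK is installed (for example via `pip install nltk`). The helper name `tokenize_and_stem` is hypothetical and not part of NLTK; `wordpunct_tokenize` and `PorterStemmer` are chosen here because they should not require any separate corpus download.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize


def tokenize_and_stem(text):
    """Hypothetical helper: split text into tokens and stem each one.

    wordpunct_tokenize is a regex-based tokenizer that separates runs of
    word characters from runs of punctuation.
    """
    stemmer = PorterStemmer()
    tokens = wordpunct_tokenize(text)
    # The Porter stemmer lowercases its input by default.
    stems = [stemmer.stem(token) for token in tokens]
    return tokens, stems


if __name__ == "__main__":
    tokens, stems = tokenize_and_stem("The cats were running.")
    print(tokens)
    print(stems)
```

Other tools, such as the POS tagger or the WordNet lemmatizer, generally need extra data fetched with `nltk.download()` first, which is why this sketch avoids them.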

The book Natural Language Processing with Python - Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution Noncommercial No Derivative Works 3.0 US Licence. A citable paper NLTK: the natural language ToolKit was first published in 2003 and then again in 2006 for researchers to acknowledge the contribution in ongoing research in Computational Linguistics.

NLTK is currently distributed under an Apache version 2.0 licence.

7139 questions
99 votes, 18 answers

How to use Stanford Parser in NLTK using Python

Is it possible to use Stanford Parser in NLTK? (I am not talking about Stanford POS.)
ThanaDaray
90 votes, 4 answers

Creating a new corpus with NLTK

I reckon the usual answer to a question like mine is to go and read the documentation, but I went through the NLTK book and it doesn't give the answer. I'm kind of new to Python. I have a bunch of .txt files and I want to be able to use the corpus…
alvas
89 votes, 12 answers

Corpora/stopwords not found when importing the nltk library

I am trying to import the nltk package in Python 2.7: import nltk stopwords = nltk.corpus.stopwords.words('english') print(stopwords[:10]) Running this gives me the following error: LookupError:…
Frits Verstraten
88 votes, 17 answers

NLTK download SSL: Certificate verify failed

I get the following error when trying to install Punkt for nltk: nltk.download('punkt') [nltk_data] Error loading Punkt:
user3429986
87 votes, 16 answers

Ordinal numbers replacement

I am currently looking for a way to replace words like first, second, third, ... with the appropriate ordinal number representation (1st, 2nd, 3rd). I have been googling for the last week and I didn't find any useful standard tool or any function from…
skornos
79 votes, 6 answers

Stopword removal with NLTK

I am trying to process user-entered text by removing stopwords with the NLTK toolkit, but stopword removal also removes words like 'and', 'or', 'not'. I want these words to be present after the stopword removal process as they are operators…
Grahesh Parkar
78 votes, 3 answers

Practical examples of NLTK use

I'm playing around with the Natural Language Toolkit (NLTK). Its documentation (Book and HOWTO) is quite bulky and the examples are sometimes slightly advanced. Are there any good but basic examples of uses/applications of NLTK? I'm thinking of…
Mat
78 votes, 4 answers

SSL error downloading NLTK data

I am trying to download NLTK 3.0 for use with Python 3.6 on Mac OS X 10.7.5, but am getting an SSL error: import nltk nltk.download() I downloaded NLTK with a pip3 command: sudo pip3 install -U nltk. Changing the index in the NLTK downloader…
DyingIsFun
77 votes, 8 answers

WordNet lemmatization and POS tagging in Python

I wanted to use the WordNet lemmatizer in Python and I have learnt that the default POS tag is NOUN and that it does not output the correct lemma for a verb unless the POS tag is explicitly specified as VERB. My question is what is the best shot…
user1946217
76 votes, 1 answer

Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP)

I am playing around with NLTK to do an assignment on sentiment analysis. I am using Python 2.7, NLTK 3.0, and NumPy 1.9.1. This is the code: __author__ = 'karan' import nltk import re import sys def main(): print("Start"); #…
rkbom9
75 votes, 8 answers

English grammar for parsing in NLTK

Is there a ready-to-use English grammar that I can just load and use in NLTK? I've searched for examples of parsing with NLTK, but it seems that I have to manually specify a grammar before parsing a sentence. Thanks a lot!
roboren
72 votes, 4 answers

How to extract common / significant phrases from a series of text entries

I have a series of text items: raw HTML from a MySQL database. I want to find the most common phrases in these entries (not the single most common phrase, and ideally, not enforcing word-for-word matching). My example is any review on Yelp.com,…
arronsky
69 votes, 15 answers

How do I download NLTK data?

Updated answer: NLTK works well with 2.7. I had 3.2. I uninstalled 3.2 and installed 2.7. Now it works! I have installed NLTK and tried to download NLTK Data. What I did was to follow the instructions on this site: http://www.nltk.org/data.html I…
Q-ximi
68 votes, 7 answers

NLTK and Stopwords Fail #lookuperror

I am trying to start a sentiment analysis project and I will use the stop words method. I did some research and found that NLTK has stopwords, but when I execute the command there is an error. What I do is the following, in order to know which…
Facundo
66 votes, 2 answers

How do I tokenize a string sentence in NLTK?

I am using nltk, and I want to create my own custom texts just like the default ones in nltk.book. However, I've only got as far as a method like my_text = ['This', 'is', 'my', 'text'] I'd like to discover a way to input my "text" as: my_text =…
diegoaguilar