Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python 2.7 and 3.2+.

NLTK includes many common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK provides many common corpora, including the Brown Corpus, Reuters, and WordNet. The NLTK corpora collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.
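As a small illustration of the tokenizer mentioned above, here is a minimal sketch using wordpunct_tokenize, one of the NLTK tokenizers that needs no downloaded data. It assumes NLTK is installed. The except branch is a hypothetical stand-in, not part of NLTK, that applies the same regular expression NLTK's WordPunctTokenizer is built on. The sample sentence is made up for the example.

```python
import re

try:
    # Real NLTK call; wordpunct_tokenize needs no downloaded corpora or models.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Hypothetical stand-in for environments without NLTK: it applies the
    # same pattern that NLTK's WordPunctTokenizer is built on.
    def wordpunct_tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)

# Made-up sample sentence, used only for illustration.
sentence = "Can't stop, won't stop."

# Splits the text into runs of word characters and runs of punctuation.
tokens = wordpunct_tokenize(sentence)
print(tokens)
```

Note that this tokenizer splits contractions at the apostrophe, so "Can't" becomes three tokens. Other NLTK tokenizers, such as word_tokenize, behave differently and may require extra data to be downloaded first.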

The book "Natural Language Processing with Python - Analyzing Text with the Natural Language Toolkit" by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution Noncommercial No Derivative Works 3.0 US Licence. A citable paper, "NLTK: The Natural Language Toolkit", was first published in 2003 and again in 2006 so that researchers can acknowledge the toolkit in ongoing research in computational linguistics.

NLTK is currently distributed under the Apache Licence, version 2.0.

7139 questions

23 votes, 5 answers

What to download in order to make nltk.tokenize.word_tokenize work?

I am going to use nltk.tokenize.word_tokenize on a cluster where my account is very limited by space quota. At home, I downloaded all nltk resources with nltk.download() but, as I found out, it takes ~2.5GB. This seems like overkill to me. Could you…
petrbel

23 votes, 4 answers

AttributeError: 'list' object has no attribute 'copy'

I have the following code snippet classifier = NaiveBayesClassifier.train(train_data) #classifier.show_most_informative_features(n=20) results = classifier.classify(test_data) and the error shows in the following line results =…
Amr Ragab

23 votes, 5 answers

how to use word_tokenize in data frame

I have recently started using the nltk module for text analysis. I am stuck at a point. I want to use word_tokenize on a dataframe, so as to obtain all the words used in a particular row of the dataframe. data example: text 1. This is a…
eclairs

23 votes, 6 answers

Text mining with PHP

I'm doing a project for a college class I'm taking. I'm using PHP to build a simple web app that classifies tweets as "positive" (or happy) and "negative" (or sad) based on a set of dictionaries. The algorithm I'm thinking of right now is Naive Bayes…
garyc40

23 votes, 3 answers

nltk NaiveBayesClassifier training for sentiment analysis

I am training the NaiveBayesClassifier in Python using sentences, and it gives me the error below. I do not understand what the error might be, and any help would be good. I have tried many other input formats, but the error remains. The code given…
student001

22 votes, 14 answers

NLTK fails to find the Java executable

I am using NLTK's nltk.tag.stanford, which needs to call the java executable. I set JAVAHOME to C:\Program Files\Java\jdk1.6.0_25 where my jdk is installed, but when I run the program I get the error "NLTK was unable to find the java executable! Use…
Thomas Chu

22 votes, 8 answers

How do I find the frequency count of a word in English using WordNet?

Is there a way to find the frequency of the usage of a word in the English language using WordNet or NLTK using Python? NOTE: I do not want the frequency count of a word in a given input file. I want the frequency count of a word in general based on…
Apps

22 votes, 1 answer

nltk wordpunct_tokenize vs word_tokenize

Does anyone know the difference between nltk's wordpunct_tokenize and word_tokenize? I'm using nltk=3.2.4 and there's nothing on the doc string of wordpunct_tokenize that explains the difference. I couldn't find this info either in the documentation…
tsando

22 votes, 7 answers

NLTK - AttributeError: module 'nltk' has no attribute 'data'

I used nltk in my code for a few days, but now, when I try to import nltk, I get the error: File "C:\Users\Nada\Anaconda\lib\site-packages\nltk\corpus\reader\plaintext.py", line 42, in PlaintextCorpusReader…
user8451312

22 votes, 6 answers

Extracting all Nouns from a text file using nltk

Is there a more efficient way of doing this? My code reads a text file and extracts all Nouns. import nltk File = open(fileName) #open file lines = File.read() #read all lines sentences = nltk.sent_tokenize(lines) #tokenize sentences nouns = []…
Rakesh Adhikesavan

22 votes, 7 answers

How to identify the subject of a sentence?

Can Python + NLTK be used to identify the subject of a sentence? From what I have learned so far, a sentence can be broken into a head and its dependents. For example, "I shot an elephant". In this sentence, I and elephant are dependents to…
singhalc

22 votes, 6 answers

downloading error using nltk.download()

I am experimenting with the NLTK package using Python. I tried to download NLTK using nltk.download() and got this kind of error message. How can I solve this problem? Thanks. The system I used is Ubuntu installed under VMware. The IDE is Spyder. After using…
user288609

22 votes, 2 answers

What would cause WordNetCorpusReader to have no attribute LazyCorpusLoader?

I've got a short function to check whether a word is a real word by comparing it to the WordNet corpus from the Natural Language Toolkit. I'm calling this function from a thread that validates txt files. When I run my code, the first time the…
Cecilia

22 votes, 4 answers

Semantic Role Labeling using NLTK

I have a list of sentences and I want to analyze every sentence and identify the semantic roles within that sentence. How do I do that? I came across the PropBankCorpusReader within the NLTK module that adds semantic labeling information to the Penn…
Prahalad Deshpande

22 votes, 1 answer

NLTK for Persian

How can I use NLTK functions for Persian, for example 'concordance'? When I use 'concordance', the answer is 'not match', even though the search term appears in my text. The input is very simple. It consists of "hello سلام". When parameter…
ikj