Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python versions 2.7 and 3.2+.

NLTK includes many common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK provides access to many common corpora, including the Brown Corpus, Reuters, and WordNet. The NLTK corpus collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.
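As a rough illustration of a few of these tools, here is a minimal sketch that tokenizes a sentence, stems each token, and builds bigrams. It assumes NLTK is installed and deliberately uses only components that should not require downloading extra data (`wordpunct_tokenize`, `PorterStemmer`, and `nltk.util.ngrams`); the sample sentence is made up for the example.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from nltk.util import ngrams

# Hypothetical sample sentence, chosen only for illustration.
text = "NLTK makes tokenizing and stemming easy."

# Regex-based tokenizer: splits into alphanumeric runs and punctuation runs.
tokens = wordpunct_tokenize(text)
# ['NLTK', 'makes', 'tokenizing', 'and', 'stemming', 'easy', '.']

# Reduce each token to its stem with the Porter algorithm.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# ngrams() yields tuples of n consecutive tokens; n=2 gives bigrams.
bigrams = list(ngrams(tokens, 2))

print(tokens)
print(stems)
print(bigrams[:3])
```

Other tools mentioned above, such as `word_tokenize`, the POS tagger, and the WordNet lemmatizer, typically need their data packages fetched first with `nltk.download()`.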

The book "Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit" by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution Noncommercial No Derivative Works 3.0 US licence. A citable paper, "NLTK: The Natural Language Toolkit", was first published in 2003 and again in 2006 so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is currently distributed under the Apache License, Version 2.0.

7139 questions

7 answers

Unable to install nltk on Mac OS El Capitan

I did sudo pip install -U nltk as suggested by the nltk documentation. However, I am getting the following output: Collecting nltk Downloading nltk-3.0.5.tar.gz (1.0MB) 100% |████████████████████████████████| 1.0MB 516kB/s Collecting…

python python-2.7 nltk

asked Oct 01 '15 at 23:57

proutray

14 answers

Resource 'corpora/wordnet' not found on Heroku

I'm trying to get NLTK and wordnet working on Heroku. I've already done heroku run python nltk.download() wordnet pip install -r requirements.txt But I get this error: Resource 'corpora/wordnet' not found. Please use the NLTK Downloader to…

python django heroku nltk wordnet

asked Dec 20 '12 at 05:21

user1881006

7 answers

Determine if text is in English?

I am using both Nltk and Scikit Learn to do some text processing. However, within my list of documents I have some documents that are not in English. For example, the following could be true: [ "this is some text written in English", "this is…

python scikit-learn nlp nltk

asked Apr 12 '17 at 18:41

ocean800

3 answers

Generate bigrams with NLTK

I am trying to produce a bigram list for a given sentence. For example, if I type "To be or not to be", I want the program to generate: to be, be or, or not, not to, to be. I tried the following code, but it just gives me

python nltk n-gram

asked Jun 06 '16 at 06:44

Nikhil Raghavendra

7 answers

NLTK Named Entity recognition to a Python list

I used NLTK's ne_chunk to extract named entities from a text: my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the…

python nlp nltk named-entity-recognition

asked Aug 05 '15 at 14:58

Zlo

5 answers

Determining tense of a sentence Python

Following several other posts, [e.g. Detect English verb tenses using NLTK , Identifying verb tenses in python, Python NLTK figure out tense ] I wrote the following code to determine tense of a sentence in Python using POS tagging: from nltk import…

python nlp nltk

asked May 03 '15 at 17:16

kyrenia

4 answers

Python NLTK: Bigrams trigrams fourgrams

I have this example and I want to know how to get this result. I have some text that I tokenize, and then I collect the bigrams, trigrams, and fourgrams like this: import nltk from nltk import word_tokenize from nltk.util import ngrams text = "Hi How are…

python nltk n-gram

asked Jun 22 '14 at 00:16

M.A.Hassan

4 answers

How to navigate a nltk.tree.Tree?

I've chunked a sentence using: grammar = ''' NP: …

tree nltk

asked Feb 12 '13 at 21:17

Roy Smith

4 answers

Tokenization of Arabic words using NLTK

I'm using NLTK word_tokenizer to split a sentence into words. I want to tokenize this sentence: في_بيتنا كل شي لما تحتاجه يضيع ...ادور على شاحن فجأة يختفي ..لدرجة اني اسوي نفسي ادور شيء The code I'm writing is: import re import nltk lex = u"…

python tokenize nltk

asked Oct 23 '12 at 16:59

Hady Elsahar

1 answer

pronoun resolution backwards

The usual coreference resolution works in the following way: given "The man likes math. He really does.", it figures out that "he" refers to "the man". There are plenty of tools to do this. However, is there a way to do it backwards? For…

python nlp nltk stanford-nlp

asked Jan 06 '16 at 08:00

ytrewq

1 answer

Combining text stemming and removal of punctuation in NLTK and scikit-learn

I am using a combination of NLTK and scikit-learn's CountVectorizer for stemming words and tokenization. Below is an example of the plain usage of the CountVectorizer: from sklearn.feature_extraction.text import CountVectorizer vocab = ['The…

python text scikit-learn nltk

asked Sep 30 '14 at 17:14

user2489252

3 answers

Implementing Bag-of-Words Naive-Bayes classifier in NLTK

I basically have the same question as this guy. The example in the NLTK book for the Naive Bayes classifier considers only whether a word occurs in a document as a feature. It doesn't consider the frequency of the words as the feature to look at…

python machine-learning nlp nltk naivebayes

asked Apr 11 '12 at 01:00

Ben G

3 answers

Are there any classes in NLTK for text normalizing and canonizing?

Most NLTK documentation and examples are devoted to lemmatization and stemming, but they are very sparse on normalization matters such as: converting all letters to lower or upper case, removing punctuation, converting numbers into…

python nltk

asked Feb 10 '12 at 12:08

soshial

10 answers

Adding words to nltk stoplist

I have some code that removes stop words from my data set. Since the stop list doesn't seem to remove a majority of the words I would like it to, I'm looking to add words to this stop list so that it will remove them for this case. The code I'm using…

python nltk stop-words

asked Apr 01 '11 at 09:49

Alex

7 answers

Efficient Context-Free Grammar parser, preferably Python-friendly

I need to parse a small subset of English for one of my projects, described as a context-free grammar with (1-level) feature structures (example), and I need to do it efficiently. Right now I'm using NLTK's parser which produces the right…

python parsing nlp grammar nltk

asked Dec 28 '10 at 01:06

Max Shawabkeh
