Questions tagged [nltk]


The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python 2.7 and 3.2+.

NLTK includes many common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK provides many common corpora, including the Brown Corpus, Reuters, and WordNet. The NLTK corpora collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.

The book Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, by Steven Bird, Ewan Klein, and Edward Loper, is freely available online under the Creative Commons Attribution Noncommercial No Derivative Works 3.0 US Licence. A citable paper, NLTK: The Natural Language Toolkit, was first published in 2003 and again in 2006 so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is currently distributed under an Apache version 2.0 licence.

7139 questions
45
votes
7 answers

What is the best stemming method in Python?

I tried all the NLTK stemming methods, but they give me weird results for some words. Examples: they often cut off the ends of words when they shouldn't: poodle => poodl, article => articl; or they don't stem very well: easily and easy are not stemmed in…
PeYoTlL
  • 3,144
  • 2
  • 17
  • 18
43
votes
4 answers

Use of PunktSentenceTokenizer in NLTK

I am learning natural language processing using NLTK. I came across some code that uses PunktSentenceTokenizer, and I cannot understand what it actually does in that code. The code is: import nltk from nltk.corpus import state_union from nltk.tokenize…
arqam
  • 3,582
  • 5
  • 34
  • 69
43
votes
10 answers

Python Untokenize a sentence

There are so many guides on how to tokenize a sentence, but I didn't find any on how to do the opposite. import nltk words = nltk.word_tokenize("I've found a medicine for my disease.") The result I get is: ['I', "'ve", 'found', 'a', 'medicine',…
Brana
  • 1,197
  • 3
  • 17
  • 38
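
For the untokenizing question above, one possible direction is a small rule-based joiner. The sketch below uses only the standard library `re` module, not an NLTK API. The function name `untokenize` and its three rules are illustrative assumptions; they cover common English contractions and punctuation only, and do not handle quotation marks or other edge cases.

```python
import re


def untokenize(tokens):
    """Join tokens with spaces, then remove the spaces a tokenizer typically adds.

    Hypothetical helper for illustration; not part of NLTK.
    """
    text = " ".join(tokens)
    # Reattach clitics such as 've, 's and n't to the preceding word.
    text = re.sub(r" (n't|'(?:ve|s|m|re|ll|d))\b", r"\1", text)
    # Remove the space before closing punctuation and closing brackets.
    text = re.sub(r" ([.,!?;:%)\]])", r"\1", text)
    # Remove the space after opening brackets.
    text = re.sub(r"([(\[]) ", r"\1", text)
    return text


# Token list assumed to match what a Treebank-style tokenizer produces
# for the sentence in the question.
tokens = ["I", "'ve", "found", "a", "medicine", "for", "my", "disease", "."]
print(untokenize(tokens))
```

Because the rules are plain regular expressions, they are easy to extend, but any such approach is lossy: the original spacing cannot always be recovered from tokens alone.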
41
votes
5 answers

NLTK and language detection

How do I detect what language a text is written in using NLTK? The examples I've seen use nltk.detect, but when I've installed it on my mac, I cannot find this package.
niklassaers
  • 8,480
  • 20
  • 99
  • 146
41
votes
4 answers

NLTK WordNet Lemmatizer: Shouldn't it lemmatize all inflections of a word?

I'm using the NLTK WordNet Lemmatizer for a part-of-speech tagging project by first reducing each word in the training corpus to its stem (in-place modification), and then training only on the new corpus. However, I found that the lemmatizer is not…
sanjeev mk
  • 4,276
  • 6
  • 44
  • 69
40
votes
8 answers

How to get synonyms from nltk WordNet Python

WordNet is great, but I'm having a hard time getting synonyms in NLTK. If you search "similar to" for the word 'small', like here, it shows all of the synonyms. Basically I just need to know the following: wn.synsets('word')[i].option() Where option…
user2758113
  • 1,001
  • 1
  • 13
  • 25
40
votes
4 answers

How to tweak the NLTK sentence tokenizer

I'm using NLTK to analyze a few classic texts and I'm running into trouble tokenizing the text by sentence. For example, here's what I get for a snippet from Moby Dick: import nltk sent_tokenize =…
Chris Wilson
  • 6,599
  • 8
  • 35
  • 71
39
votes
7 answers

How do I do dependency parsing in NLTK?

Going through the NLTK book, it's not clear how to generate a dependency tree from a given sentence. The relevant section of the book, the sub-chapter on dependency grammar, gives an example figure, but it doesn't show how to parse a sentence to come up…
MrD
  • 2,405
  • 3
  • 22
  • 23
39
votes
6 answers

How to use spacy's lemmatizer to get a word into basic form

I am new to spacy and I want to use its lemmatizer function, but I don't know how to use it. I want to pass in a string of words and get back the string with each word in its basic form. Examples: 'words' => 'word', 'did' => 'do'. Thank you.
yi wang
  • 403
  • 1
  • 4
  • 8
37
votes
2 answers

Is there a corpus of English words in nltk?

Is there any way to get a list of English words from the Python NLTK library? I tried to find one, but the only thing I have found is wordnet from nltk.corpus. Based on the documentation, though, it does not have what I need (it finds synonyms for a word). I know…
Salvador Dali
  • 214,103
  • 147
  • 703
  • 753
37
votes
3 answers

How do I create my own NLTK text from a text file?

I'm a literature grad student, and I've been going through the O'Reilly book on Natural Language Processing (nltk.org/book). It looks incredibly useful. I've played around with all the example texts and example tasks in Chapter 1, like concordances.…
Jonathan
  • 10,571
  • 13
  • 67
  • 103
36
votes
2 answers

How to extract numbers (along with comparison adjectives or ranges)

I am working on two NLP projects in Python, and both involve a similar task: extracting numerical values and comparison operators from sentences, like the following: "... greater than $10 ... ", "... weight not more than 200lbs ...", "... height in 5-7…
svfat
  • 3,273
  • 1
  • 15
  • 34
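
As a rough illustration of the task in the question above, the sketch below pairs a comparison phrase with the number or range that follows it, using only the standard library `re` module. The phrase table `COMPARATORS`, the function name `extract_quantities`, and the `(operator, low, high)` output shape are all assumptions made for this example; units such as "lbs" are ignored.

```python
import re

# Hypothetical phrase-to-operator table. Longer phrases come first so that
# "not more than" is tried before "more than".
COMPARATORS = [
    ("not more than", "<="),
    ("no more than", "<="),
    ("not less than", ">="),
    ("greater than", ">"),
    ("more than", ">"),
    ("less than", "<"),
    ("at least", ">="),
    ("at most", "<="),
]

NUMBER = r"\d+(?:\.\d+)?"
PATTERN = re.compile(
    # Optional comparison phrase followed by whitespace.
    r"(?:(?P<cmp>" + "|".join(re.escape(p) for p, _ in COMPARATORS) + r")\s+)?"
    # Optional dollar sign, then the number (lower bound of a range, if any).
    r"\$?(?P<low>" + NUMBER + r")"
    # Optional "-<number>" suffix marking a range such as 5-7.
    r"(?:\s*-\s*(?P<high>" + NUMBER + r"))?",
    re.IGNORECASE,
)


def extract_quantities(text):
    """Return (operator, low, high) tuples; operator and high may be None."""
    operators = dict(COMPARATORS)
    results = []
    for match in PATTERN.finditer(text):
        phrase = match.group("cmp")
        operator = operators[phrase.lower()] if phrase else None
        high = float(match.group("high")) if match.group("high") else None
        results.append((operator, float(match.group("low")), high))
    return results


print(extract_quantities("weight not more than 200lbs"))
```

A fixed phrase table like this is brittle: it misses paraphrases and negation that a parser-based approach could catch, so it is best read as a starting point.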
36
votes
3 answers

Python NLTK pos_tag not returning the correct part-of-speech tag

Having this: text = word_tokenize("The quick brown fox jumps over the lazy dog") And running: nltk.pos_tag(text) I get: [('The', 'DT'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN'), ('the', 'DT'), ('lazy',…
faceoff
  • 901
  • 3
  • 11
  • 16
36
votes
2 answers

Finding Proper Nouns using NLTK WordNet

Is there any way to find proper nouns using NLTK WordNet? I.e., can I tag possessive nouns using NLTK WordNet?
Backue
  • 417
  • 1
  • 5
  • 8
36
votes
5 answers

Convert words between verb/noun/adjective forms

I would like a Python library function that translates/converts across different parts of speech. Sometimes it should output multiple words (e.g. "coder" and "code" are both nouns from the verb "to code"; one is the subject, the other is the object) #…
sam boosalis
  • 1,997
  • 4
  • 20
  • 32