Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python versions 2.7 and 3.2+.

NLTK includes many common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK bundles many common corpora, including the Brown Corpus, Reuters, and WordNet. The NLTK corpora collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.
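
As a rough illustration of a few of these tools, the sketch below tokenizes a sentence, stems the tokens, and counts bigram frequencies. It assumes NLTK is installed and deliberately sticks to components that need no extra data downloads (wordpunct_tokenize, PorterStemmer, bigrams, FreqDist). The sample sentence is made up for this example.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from nltk.util import bigrams

# Made-up sample text, used only for illustration.
text = "The cats are chasing the mice. The mice are running."

# Regex-based tokenizer: splits into runs of word characters and runs of punctuation.
tokens = wordpunct_tokenize(text.lower())

# Reduce each token to its Porter stem (e.g. "running" becomes "run").
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# Count how often each adjacent pair of tokens occurs.
bigram_counts = FreqDist(bigrams(tokens))

print(tokens)
print(stems)
print(bigram_counts.most_common(3))
```

Tools such as the POS tagger, the WordNet lemmatizer, and the bundled corpora additionally need their data packages, which are typically fetched with nltk.download().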

The book Natural Language Processing with Python - Analyzing Text with the Natural Language Toolkit, by Steven Bird, Ewan Klein, and Edward Loper, is freely available online under the Creative Commons Attribution Noncommercial No Derivative Works 3.0 US Licence. A citable paper, NLTK: the natural language ToolKit, was first published in 2003 and again in 2006 so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is currently distributed under the Apache License, Version 2.0.

7139 questions
31
votes
4 answers

counting n-gram frequency in python nltk

I have the following code. I know that I can use apply_freq_filter function to filter out collocations that are less than a frequency count. However, I don't know how to get the frequencies of all the n-gram tuples (in my case bi-gram) in a…
Rkz
  • 1,237
  • 5
  • 16
  • 30
30
votes
3 answers

Selecting the most fluent text from a set of possibilities via grammar checking (Python)

Some background: I am a literature student at New College of Florida, currently working on an overly ambitious creative project. The project is geared towards the algorithmic generation of poetry. It's written in Python. My Python knowledge and…
floer32
  • 2,190
  • 4
  • 29
  • 50
30
votes
5 answers

How to use k-fold cross validation in scikit with a naive Bayes classifier and NLTK

I have a small corpus and I want to calculate the accuracy of a naive Bayes classifier using 10-fold cross validation. How can I do it?
user2284345
  • 501
  • 2
  • 5
  • 9
29
votes
4 answers

custom tagging with nltk

I'm trying to create a small English-like language for specifying tasks. The basic idea is to split a statement into verbs and noun-phrases that those verbs should apply to. I'm working with nltk but not getting the results I'd hoped for, e.g.: >>>…
SpliFF
  • 38,186
  • 16
  • 91
  • 120
29
votes
3 answers

Python NLTK: How to tag sentences with the simplified set of part-of-speech tags?

Chapter 5 of the Python NLTK book gives this example of tagging words in a sentence: >>> text = nltk.word_tokenize("And now for something completely different") >>> nltk.pos_tag(text) [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something',…
Ollie Glass
  • 19,455
  • 21
  • 76
  • 107
29
votes
5 answers

Lemmatize French text

I have some text in French that I need to process in some ways. For that, I need to: first, tokenize the text into words; then, lemmatize those words to avoid processing the same root more than once. As far as I can see, the wordnet lemmatizer in the…
yelsayed
  • 5,236
  • 3
  • 27
  • 38
28
votes
3 answers

Topic Modelling in MALLET vs NLTK

I just read a fascinating article about how MALLET could be used for topic modelling, but I couldn't find anything online comparing MALLET to NLTK, which I've already had some experience with. What are the main differences between them? Is MALLET a…
Trindaz
  • 17,029
  • 21
  • 82
  • 111
28
votes
6 answers

POS tagging in German

I am using NLTK to extract nouns from a text-string starting with the following command: tagged_text = nltk.pos_tag(nltk.Text(nltk.word_tokenize(some_string))) It works fine in English. Is there an easy way to make it work for German as well? (I…
Johannes Meier
  • 285
  • 1
  • 3
  • 7
27
votes
1 answer

Understanding NLTK collocation scoring for bigrams and trigrams

Background: I am trying to compare pairs of words to see which pair is "more likely to occur" in US English than another pair. My plan is/was to use the collocation facilities in NLTK to score word pairs, with the higher scoring pair being the most…
ccgillett
  • 4,511
  • 4
  • 21
  • 14
27
votes
1 answer

Parsing city of origin / destination city from a string

I have a pandas dataframe where one column is a bunch of strings with certain travel details. My goal is to parse each string to extract the city of origin and destination city (I would like to ultimately have two new columns titled 'origin' and…
Merv Merzoug
  • 1,149
  • 2
  • 19
  • 33
27
votes
3 answers

Combining a Tokenizer into a Grammar and Parser with NLTK

I am making my way through the NLTK book and I can't seem to do something that would appear to be a natural first step for building a decent grammar. My goal is to build a grammar for a particular text corpus. (Initial question: Should I even try…
speedplane
  • 15,673
  • 16
  • 86
  • 138
27
votes
6 answers

Word sense disambiguation in NLTK Python

I am new to NLTK Python and I am looking for a sample application that can do word sense disambiguation. I have found a lot of algorithms in search results but no sample application. I just want to pass a sentence and want to know the sense of…
thesensemakers
  • 309
  • 1
  • 5
  • 7
27
votes
6 answers

Generating Ngrams (Unigrams,Bigrams etc) from a large corpus of .txt files and their Frequency

I need to write a program in NLTK that breaks a corpus (a large collection of txt files) into unigrams, bigrams, trigrams, fourgrams and fivegrams. I have already written code to input my files into the program. The input is 300 .txt files written…
Arash
  • 295
  • 1
  • 3
  • 10
27
votes
6 answers

Extract list of Persons and Organizations using Stanford NER Tagger in NLTK

I am trying to extract a list of persons and organizations using the Stanford Named Entity Recognizer (NER) in Python NLTK. When I run: from nltk.tag.stanford import NERTagger st =…
user1680859
  • 1,160
  • 2
  • 24
  • 40
27
votes
3 answers

NLTK for Named Entity Recognition

I am trying to use the NLTK toolkit to extract place, date and time from text messages. I just installed the toolkit on my machine and I wrote this quick snippet to test it out: sentence = "Let's meet tomorrow at 9 pm"; tokens =…