Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. Earlier releases supported Python 2.7 or 3.2+; recent releases support Python 3 only.

NLTK includes a large number of common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK provides built-in access to many common corpora, including the Brown Corpus, Reuters, and WordNet. The NLTK corpora collection also includes some non-English corpora, for example in Portuguese, Polish, and Spanish.
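To make the tool list above concrete, here is a minimal sketch (not taken from the NLTK documentation) that tokenizes text, stems the tokens, and trains a Naive Bayes classifier. It sticks to components that need no downloaded data (wordpunct_tokenize, PorterStemmer, NaiveBayesClassifier); the stems() and features() helpers, the two-stem vocabulary, and the four training sentences are hypothetical.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

stemmer = PorterStemmer()


def stems(text):
    """Tokenize with a regex-based tokenizer (no downloaded model) and stem each word."""
    return [stemmer.stem(tok.lower()) for tok in wordpunct_tokenize(text) if tok.isalpha()]


def features(text):
    """Hypothetical feature extractor: does the text contain the stem 'love' or 'hate'?"""
    present = set(stems(text))
    return {"has(love)": "love" in present, "has(hate)": "hate" in present}


# Tiny, made-up training set of (featureset, label) pairs.
train = [
    (features("I loved this film"), "pos"),
    (features("We love the cast"), "pos"),
    (features("I hated this film"), "neg"),
    (features("They hate the plot"), "neg"),
]
classifier = NaiveBayesClassifier.train(train)

print(stems("The dogs were running"))
print(classifier.classify(features("Everyone loves it")))
```

Stemming first means "loved", "loves", and "love" all map to the same feature, which is why four training sentences are enough for this toy case.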

The book "Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit" by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. A citable paper, "NLTK: The Natural Language Toolkit", was first published in 2003 and again in 2006, so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is distributed under the Apache License, Version 2.0.

7139 questions

4 answers

counting n-gram frequency in python nltk

I have the following code. I know that I can use the apply_freq_filter function to filter out collocations that occur fewer than a given number of times. However, I don't know how to get the frequencies of all the n-gram tuples (in my case, bigrams) in a…

python nltk n-gram

asked Jan 16 '13 at 18:00

Rkz (1,237 reputation; 5 gold, 16 silver, 30 bronze badges)
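One way to get the counts the question asks for, sketched under the assumption that the input is already a list of tokens (the sample sentence is made up): count the bigram tuples directly with FreqDist, or read them from a BigramCollocationFinder's ngram_fd attribute, which is what apply_freq_filter() prunes.

```python
from nltk import FreqDist
from nltk.collocations import BigramCollocationFinder
from nltk.util import bigrams

tokens = "the cat sat on the mat and the cat slept".split()

# Option 1: count every bigram tuple directly; FreqDist is a Counter subclass.
bigram_freq = FreqDist(bigrams(tokens))
print(bigram_freq.most_common(3))

# Option 2: a collocation finder keeps the same counts in its ngram_fd attribute.
finder = BigramCollocationFinder.from_words(tokens)
print(sorted(finder.ngram_fd.items()))

# apply_freq_filter() drops bigrams seen fewer than 2 times from finder.ngram_fd.
finder.apply_freq_filter(2)
print(dict(finder.ngram_fd))
```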

3 answers

Selecting the most fluent text from a set of possibilities via grammar checking (Python)

Some background: I am a literature student at New College of Florida, currently working on an overly ambitious creative project. The project is geared towards the algorithmic generation of poetry. It's written in Python. My Python knowledge and…

python nlp grammar nltk linguistics

asked Jan 12 '12 at 21:44

floer32 (2,190 reputation; 4 gold, 29 silver, 50 bronze badges)

5 answers

How to use k-fold cross-validation in scikit-learn with a naive Bayes classifier and NLTK

I have a small corpus and I want to calculate the accuracy of a naive Bayes classifier using 10-fold cross-validation. How can I do it?

python scikit-learn nltk cross-validation naivebayes

asked May 04 '13 at 21:50

user2284345
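A minimal sketch of 10-fold cross-validation, assuming the data is already a list of (feature_dict, label) pairs as NaiveBayesClassifier.train() expects. The fold split is done in plain Python rather than with scikit-learn, whose sklearn.model_selection.KFold could supply the same index splits; cross_validate() and the toy data are hypothetical.

```python
import random

from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy


def cross_validate(labeled_featuresets, k=10, seed=0):
    """k-fold cross-validation for NLTK's NaiveBayesClassifier.

    labeled_featuresets: list of (feature_dict, label) pairs.
    Returns (mean accuracy, list of per-fold accuracies).
    """
    data = list(labeled_featuresets)
    if len(data) < k:
        raise ValueError("need at least k examples")
    random.Random(seed).shuffle(data)        # reproducible shuffle
    folds = [data[i::k] for i in range(k)]   # k folds of (nearly) equal size
    scores = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        classifier = NaiveBayesClassifier.train(train)
        scores.append(accuracy(classifier, test_fold))
    return sum(scores) / k, scores


# Hypothetical toy data: a single feature whose value decides the label.
data = [({"word": "good"}, "pos")] * 10 + [({"word": "bad"}, "neg")] * 10
mean, per_fold = cross_validate(data, k=10)
print(mean, per_fold)
```

Slicing with data[i::k] keeps fold sizes within one item of each other even when the corpus size is not a multiple of k.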

4 answers

custom tagging with nltk

I'm trying to create a small English-like language for specifying tasks. The basic idea is to split a statement into verbs and the noun phrases that those verbs should apply to. I'm working with NLTK but not getting the results I'd hoped for, e.g.: >>>…

python nltk

asked May 07 '11 at 05:36

SpliFF (38,186 reputation; 16 gold, 91 silver, 120 bronze badges)
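For a small task language like the one described, a rule-based tagger can stand in for the general-purpose nltk.pos_tag. This sketch assumes a hand-written vocabulary (the verb, determiner, and preposition lists and parse_command() are hypothetical) and uses RegexpTagger plus a RegexpParser chunk grammar to pull out verbs and noun phrases.

```python
import nltk

# Hypothetical vocabulary for a tiny task language; adjust the word lists as needed.
tagger = nltk.RegexpTagger([
    (r"^(copy|move|delete|open)$", "VB"),   # command verbs
    (r"^(the|a|an|all)$", "DT"),            # determiners
    (r"^(to|into|from|in)$", "IN"),         # prepositions
    (r".*", "NN"),                          # fallback: everything else is a noun
])

# Chunk an optional determiner followed by one or more nouns into an NP.
chunker = nltk.RegexpParser("NP: {<DT>?<NN>+}")


def parse_command(text):
    """Return (verbs, noun phrases) for a whitespace-separated command."""
    tagged = tagger.tag(text.lower().split())
    tree = chunker.parse(tagged)
    verbs = [word for word, tag in tagged if tag == "VB"]
    phrases = [
        " ".join(word for word, _ in chunk.leaves())
        for chunk in tree.subtrees(lambda t: t.label() == "NP")
    ]
    return verbs, phrases


print(parse_command("Copy the report to the archive folder"))
```

Because the vocabulary is closed, the rules stay predictable; unknown words simply fall through to NN instead of receiving a surprising tag from a model trained on news text.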

3 answers

Python NLTK: How to tag sentences with the simplified set of part-of-speech tags?

Chapter 5 of the Python NLTK book gives this example of tagging words in a sentence: >>> text = nltk.word_tokenize("And now for something completely different") >>> nltk.pos_tag(text) [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something',…

python tagging nltk

asked Apr 26 '11 at 08:19

Ollie Glass (19,455 reputation; 21 gold, 76 silver, 107 bronze badges)

5 answers

Lemmatize French text

I have some text in French that I need to process in a few ways. For that, I need to first tokenize the text into words, then lemmatize those words to avoid processing the same root more than once. As far as I can see, the wordnet lemmatizer in the…

python nltk lemmatization

asked Oct 29 '12 at 23:27

yelsayed (5,236 reputation; 3 gold, 27 silver, 38 bronze badges)

3 answers

Topic Modelling in MALLET vs NLTK

I just read a fascinating article about how MALLET could be used for topic modelling, but I couldn't find anything online comparing MALLET to NLTK, which I've already had some experience with. What are the main differences between them? Is MALLET a…

nltk mallet

asked Sep 19 '11 at 19:24

Trindaz (17,029 reputation; 21 gold, 82 silver, 111 bronze badges)

6 answers

POS tagging in German

I am using NLTK to extract nouns from a text string, starting with the following command: tagged_text = nltk.pos_tag(nltk.Text(nltk.word_tokenize(some_string))) It works fine in English. Is there an easy way to make it work for German as well? (I…

python nlp nltk

asked Oct 28 '09 at 20:17

Johannes Meier

1 answer

Understanding NLTK collocation scoring for bigrams and trigrams

Background: I am trying to compare pairs of words to see which pair is "more likely to occur" in US English than another pair. My plan is/was to use the collocation facilities in NLTK to score word pairs, with the higher scoring pair being the most…

python nlp nltk

asked Dec 30 '11 at 20:09

ccgillett (4,511 reputation; 4 gold, 21 silver, 14 bronze badges)
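A sketch of scoring individual pairs with BigramCollocationFinder.score_ngram() and BigramAssocMeasures, on a made-up twelve-token corpus; pair_score() is a hypothetical helper, and a real comparison would need a large sample of US English text.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Hypothetical toy corpus; in practice, use a large sample of US English text.
tokens = "new york is big new york is old the dog is old".split()

measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)


def pair_score(score_fn, w1, w2):
    """Score one word pair; score_ngram() returns None for pairs never seen together."""
    return finder.score_ngram(score_fn, w1, w2)


print(pair_score(measures.pmi, "new", "york"))
print(pair_score(measures.likelihood_ratio, "new", "york"))
print(finder.nbest(measures.likelihood_ratio, 3))
```

The toy data exposes a known trade-off: ("the", "dog"), seen once, gets a higher PMI than ("new", "york"), seen twice, because PMI rewards rare pairs. That is why PMI is often combined with apply_freq_filter(), or replaced by a measure such as likelihood_ratio, when ranking pairs.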

1 answer

Parsing city of origin / destination city from a string

I have a pandas dataframe where one column is a bunch of strings with certain travel details. My goal is to parse each string to extract the city of origin and destination city (I would like to ultimately have two new columns titled 'origin' and…

python regex pandas nlp nltk

asked Jan 28 '20 at 20:39

Merv Merzoug (1,149 reputation; 2 gold, 19 silver, 33 bronze badges)
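A plain-regex sketch, assuming (since the question's actual strings are elided) that each string contains "from <City> to <City>", with lowercase "from"/"to" and capitalized city names. ROUTE and parse_route() are hypothetical names, and real travel text will likely need more patterns.

```python
import re

# Hypothetical phrasing: assumes "from <City> to <City>" with capitalized city names.
ROUTE = re.compile(
    r"\bfrom\s+(?P<origin>[A-Z][\w'.-]*(?:\s+[A-Z][\w'.-]*)*)"
    r"\s+to\s+(?P<destination>[A-Z][\w'.-]*(?:\s+[A-Z][\w'.-]*)*)"
)


def parse_route(text):
    """Return (origin, destination), or (None, None) if the pattern is absent."""
    match = ROUTE.search(text)
    if match is None:
        return None, None
    return match.group("origin"), match.group("destination")


print(parse_route("Flight from New York to San Francisco on Monday"))
```

With pandas, passing ROUTE.pattern to the string column's .str.extract() method should yield origin and destination columns named after the regex groups.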

3 answers

Combining a Tokenizer into a Grammar and Parser with NLTK

I am making my way through the NLTK book and I can't seem to do something that would appear to be a natural first step for building a decent grammar. My goal is to build a grammar for a particular text corpus. (Initial question: Should I even try…

python nlp grammar nltk

asked Feb 01 '11 at 03:06

speedplane (15,673 reputation; 16 gold, 86 silver, 138 bronze badges)
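A minimal sketch of wiring a tokenizer into a grammar and parser: tokenize the raw text, normalize the tokens to match the grammar's terminals, then hand the token list to ChartParser. The toy grammar and the parse() helper are hypothetical; CFG.check_coverage() reports any words the grammar does not know.

```python
import nltk
from nltk.tokenize import wordpunct_tokenize

# Hypothetical toy grammar; a real corpus needs far more productions.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat' | 'fish'
V -> 'chased' | 'ate'
""")
parser = nltk.ChartParser(grammar)


def parse(text):
    """Tokenize raw text, then return every parse tree the grammar allows."""
    # Lowercase and drop punctuation so tokens line up with the grammar's terminals.
    tokens = [tok.lower() for tok in wordpunct_tokenize(text) if tok.isalpha()]
    grammar.check_coverage(tokens)  # raises ValueError naming any unknown words
    return list(parser.parse(tokens))


for tree in parse("The dog chased a cat."):
    print(tree)
```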

6 answers

Word sense disambiguation in NLTK Python

I am new to NLTK in Python and I am looking for a sample application that can do word sense disambiguation. I have found a lot of algorithms in search results, but no sample application. I just want to pass a sentence and want to know the sense of…

python nltk

asked Sep 13 '10 at 11:04

thesensemakers

6 answers

Generating n-grams (unigrams, bigrams, etc.) and their frequencies from a large corpus of .txt files

I need to write a program in NLTK that breaks a corpus (a large collection of txt files) into unigrams, bigrams, trigrams, fourgrams and fivegrams. I have already written code to input my files into the program. The input is 300 .txt files written…

python nltk

asked Sep 07 '15 at 15:02

Arash
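A sketch of the counting step, assuming the .txt files sit directly in one directory and are UTF-8 encoded (both assumptions); corpus_ngram_frequencies() is a hypothetical helper that tokenizes each file with wordpunct_tokenize and keeps one FreqDist per n.

```python
from pathlib import Path

from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize
from nltk.util import ngrams


def corpus_ngram_frequencies(root, max_n=5, encoding="utf-8"):
    """Count 1- to max_n-grams over every *.txt file directly under root.

    Returns {n: FreqDist of n-gram tuples}. Each file is processed separately,
    so no n-gram spans the boundary between two files.
    """
    freqs = {n: FreqDist() for n in range(1, max_n + 1)}
    for path in sorted(Path(root).glob("*.txt")):
        words = [w.lower() for w in wordpunct_tokenize(path.read_text(encoding=encoding))]
        for n, fd in freqs.items():
            fd.update(ngrams(words, n))  # add this file's n-gram counts
    return freqs
```

For example, corpus_ngram_frequencies("corpus")[2].most_common(10) would list the ten most frequent bigrams (the directory name is hypothetical). All counts are held in memory, which may become the bottleneck for 4- and 5-grams over 300 files.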

6 answers

Extract list of Persons and Organizations using Stanford NER Tagger in NLTK

I am trying to extract a list of persons and organizations using the Stanford Named Entity Recognizer (NER) in Python NLTK. When I run: from nltk.tag.stanford import NERTagger st =…

python nltk stanford-nlp named-entity-recognition

asked Jun 05 '15 at 10:49

user1680859 (1,160 reputation; 2 gold, 24 silver, 40 bronze badges)

3 answers

NLTK for Named Entity Recognition

I am trying to use the NLTK toolkit to extract place, date and time from text messages. I just installed the toolkit on my machine and wrote this quick snippet to test it out: sentence = "Let's meet tomorrow at 9 pm"; tokens =…

machine-learning nlp nltk text-processing named-entity-recognition

asked Oct 11 '13 at 07:29

Darth.Vader (5,079 reputation; 7 gold, 50 silver, 90 bronze badges)
