Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python 2.7 and 3.2+.

NLTK includes a large number of common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and Decision Trees. In addition to these tools, NLTK bundles many common corpora, including the Brown Corpus, Reuters, and WordNet. The corpora collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.

The book Natural Language Processing with Python - Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 US license. A citable paper, "NLTK: The Natural Language Toolkit", was published in 2003 and again in 2006 so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is currently distributed under the Apache License, version 2.0.

7139 questions
2
votes
1 answer

nltk word_tokenize in Pandas DataFrame only returns tokens for the first 101 words/tokens

I'm trying to apply word_tokenize to a Pandas DataFrame column as the step before POS tagging. The source/raw column is 'sent' (already sentence-tokenized) and the destination column is 'word'. Here's the code, including the max column width…
ddormer
  • 23
  • 3
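A likely cause of this symptom is that pandas truncates the *display* of long cells, so all the tokens may be stored even though the printed frame appears to stop partway through. A minimal sketch, with hypothetical data and the question's column names, using `TreebankWordTokenizer` (the word tokenizer behind `word_tokenize`, usable without downloading the `punkt` models):

```python
import pandas as pd
from nltk.tokenize import TreebankWordTokenizer

# Hypothetical data; the 'sent'/'word' column names follow the question.
df = pd.DataFrame({'sent': ['This is a fairly long sentence. ' * 50]})

# TreebankWordTokenizer needs no 'punkt' download.
tok = TreebankWordTokenizer()
df['word'] = df['sent'].apply(tok.tokenize)

# All tokens are stored even if the printed frame looks truncated:
print(df['word'].str.len())

# Widen the display if the repr is being cut off:
pd.set_option('display.max_colwidth', None)
```

Checking `df['word'].str.len()` distinguishes a display problem from a real tokenization problem.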
2
votes
0 answers

NLTK inter-annotator agreement using Krippendorff Alpha Outputs Zero on only 1 disagreement

I have a sequence-labeling task in NLP, where annotators are asked to assign one or more labels to each word in a sentence. E.g., for a sentence [a, b, c, d], Ann 1 provided [[0, 1, 2], [0, 1], [0], [0]] and Ann 2 provided [[0, 2], [0], [0], [0,…
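With NLTK's default binary distance, any difference between two label sets counts as total disagreement, which can push alpha to zero even when the sets overlap heavily; passing `masi_distance` gives partial credit for overlap. A sketch using the question's data (the truncated final label set of annotator 2 is assumed to be [0] here):

```python
from nltk.metrics.agreement import AnnotationTask
from nltk.metrics.distance import masi_distance

# Triples are (coder, item, label); set-valued labels must be hashable,
# hence frozenset. The last label of ann2 is assumed ([0]) because the
# question text is truncated.
data = [
    ('ann1', 'w0', frozenset([0, 1, 2])), ('ann2', 'w0', frozenset([0, 2])),
    ('ann1', 'w1', frozenset([0, 1])),    ('ann2', 'w1', frozenset([0])),
    ('ann1', 'w2', frozenset([0])),       ('ann2', 'w2', frozenset([0])),
    ('ann1', 'w3', frozenset([0])),       ('ann2', 'w3', frozenset([0])),
]

# MASI distance scores overlapping sets as partial (not total) disagreement.
task = AnnotationTask(data=data, distance=masi_distance)
print(task.alpha())
```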
2
votes
2 answers

How to solve NLTK LookupError (resource not found)? It exists in the path. (Python)

I'm trying to use the nltk library, but I got stuck. I downloaded the stopwords corpus by hand (I can't download it from code because of permission issues on my work machine), but it always gives me the following error: LookupError:…
Sevval Kahraman
  • 1,185
  • 3
  • 10
  • 37
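When a resource is unpacked by hand, the usual culprits are the directory layout and the search path: NLTK expects the intermediate `corpora` directory, and the parent folder must be on `nltk.data.path`. A sketch with a hypothetical location:

```python
import nltk

# NLTK searches each directory in nltk.data.path. For hand-installed
# stopwords the layout must be:
#   /some/dir/corpora/stopwords/english, ...
# (the 'corpora' level matters), and /some/dir must be on the path:
nltk.data.path.append('/some/dir')   # hypothetical location

print(nltk.data.path)
```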
2
votes
1 answer

Ngram in Python with start_pad

I'm new to Python; I know some basics about lists and tuples, but I don't fully understand my code. I want to create a list where each element is a two-item tuple, like this: [('~','a'),('a','b'),('b','c')]. The first index in the tuple…
jawaher
  • 25
  • 6
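The desired output can be produced with plain-Python left padding, no NLTK needed. A sketch assuming '~' as the pad symbol (as in the question's example) and a generic n:

```python
def ngrams_with_pad(text, n, pad='~'):
    """Return character n-grams of `text` as tuples, left-padded with `pad`."""
    padded = pad * (n - 1) + text          # e.g. '~abc' for n=2
    return [tuple(padded[i:i + n]) for i in range(len(text))]

print(ngrams_with_pad('abc', 2))   # [('~', 'a'), ('a', 'b'), ('b', 'c')]
```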
2
votes
1 answer

Calculating PMI for bigram and discrepancy

Suppose I have the following text: text = "this is a foo bar bar black sheep foo bar bar black sheep foo bar bar black sheep shep bar bar black sentence" I can calculate the PMI for bigram using NLTK as follow: bigram_measures =…
Wiliam
  • 1,078
  • 10
  • 21
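The standard NLTK route to bigram PMI is a collocation finder. One frequent source of discrepancy against hand calculations is the logarithm base: NLTK's `pmi` uses log base 2, while manual derivations often use natural log. A sketch with the question's text:

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

text = ("this is a foo bar bar black sheep foo bar bar black sheep "
        "foo bar bar black sheep shep bar bar black sentence")

# Score every bigram by PMI (log base 2 in NLTK).
bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(text.split())
scored = finder.score_ngrams(bigram_measures.pmi)
print(scored[:3])
```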
2
votes
0 answers

NLTK word_tokenize returning empty

I am trying to tokenize the words and sentences in a text document, but it is returning empty for both. Could you please check and share why I am seeing this? Please find the code below (not attaching the text document as it is large, 443 KB): f =…
Harika
  • 21
  • 1
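One common cause of "empty" tokenizer output (an assumption, since the question's code is truncated) is reading the same file object twice: after the first `f.read()` the file pointer sits at end-of-file, so the second read returns the empty string. A sketch using a `StringIO` stand-in for the file and `TreebankWordTokenizer` (no data download needed):

```python
from io import StringIO
from nltk.tokenize import TreebankWordTokenizer

f = StringIO('Some text in the file.')   # stand-in for open('doc.txt')

first = f.read()
second = f.read()        # pointer is at EOF: returns ''
print(repr(second))

# Tokenizing the second read yields [], which looks like "empty output".
tok = TreebankWordTokenizer()
print(tok.tokenize(second))
print(tok.tokenize(first))
```

Calling `f.seek(0)` before the second read, or reading once into a variable, avoids the problem.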
2
votes
1 answer

Pandas - Keyword count by Category

I am trying to get a count of the most frequently occurring words in my df, grouped by another column's values. I have a dataframe like so: df=pd.DataFrame({'Category':['Red','Red','Blue','Yellow','Blue'],'Text':['this is very good ','good','dont…
Python_Hey
  • 75
  • 6
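One pandas-only way to get per-category word counts is split, explode, then `value_counts` within each group. The frame below reuses the question's data; the texts after the truncation point are assumed for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['Red', 'Red', 'Blue', 'Yellow', 'Blue'],
                   'Text': ['this is very good ', 'good', 'dont like',
                            'very good', 'like it']})  # last texts assumed

# One row per (Category, word), then count words within each category.
counts = (df.assign(word=df['Text'].str.split())
            .explode('word')
            .groupby('Category')['word']
            .value_counts())
print(counts)
```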
2
votes
2 answers

Why did I get "module 'nltk.translate.metrics' has no attribute 'precision'"?

I've tried importing precision and the whole library, but I still can't get it to work: import nltk import string import pandas as pd import nltk.metrics import collections from nltk import precision from nltk.metrics.scores import (accuracy,…
Lorale
  • 113
  • 6
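Errors like this typically come from importing `precision` off the wrong submodule; importing it directly from `nltk.metrics.scores` is unambiguous. Note that these functions compare *sets* of item identifiers, not raw label lists. A sketch with hypothetical item IDs:

```python
from nltk.metrics.scores import precision, recall

# Sets of item identifiers: two true positives, one miss, one false alarm.
reference = {'tp1', 'tp2', 'fn1'}
test = {'tp1', 'tp2', 'fp1'}

print(precision(reference, test))   # fraction of test items in reference
print(recall(reference, test))      # fraction of reference items found
```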
2
votes
0 answers

How do I use Wordnet for Dutch? Open Multilingual Wordnet will not work?

I'm fairly new to Python, so please bear with me. I am trying to write code that identifies the difficult words in a Dutch sentence and lists them. For this, I need to know whether the Dutch words in the input have any hypernyms. When I download Open…
2
votes
0 answers

NLTK - Best way to structure corpora?

I have 40k+ articles that have different segments in each. Each one exists as a Python dictionary with keys title, category, subcat, content, etc. How can I create a corpus out of these while still maintaining a separation between the different…
gEr
  • 215
  • 1
  • 2
  • 5
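One way to keep the separation is `CategorizedPlaintextCorpusReader`, encoding the category in each filename and letting `cat_pattern` recover it. A sketch that writes two hypothetical articles to a temporary directory (the filenames and categories are made up):

```python
import os
import tempfile
from nltk.corpus.reader import CategorizedPlaintextCorpusReader

root = tempfile.mkdtemp()
# One file per article, category encoded in the filename.
for name, text in [('sports_001.txt', 'Team wins the final.'),
                   ('tech_001.txt', 'New library released.')]:
    with open(os.path.join(root, name), 'w') as f:
        f.write(text)

# cat_pattern's first group becomes the category of each fileid.
corpus = CategorizedPlaintextCorpusReader(
    root, r'.*\.txt', cat_pattern=r'(\w+)_\d+\.txt')
print(corpus.categories())
print(list(corpus.words(categories='tech')))
```

`corpus.words()` works without extra downloads; `corpus.sents()` additionally needs the punkt models.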
2
votes
0 answers

Best Smoothing Function to use in nltk corpus_bleu method

I'm trying to implement an Image Captioning model (CNN + LSTM), and as a validation metric I'm using the BLEU score, more precisely the corpus_bleu implementation from nltk. I tried using different SmoothingFunctions and I'm getting different…
Qwerty99
  • 29
  • 6
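Different smoothing methods will give different scores by design; comparing them side by side on a pair with no 4-gram overlap (the case where smoothing matters most) makes the differences visible. method4 is often suggested for short hypotheses. A sketch with a hypothetical reference/hypothesis pair:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical caption pair with no 4-gram overlap.
references = [[['a', 'cat', 'sits', 'on', 'the', 'mat']]]
hypotheses = [['a', 'cat', 'is', 'on', 'the', 'mat']]

smooth = SmoothingFunction()
for name, fn in [('method1', smooth.method1), ('method4', smooth.method4)]:
    print(name, corpus_bleu(references, hypotheses, smoothing_function=fn))
```

Whichever function is chosen, the key is to use the same one consistently across model checkpoints so the scores are comparable.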
2
votes
1 answer

How to Capitalize Locations in a List Python

I am using the NLTK lib in Python to break each word down into tagged elements (i.e. ('London', 'NNP')). However, I cannot figure out how to take this list and capitalise locations if they are lower case. This is important because london is no longer…
user17976264
  • 25
  • 1
  • 5
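Once the tagged list exists, recapitalizing is plain list processing over the (word, tag) pairs. The sketch below assumes tags like those from `nltk.pos_tag`; note that tagging already-lowercased text can itself fail to produce NNP for names, which may be the underlying issue:

```python
# Hypothetical tagged output, as in the question.
tagged = [('i', 'PRP'), ('visited', 'VBD'), ('london', 'NNP'),
          ('and', 'CC'), ('paris', 'NNP')]

# Capitalize tokens tagged as proper nouns (NNP/NNPS); leave the rest alone.
fixed = [(w.capitalize() if tag.startswith('NNP') else w, tag)
         for w, tag in tagged]
print(fixed)
```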
2
votes
1 answer

Manually install Open Multilingual Wordnet (NLTK)

I am working on a computer that can only access a private network and cannot send instructions from the command line. So whenever I have to install Python packages, I must do it manually (I can't even use PyPI). Luckily, NLTK allows me to…
wetrust
  • 57
  • 7
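For an offline machine, the manual route is to place the unzipped data under an `nltk_data` folder with the `corpora` layout NLTK expects, then point the search path at it. A sketch with a hypothetical path:

```python
import nltk

# Expected layout for a manual install (directory names matter):
#   <somewhere>/nltk_data/corpora/omw-1.4/...
#   <somewhere>/nltk_data/corpora/wordnet/...
# Then make NLTK search that folder:
nltk.data.path.append(r'C:\somewhere\nltk_data')   # hypothetical path
print(nltk.data.path[-1])
```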
2
votes
0 answers

How to import/download the stopwords in NLTK?

I downloaded the nltk package via pip, but when I run my code I get the message "Resource stopwords not found. Please use the NLTK downloader to obtain the resource…" from nltk.corpus import stopwords stop_words =…
arve
  • 569
  • 2
  • 10
  • 27
2
votes
1 answer

Convert words between part of speech, when wordnet doesn't do it

There are a lot of Q&A about part-of-speech conversion, and they pretty much all point to WordNet derivationally_related_forms() (For example, Convert words between verb/noun/adjective forms) However, I'm finding that the WordNet data on this has…
Greg Nelson
  • 67
  • 1
  • 7