Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python 2.7 and 3.2+.

NLTK includes a large number of common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and Decision Trees. In addition to these tools, NLTK bundles many common corpora, including the Brown Corpus, Reuters, and WordNet. The corpora collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.

The book Natural Language Processing with Python - Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 US license. A citable paper, "NLTK: The Natural Language Toolkit", was published in 2003 and again in 2006 so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is currently distributed under the Apache License, version 2.0.

7139 questions
2
votes
1 answer

nltk word_tokenize in Pandas DataFrame only returns tokens for the first 101 words/tokens

I'm trying to apply word_tokenize to a Pandas DataFrame column as the step before POS tagging. The source/raw column is 'sent' (already sentence-tokenized) and the destination column is 'word'. Here's the code, including the max column width…
ddormer
  • 23
  • 3
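A likely cause of this symptom is that pandas truncates the *display* of long cells, so all the tokens may be stored even though the printed frame appears to stop partway through. A minimal sketch, with hypothetical data and the question's column names, using `TreebankWordTokenizer` (the word tokenizer behind `word_tokenize`, usable without downloading the `punkt` models):

```python
import pandas as pd
from nltk.tokenize import TreebankWordTokenizer

# Hypothetical data; the 'sent'/'word' column names follow the question.
df = pd.DataFrame({'sent': ['This is a fairly long sentence. ' * 50]})

# TreebankWordTokenizer needs no 'punkt' download.
tok = TreebankWordTokenizer()
df['word'] = df['sent'].apply(tok.tokenize)

# All tokens are stored even if the printed frame looks truncated:
print(df['word'].str.len())

# Widen the display if the repr is being cut off:
pd.set_option('display.max_colwidth', None)
```

Checking `df['word'].str.len()` distinguishes a display problem from a real tokenization problem.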
2
votes
0 answers

NLTK inter-annotator agreement using Krippendorff Alpha Outputs Zero on only 1 disagreement

I have a sequence-labeling task in NLP, where annotators are asked to assign one or more labels to each word in a sentence. E.g., for a sentence [a, b, c, d], Ann 1 provided [[0, 1, 2], [0, 1], [0], [0]] and Ann 2 provided [[0, 2], [0], [0], [0,…
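With NLTK's default binary distance, any difference between two label sets counts as total disagreement, which can push alpha to zero even when the sets overlap heavily; passing `masi_distance` gives partial credit for overlap. A sketch using the question's data (the truncated final label set of annotator 2 is assumed to be [0] here):

```python
from nltk.metrics.agreement import AnnotationTask
from nltk.metrics.distance import masi_distance

# Triples are (coder, item, label); set-valued labels must be hashable,
# hence frozenset. The last label of ann2 is assumed ([0]) because the
# question text is truncated.
data = [
    ('ann1', 'w0', frozenset([0, 1, 2])), ('ann2', 'w0', frozenset([0, 2])),
    ('ann1', 'w1', frozenset([0, 1])),    ('ann2', 'w1', frozenset([0])),
    ('ann1', 'w2', frozenset([0])),       ('ann2', 'w2', frozenset([0])),
    ('ann1', 'w3', frozenset([0])),       ('ann2', 'w3', frozenset([0])),
]

# MASI distance scores overlapping sets as partial (not total) disagreement.
task = AnnotationTask(data=data, distance=masi_distance)
print(task.alpha())
```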
2
votes
2 answers

How to solve NLTK LookupError (resource not found)? It exists in the path. (Python)

I'm trying to use the nltk library, but I got stuck. I downloaded the stopwords corpus by hand (I can't download it from code because of permission issues on my work machine), but it always gives me the following error: LookupError:…
Sevval Kahraman
  • 1,185
  • 3
  • 10
  • 37
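When a resource is unpacked by hand, the usual culprits are the directory layout and the search path: NLTK expects the intermediate `corpora` directory, and the parent folder must be on `nltk.data.path`. A sketch with a hypothetical location:

```python
import nltk

# NLTK searches each directory in nltk.data.path. For hand-installed
# stopwords the layout must be:
#   /some/dir/corpora/stopwords/english, ...
# (the 'corpora' level matters), and /some/dir must be on the path:
nltk.data.path.append('/some/dir')   # hypothetical location

print(nltk.data.path)
```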
2
votes
1 answer

Ngram in Python with start_pad

I'm new to Python; I know some basics about lists and tuples, but I don't fully understand my code. I want to create a list where each element is a two-item tuple, like this: [('~','a'),('a','b'),('b','c')]. The first index in the tuple…
jawaher
  • 25
  • 6
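The desired output can be produced with plain-Python left padding, no NLTK needed. A sketch assuming '~' as the pad symbol (as in the question's example) and a generic n:

```python
def ngrams_with_pad(text, n, pad='~'):
    """Return character n-grams of `text` as tuples, left-padded with `pad`."""
    padded = pad * (n - 1) + text          # e.g. '~abc' for n=2
    return [tuple(padded[i:i + n]) for i in range(len(text))]

print(ngrams_with_pad('abc', 2))   # [('~', 'a'), ('a', 'b'), ('b', 'c')]
```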
2
votes
1 answer

Calculating PMI for bigram and discrepancy

Suppose I have the following text: text = "this is a foo bar bar black sheep foo bar bar black sheep foo bar bar black sheep shep bar bar black sentence" I can calculate the PMI for bigram using NLTK as follow: bigram_measures =…
Wiliam
  • 1,078
  • 10
  • 21
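The standard NLTK route to bigram PMI is a collocation finder. One frequent source of discrepancy against hand calculations is the logarithm base: NLTK's `pmi` uses log base 2, while manual derivations often use natural log. A sketch with the question's text:

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

text = ("this is a foo bar bar black sheep foo bar bar black sheep "
        "foo bar bar black sheep shep bar bar black sentence")

# Score every bigram by PMI (log base 2 in NLTK).
bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(text.split())
scored = finder.score_ngrams(bigram_measures.pmi)
print(scored[:3])
```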
2
votes
0 answers

NLTK word_tokenize returning empty

I am trying to tokenize the words and sentences in a text document, but it is returning empty for both. Could you please check and share why I am seeing this? Please find the code below (not attaching the text document as it is large, 443 KB): f =…
Harika
  • 21
  • 1
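One common cause of "empty" tokenizer output (an assumption, since the question's code is truncated) is reading the same file object twice: after the first `f.read()` the file pointer sits at end-of-file, so the second read returns the empty string. A sketch using a `StringIO` stand-in for the file and `TreebankWordTokenizer` (no data download needed):

```python
from io import StringIO
from nltk.tokenize import TreebankWordTokenizer

f = StringIO('Some text in the file.')   # stand-in for open('doc.txt')

first = f.read()
second = f.read()        # pointer is at EOF: returns ''
print(repr(second))

# Tokenizing the second read yields [], which looks like "empty output".
tok = TreebankWordTokenizer()
print(tok.tokenize(second))
print(tok.tokenize(first))
```

Calling `f.seek(0)` before the second read, or reading once into a variable, avoids the problem.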
2
votes
1 answer

Pandas - Keyword count by Category

I am trying to get a count of the most frequently occurring words in my df, grouped by another column's values. I have a dataframe like so: df=pd.DataFrame({'Category':['Red','Red','Blue','Yellow','Blue'],'Text':['this is very good ','good','dont…
Python_Hey
  • 75
  • 6
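One pandas-only way to get per-category word counts is split, explode, then `value_counts` within each group. The frame below reuses the question's data; the texts after the truncation point are assumed for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['Red', 'Red', 'Blue', 'Yellow', 'Blue'],
                   'Text': ['this is very good ', 'good', 'dont like',
                            'very good', 'like it']})  # last texts assumed

# One row per (Category, word), then count words within each category.
counts = (df.assign(word=df['Text'].str.split())
            .explode('word')
            .groupby('Category')['word']
            .value_counts())
print(counts)
```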
2
votes
2 answers

Why did I get "module 'nltk.translate.metrics' has no attribute 'precision'"?

I've tried importing precision and the whole library, but I still can't get it to work: import nltk import string import pandas as pd import nltk.metrics import collections from nltk import precision from nltk.metrics.scores import (accuracy,…
Lorale
  • 113
  • 6
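Errors like this typically come from importing `precision` off the wrong submodule; importing it directly from `nltk.metrics.scores` is unambiguous. Note that these functions compare *sets* of item identifiers, not raw label lists. A sketch with hypothetical item IDs:

```python
from nltk.metrics.scores import precision, recall

# Sets of item identifiers: two true positives, one miss, one false alarm.
reference = {'tp1', 'tp2', 'fn1'}
test = {'tp1', 'tp2', 'fp1'}

print(precision(reference, test))   # fraction of test items in reference
print(recall(reference, test))      # fraction of reference items found
```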
2
votes
0 answers

How do I use Wordnet for Dutch? Open Multilingual Wordnet will not work?

I'm fairly new to Python, so please bear with me. I am trying to write code that identifies the difficult words in a Dutch sentence and lists them. For this, I need to know whether the Dutch words in the input have any hypernyms. When I download Open…
2
votes
0 answers

NLTK - Best way to structure corpora?

I have 40k+ articles that have different segments in each. Each one exists as a Python dictionary with keys title, category, subcat, content, etc. How can I create a corpus out of these while still maintaining a separation between the different…
gEr
  • 215
  • 1
  • 2
  • 5
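One way to keep the separation is `CategorizedPlaintextCorpusReader`, encoding the category in each filename and letting `cat_pattern` recover it. A sketch that writes two hypothetical articles to a temporary directory (the filenames and categories are made up):

```python
import os
import tempfile
from nltk.corpus.reader import CategorizedPlaintextCorpusReader

root = tempfile.mkdtemp()
# One file per article, category encoded in the filename.
for name, text in [('sports_001.txt', 'Team wins the final.'),
                   ('tech_001.txt', 'New library released.')]:
    with open(os.path.join(root, name), 'w') as f:
        f.write(text)

# cat_pattern's first group becomes the category of each fileid.
corpus = CategorizedPlaintextCorpusReader(
    root, r'.*\.txt', cat_pattern=r'(\w+)_\d+\.txt')
print(corpus.categories())
print(list(corpus.words(categories='tech')))
```

`corpus.words()` works without extra downloads; `corpus.sents()` additionally needs the punkt models.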
2
votes
0 answers

Best Smoothing Function to use in nltk corpus_bleu method

I'm trying to implement an Image Captioning model (CNN + LSTM), and as a validation metric I'm using the BLEU score, more precisely the corpus_bleu implementation from nltk. I tried using different SmoothingFunctions and I'm getting different…
Qwerty99
  • 29
  • 6
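Different smoothing methods will give different scores by design; comparing them side by side on a pair with no 4-gram overlap (the case where smoothing matters most) makes the differences visible. method4 is often suggested for short hypotheses. A sketch with a hypothetical reference/hypothesis pair:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical caption pair with no 4-gram overlap.
references = [[['a', 'cat', 'sits', 'on', 'the', 'mat']]]
hypotheses = [['a', 'cat', 'is', 'on', 'the', 'mat']]

smooth = SmoothingFunction()
for name, fn in [('method1', smooth.method1), ('method4', smooth.method4)]:
    print(name, corpus_bleu(references, hypotheses, smoothing_function=fn))
```

Whichever function is chosen, the key is to use the same one consistently across model checkpoints so the scores are comparable.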
2
votes
1 answer

How to Capitalize Locations in a List Python

I am using the NLTK lib in Python to break each word down into tagged elements (i.e. ('London', 'NNP')). However, I cannot figure out how to take this list and capitalise locations if they are lower case. This is important because london is no longer…
user17976264
  • 25
  • 1
  • 5
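Once the tagged list exists, recapitalizing is plain list processing over the (word, tag) pairs. The sketch below assumes tags like those from `nltk.pos_tag`; note that tagging already-lowercased text can itself fail to produce NNP for names, which may be the underlying issue:

```python
# Hypothetical tagged output, as in the question.
tagged = [('i', 'PRP'), ('visited', 'VBD'), ('london', 'NNP'),
          ('and', 'CC'), ('paris', 'NNP')]

# Capitalize tokens tagged as proper nouns (NNP/NNPS); leave the rest alone.
fixed = [(w.capitalize() if tag.startswith('NNP') else w, tag)
         for w, tag in tagged]
print(fixed)
```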
2
votes
1 answer

Manually install Open Multilingual Wordnet (NLTK)

I am working on a computer that can only access a private network and cannot send instructions from the command line. So whenever I have to install Python packages, I must do it manually (I can't even use PyPI). Luckily, NLTK allows me to…
wetrust
  • 57
  • 7
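For an offline machine, the manual route is to place the unzipped data under an `nltk_data` folder with the `corpora` layout NLTK expects, then point the search path at it. A sketch with a hypothetical path:

```python
import nltk

# Expected layout for a manual install (directory names matter):
#   <somewhere>/nltk_data/corpora/omw-1.4/...
#   <somewhere>/nltk_data/corpora/wordnet/...
# Then make NLTK search that folder:
nltk.data.path.append(r'C:\somewhere\nltk_data')   # hypothetical path
print(nltk.data.path[-1])
```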
2
votes
0 answers

How to import/download the stopwords in NLTK?

I downloaded the nltk package via pip, but when I run my code I get the message "Resource stopwords not found. Please use the NLTK downloader to obtain the resource…" from nltk.corpus import stopwords stop_words =…
arve
  • 569
  • 2
  • 10
  • 27
2
votes
1 answer

Convert words between part of speech, when wordnet doesn't do it

There are a lot of Q&A about part-of-speech conversion, and they pretty much all point to WordNet derivationally_related_forms() (For example, Convert words between verb/noun/adjective forms) However, I'm finding that the WordNet data on this has…
Greg Nelson
  • 67
  • 1
  • 7