Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. Earlier releases supported Python 2.7 and 3.2+; current releases require Python 3.

NLTK includes a large number of common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK bundles many common corpora, including the Brown Corpus, Reuters, and WordNet. The corpora collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.
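A minimal sketch of how these pieces fit together (assuming the punkt, averaged_perceptron_tagger, and wordnet resources have already been downloaded):

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time resource downloads; models and corpora ship separately from the library.
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')
# nltk.download('wordnet')

sentence = "NLTK provides tokenizers, taggers, stemmers and lemmatizers."
tokens = nltk.word_tokenize(sentence)        # ['NLTK', 'provides', 'tokenizers', ...]
tagged = nltk.pos_tag(tokens)                # [('NLTK', 'NNP'), ('provides', 'VBZ'), ...]

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
stems = [stemmer.stem(t) for t in tokens]           # e.g. 'provides' -> 'provid'
lemmas = [lemmatizer.lemmatize(t) for t in tokens]  # e.g. 'tokenizers' -> 'tokenizer'
```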

The book Natural Language Processing with Python - Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 US licence. A citable paper, "NLTK: The Natural Language Toolkit", was first published in 2003 and again in 2006, so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is currently distributed under the Apache 2.0 licence.

7139 questions
2
votes
3 answers

Tokenize sentence into words in Python

I want to extract information from different sentences, so I'm using NLTK to divide each sentence into words. I'm using this code: words=[] for i in range(len(sentences)): words.append(nltk.word_tokenize(sentences[i])) words It works pretty…
Hermoine
  • 63
  • 7
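The pattern in the question is the usual one; a slightly tidier sketch (assuming sentences is a list of strings and the punkt models are installed) uses a list comprehension:

```python
import nltk
# nltk.download('punkt')  # tokenizer models, needed once

sentences = ["I want to extract information.", "Each sentence becomes a list of words."]
words = [nltk.word_tokenize(s) for s in sentences]
print(words[0])   # ['I', 'want', 'to', 'extract', 'information', '.']
```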
2
votes
0 answers

NLTK not getting imported in VS Code

I have just started learning NLP and for that purpose I installed nltk package using pip install nltk in the cmd terminal of VS Code. After I installed it, I tried importing it in the command line itself and I was successful but in the main window…
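This usually means the package was installed into a different interpreter than the one VS Code has selected. A quick check (paths are hypothetical, not from the question):

```python
import sys
print(sys.executable)   # should match the interpreter shown in VS Code's status bar

# If it does not, install into that exact interpreter from the integrated terminal, e.g.:
#   <path-to-that-python> -m pip install nltk
# Once installed into the selected interpreter, the import succeeds:
import nltk
print(nltk.__version__)
```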
2
votes
2 answers

Find all words in a sentence related to a keyword

I have the following text and want to isolate a part of the sentence related to a keyword, in this case keywords = ['pizza', 'chips']. text = "The pizza is great but the chips aren't the best" Expected Output: {'pizza': 'The pizza is…
Ali
  • 328
  • 2
  • 7
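One naive sketch (not the accepted answer): split the sentence on coordinating conjunctions and map each keyword to the clause that mentions it:

```python
import re

text = "The pizza is great but the chips aren't the best"
keywords = ['pizza', 'chips']

# Split on a few common coordinating conjunctions; a real solution would
# use parsing or chunking rather than this heuristic.
clauses = re.split(r'\s+(?:but|and|or)\s+', text)

related = {kw: clause for kw in keywords for clause in clauses if kw in clause}
print(related)
# {'pizza': 'The pizza is great', 'chips': "the chips aren't the best"}
```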
2
votes
1 answer

Why is the total number in the confusion matrix not the same as in the input data?

Why does the confusion matrix not have the same number of samples as the dataset? The dataset contains 7514 samples, but the total in the confusion matrix does not exceed 2000. Here is the code: import re import nltk nltk.download('stopwords') from nltk.corpus…
mino
  • 33
  • 5
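The usual explanation is that the confusion matrix is computed on the held-out test split, not on the whole dataset, so its cells sum to len(y_test). A self-contained sketch with synthetic data (the real question uses text features and stopword removal):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

# Synthetic stand-in for the 7514-sample dataset in the question.
X, y = make_classification(n_samples=7514, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = GaussianNB().fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))

# The matrix only counts test samples, so both numbers below equal the
# test-set size (about a quarter of 7514), not the full dataset size.
print(cm.sum(), len(y_test))
```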
2
votes
1 answer

Error with count tags (nltk) in column dataframe

# Extracting definitions from different words in each sentence # Extracting from each row the NOUN, VERBS, NOUN plural text = data['Omschrijving_Skill_without_stopwords'].tolist() tagged_texts = pos_tag_sents(map(word_tokenize, text)) data['pos'] =…
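A small sketch of the same idea with a hypothetical column name, tagging each row and then counting tags per row:

```python
import pandas as pd
from nltk import pos_tag_sents, word_tokenize
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

# Hypothetical stand-in for data['Omschrijving_Skill_without_stopwords'].
data = pd.DataFrame({'text': ["python developer writes clean code",
                              "data analysts build daily reports"]})

data['pos'] = pos_tag_sents(map(word_tokenize, data['text'].tolist()))

# Count nouns per row: any tag starting with 'NN' (NN, NNS, NNP, NNPS).
data['noun_count'] = data['pos'].apply(lambda tags: sum(t.startswith('NN') for _, t in tags))
print(data[['text', 'noun_count']])
```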
2
votes
2 answers

How to check if a given English sentence contains only non-meaning words using Python?

I want to check in a Python program whether a given English sentence contains only non-meaning words. Return True if every word in the sentence has no meaning, e.g. sdfsdf sdf ssdf fsdf dsd sd. Return False if the sentence contains at least one word that has…
Rohit
  • 6,941
  • 17
  • 58
  • 102
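One rough sketch (assuming NLTK's English word list is an acceptable definition of "meaningful"): treat a sentence as non-meaning only if none of its tokens appear in the words corpus:

```python
from nltk.corpus import words
# nltk.download('words')  # English word list, needed once

english_vocab = {w.lower() for w in words.words()}

def all_non_meaning(sentence):
    """True only if no token is found in the English word list."""
    return all(tok.lower() not in english_vocab for tok in sentence.split())

print(all_non_meaning("sdfsdf sdf ssdf fsdf dsd sd"))   # True
print(all_non_meaning("sdfsdf hello fsdf"))             # False
```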
2
votes
1 answer

How to do chapter analysis from books imported from nltk.corpus.gutenberg.fileids()

I am a newbie using Python. Now I am doing natural language processing for a novel, and I chose to load the book from nltk.corpus.gutenberg.fileids(). I just use 'Sense and Sensibility'. Then I want to analyze each chapter. How to split the whole…
Freda Yu
  • 21
  • 1
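One way to sketch this (assuming the chapter headings in the plain-text file look like "CHAPTER 1" or "CHAPTER I"; the exact pattern should be checked against the raw text) is to split the raw string with a regular expression:

```python
import re
from nltk.corpus import gutenberg
# nltk.download('gutenberg')

raw = gutenberg.raw('austen-sense.txt')

# Assumed heading format: "CHAPTER" followed by Arabic or Roman numerals.
# Index 0 of the split is the front matter, so it is dropped.
chapters = re.split(r'CHAPTER [\dIVXLC]+', raw)[1:]

print(len(chapters))                 # number of chapters found
if chapters:
    print(len(chapters[0].split()))  # rough word count of chapter 1
```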
2
votes
1 answer

Getting synsets of custom hungarian wordnet dictionary with nltk

I am very new to NLP and I might be doing something wrong. I would like to work with a Hungarian text where I can get the synset/hyponym/hypernym of some selected words. I am working in Python. As Open Multilingual Wordnet does not have Hungarian…
hunsnowboarder
  • 170
  • 2
  • 18
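For languages that are included in the Open Multilingual Wordnet, the lookup is just the lang parameter; the question arises precisely because Hungarian is not among them. A sketch of the standard API for comparison:

```python
from nltk.corpus import wordnet as wn
# nltk.download('wordnet'); nltk.download('omw-1.4')

print(sorted(wn.langs()))        # languages bundled with OMW; 'hun' is not listed

# For an included language, synsets / hypernyms / hyponyms work like this:
for syn in wn.synsets('perro', lang='spa'):          # Spanish 'dog'
    print(syn.name(), [h.name() for h in syn.hypernyms()])
```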
2
votes
0 answers

Final Semester project about semantic analysis/information retrieval

I'm moving into my final year in my college's Engineering Computer Science department, and I wanted to do my graduation project on a topic related to information retrieval & semantic analysis. I've had my internship in those topics and I'm very…
Hady Elsahar
  • 2,121
  • 4
  • 29
  • 47
2
votes
1 answer

Python spell checker using NLTK

So I have this piece of code using the NLTK library: def autospell(text): spells = [spell(w) for w in (nltk.word_tokenize(text))] return " ".join(spells) train_data['Phrase'][:200].apply(autospell) And I got this error telling me that…
Arkan
  • 55
  • 1
  • 5
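The spell function in the snippet is not part of NLTK, so it presumably comes from a separate spell-checking package and has to be imported from there. A pure-NLTK sketch of the same idea, using the words corpus and edit distance (an illustration only, and very slow):

```python
import nltk
from nltk.corpus import words
from nltk.metrics import edit_distance
# nltk.download('punkt'); nltk.download('words')

vocab = set(words.words())

def autospell(text):
    """Replace each out-of-vocabulary token with the nearest dictionary word."""
    fixed = []
    for w in nltk.word_tokenize(text):
        if not w.isalpha() or w.lower() in vocab:
            fixed.append(w)
        else:
            # Brute-force nearest neighbour by edit distance: fine as a sketch,
            # far too slow to apply to thousands of phrases.
            fixed.append(min(vocab, key=lambda v: edit_distance(w.lower(), v)))
    return " ".join(fixed)

print(autospell("I realy like this moovie"))
```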
2
votes
2 answers

Deleting and updating a string and entity index in a text document for NER training data

I am trying to create a training dataset for NER recognition. For that, I have huge amounts of data that need to be tagged, with the unnecessary sentences removed. On removing an unnecessary sentence, the index positions must be updated. The other day I saw…
imhans33
  • 133
  • 11
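The core of the problem is bookkeeping: NER training data is commonly stored as (text, {"entities": [(start, end, label)]}), so deleting a span of text means shifting every entity offset that comes after it. A hedged sketch with a hypothetical helper (not the asker's code):

```python
def remove_span(text, entities, start, end):
    """Delete text[start:end] and shift the remaining entity offsets accordingly."""
    removed = end - start
    new_text = text[:start] + text[end:]
    new_entities = []
    for s, e, label in entities:
        if e <= start:                       # entirely before the cut: unchanged
            new_entities.append((s, e, label))
        elif s >= end:                       # entirely after the cut: shift left
            new_entities.append((s - removed, e - removed, label))
        # entities overlapping the removed span are dropped
    return new_text, new_entities

text = "Remove this filler sentence. John works at Acme."
entities = [(29, 33, "PERSON"), (43, 47, "ORG")]
print(remove_span(text, entities, 0, 29))
# ('John works at Acme.', [(0, 4, 'PERSON'), (14, 18, 'ORG')])
```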
2
votes
2 answers

Jython: ImportError: No module named multiarray

When I try to call a file and its method using Jython, it shows the following error, even though my NumPy, Python, and NLTK are correctly installed, and it works properly if I run it directly from the Python shell. File…
ninja123
  • 21
  • 1
  • 2
2
votes
2 answers

How to tokenize a string into consecutive pairs using Python?

My input is "I like to play basketball", and the output I am looking for is "I like", "like to", "to play", "play basketball". I have used NLTK word tokenize but that gives single tokens only. I have this type of statement in a huge database and…
Saurabh
  • 23
  • 3
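word_tokenize only yields single tokens; pairing them up is what nltk.bigrams (or nltk.ngrams with n=2) is for:

```python
import nltk
# nltk.download('punkt')

text = "I like to play basketball"
tokens = nltk.word_tokenize(text)

# nltk.bigrams yields consecutive token pairs; join each pair back into a string.
pairs = [" ".join(pair) for pair in nltk.bigrams(tokens)]
print(pairs)   # ['I like', 'like to', 'to play', 'play basketball']
```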
2
votes
1 answer

nltk_data installation gives RuntimeWarning

python3 -m nltk.downloader -d /usr/local/share/nltk_data all Upon running the above command in GCP, I face the following RuntimeWarning 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of…
Tony Stark
  • 511
  • 2
  • 15
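The warning comes from running a submodule with -m after the package itself has been imported; it is generally harmless and the data still downloads. The same download can be done from inside Python, which avoids it (path as in the question):

```python
import nltk

# Equivalent to: python3 -m nltk.downloader -d /usr/local/share/nltk_data all
nltk.download('all', download_dir='/usr/local/share/nltk_data')
```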
2
votes
1 answer

NLTK doesn't lemmatize uppercase words

I'm trying to change plural words to singular in a string with a mix of uppercase and lowercase words, e.g. CARDBOARD BOXES, DIMENSIONS: 19cm H x 10cm W x 30cm D. I used the NLTK package to do so, but it only accepts lowercase strings and I don't want to…
Smiths
  • 23
  • 2
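WordNet lookups only match lowercase entries, so the usual workaround (a sketch, not the accepted answer) is to lemmatize a lowercased copy of each token and restore its casing afterwards:

```python
import nltk
from nltk.stem import WordNetLemmatizer
# nltk.download('punkt'); nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

def singularize(token):
    # Lemmatize a lowercased copy, then restore all-caps tokens to upper case.
    lemma = lemmatizer.lemmatize(token.lower(), pos='n')
    return lemma.upper() if token.isupper() else lemma

text = "CARDBOARD BOXES, DIMENSIONS: 19cm H x 10cm W x 30cm D"
print([singularize(t) for t in nltk.word_tokenize(text)])
# ['CARDBOARD', 'BOX', ',', 'DIMENSION', ':', '19cm', 'H', 'x', '10cm', 'W', 'x', '30cm', 'D']
```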