Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python versions 2.7 or 3.2+.

NLTK includes many common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK provides access to many common corpora, including the Brown Corpus, Reuters, and WordNet. The NLTK corpora collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.
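As a rough illustration of a few of these tools, here is a minimal sketch that sticks to components that need no downloaded data: `wordpunct_tokenize`, `PorterStemmer`, and `NaiveBayesClassifier`. The sample sentence, the `contains_good` feature name, and the tiny training set are made up for illustration only.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Tokenize with a regex-based tokenizer (no corpus download required).
tokens = wordpunct_tokenize("The cats are running quickly.")

# Reduce each token to its stem with the Porter algorithm.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# Train a toy Naive Bayes classifier on hypothetical feature dictionaries.
train_set = [
    ({"contains_good": True}, "pos"),
    ({"contains_good": True}, "pos"),
    ({"contains_good": False}, "neg"),
    ({"contains_good": False}, "neg"),
]
classifier = NaiveBayesClassifier.train(train_set)
label = classifier.classify({"contains_good": True})

print(tokens)
print(stems)
print(label)
```

Tools such as the POS tagger, the lemmatizer, and the corpus readers typically need their data fetched first with `nltk.download(...)`, which is why they are left out of this sketch.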

The book "Natural Language Processing with Python - Analyzing Text with the Natural Language Toolkit" by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution Noncommercial No Derivative Works 3.0 US licence. A citable paper, "NLTK: the natural language ToolKit", was first published in 2003 and then again in 2006, so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is currently distributed under an Apache version 2.0 licence.

7139 questions
2
votes
1 answer

What does nltk.download("wordnet") accomplish?

I wanted to know what nltk.download() does. Also, what happens if I pass "wordnet" as an argument? Is wordnet some kind of dataset? I would like more clarification on that.
SarthakJain
2
votes
1 answer

How to grab streaming data from twitter connect with pycurl using nltk - regular expression

I am a newbie in Python and have been given a task by my boss: (1) grab streaming data from Twitter, connecting with pycurl, and output it as JSON; (2) parse it using NLTK and regular expressions; (3) save it to a database (MySQL) or a flat file (txt). Note: this is the…
sdwinanta
2
votes
2 answers

Does anyone have a Categorized XML Corpus Reader for NLTK?

Has anyone written a Categorized XML Corpus reader for NLTK? I'm working with the Annotated NYTimes corpus. It's an XML corpus. I can read the files with XMLCorpusReader but I'd like to use some of NLTK's category functionality. There's a nice…
NAD
2
votes
2 answers

Finding synonyms for compound words like Call-Taxi or Artificial Intelligence using NLTK or spaCy?

Synonyms in Python can be easily found using NLTK or spaCy for a single word like Cat, Dog, Happy, or Sad. But when it comes to compound words like Artificial-Intelligence or Call-Taxi, the language processor always gives output for each and every…
2
votes
1 answer

Extracting text from a passage using spacy or nltk

Sorry if this is a repeat but I couldn't find an answer or at least would like to know if there is a clean way to do this. I have a passage from which I need to extract certain entities. Any alphanumeric string like: PQ1234, Z123 etc Any…
Makarand
2
votes
2 answers

Converting untagged corpora to tagged (NLTK)

I have a plaintext corpus that I want to tag and save so I can use it later. What's the best way to do this? I already have my tagger made, but I can't figure out a way to change the corpus that isn't messy.
Bendar
2
votes
1 answer

Dictionary of dictionaries into dataframe

I have a function which counts co-occurrences between center and context words within reviews. def get_coocs(x): occurdict={} # Pre-processing tokens = nltk.word_tokenize(x) tokenslower = list(map(str.lower, tokens)) …
RDTJr
2
votes
1 answer

download nltk corpus as cmdclass in setup.py files not working

There are some parts of the nltk corpus that I'd like to add to the setup.py file. I followed the response here by setting up a custom cmdclass. My setup file looks like this. from setuptools import setup from setuptools.command.install import…
mizzlosis
2
votes
1 answer

NLTK Remove invalid words

In the Python NLTK library, you can tokenise a sentence into individual words and punctuation. It also tokenises words that are not English or not grammatically correct. How can I remove these tokens so all I have left is the grammatically correct,…
Magmurrr
2
votes
1 answer

Counting co-occurrences between nouns and verbs/adjectives

I have a dataframe which contains reviews, as well as two lists, one which stores nouns and the other storing verbs/adjectives. Example code: import pandas as pd data = {'reviews':['Very professional operation. Room is very clean and comfortable', …
RDTJr
2
votes
2 answers

How to check if a word is in a string without using multiple loops

So the purpose of this program is to find example sentences for each word in ner.txt. For example, if the word apple is in ner.txt then I would like to find if there is any sentence that contains the word apple and output something like apple: you…
DSMK Swab
2
votes
2 answers

How can I still use the 'wv' attribute of this (probably deprecated?) package/module in Python?

I am new to the Gensim package, and I am trying to get a little familiar with it. I am now trying to import an existing, trained model. I am following exactly the example from this video (this section starts at 5:30). When I run the code from that…
lakeviking
2
votes
1 answer

Problem to extract NER subject + verb with spacy and Matcher

I work on an NLP project and I have to use spaCy and the spaCy Matcher to extract all named entities that are nsubj (subjects), along with the verb each one relates to: the governor verb of my NE nsubj. Example: Georges and his friends live in Mexico…
Etienne Armangau
2
votes
2 answers

Pre-process text string with NLTK

I have a dataframe A containing docid(document ID), title(title of the article), lineid(line ID, aka the location of the paragraph), text, and tokencount(counts of words including white spaces): docid title lineid …
user15155674
2
votes
0 answers

Get a vocabulary (unigrams) of a column in a DataFrame. Pandas

Hello, I have this dataset:
   id     cuisine  ingredients                                         processed ingredients
0  10259  greek    [romaine lettuce, black olives, grape tomatoes...  romaine lettuce black olives grape tom...
1  …
Lefteris Kyprianou