Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python versions 2.7 or 3.2+.

NLTK includes many common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK provides access to many common corpora, including the Brown Corpus, Reuters, and WordNet. The NLTK corpora collection also includes a few non-English corpora in Portuguese, Polish and Spanish.

The book "Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit" by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution Noncommercial No Derivative Works 3.0 US Licence. A citable paper, "NLTK: The Natural Language Toolkit", was first published in 2003 and again in 2006, so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is currently distributed under an Apache version 2.0 licence.

7139 questions

12 answers

Spell Checker for Python

I'm fairly new to Python and NLTK. I am busy with an application that can perform spell checks (replaces an incorrectly spelled word with the correct one). I'm currently using the Enchant library on Python 2.7, PyEnchant and the NLTK library. The…

asked Dec 18 '12 at 07:18

Mike Barnes

4 answers

str.translate gives TypeError - Translate takes one argument (2 given), worked in Python 2

I have the following code:

import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile

lmtzr = nltk.stem.wordnet.WordNetLemmatizer()

def sanitize(wordList):
    answer = [word.translate(None, string.punctuation) for word in…

python nltk typeerror

asked Apr 19 '14 at 21:32

carebear
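
On the str.translate question above: in Python 3, str.translate accepts a single translation table, so the Python 2 style call word.translate(None, string.punctuation) raises the TypeError quoted in the title. The following is a minimal sketch of one common fix, assuming the goal is only to strip ASCII punctuation from each word. The excerpt's function body is truncated, so this sanitize reproduces only the visible list comprehension, and the sample word list is made up for illustration.

```python
import string

# Translation table that maps every ASCII punctuation character to None,
# which tells str.translate to delete it.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def sanitize(word_list):
    """Return the words with ASCII punctuation removed (Python 3)."""
    return [word.translate(PUNCT_TABLE) for word in word_list]


# Illustrative input only; not taken from the question.
cleaned = sanitize(["Hello,", "world!", "it's"])
print(cleaned)
```

Building the table once at module level avoids recreating it for every word.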

4 answers

Programmatically install NLTK corpora / models, i.e. without the GUI downloader?

My project uses the NLTK. How can I list the project's corpus & model requirements so they can be automatically installed? I don't want to click through the nltk.download() GUI, installing packages one by one. Also, any way to freeze that same list…

installation package nltk requirements corpus

asked Apr 30 '11 at 18:34

Bluu
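
On the programmatic install question above: nltk.download accepts a package identifier, so a project can keep its data requirements in a plain list and install them in a loop. The sketch below is one possible arrangement, not an official NLTK mechanism. The helper name install_nltk_packages, the REQUIRED list, and the injectable download parameter are all assumptions introduced here so the logic can be exercised without network access.

```python
def install_nltk_packages(package_ids, download=None):
    """Download each NLTK data package by id and return the ids that failed.

    `download` is a hypothetical hook: any callable that takes a package id
    and returns True on success. By default it wraps nltk.download.
    """
    if download is None:
        import nltk  # imported lazily so the helper can be tested offline

        def download(package_id):
            # quiet=True suppresses the progress output of the downloader.
            return nltk.download(package_id, quiet=True)

    return [pkg for pkg in package_ids if not download(pkg)]


# Example requirements list (assumed for illustration); in a real project
# this could be read from a text file kept under version control.
REQUIRED = ["punkt", "wordnet", "stopwords"]
```

Calling install_nltk_packages(REQUIRED) from a setup or deployment script would then fetch everything without opening the GUI downloader, and a non-empty return value signals which packages could not be installed.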

4 answers

Counting the Frequency of words in a pandas data frame

I have a table like below:

   URN     Firm_Name
0  104472  R.X. Yah & Co
1  104873  Big Building Society
2  109986  St James's Society
3  114058  The Kensington Society Ltd
4  113438  MMV Oil…

python pandas nltk

asked Oct 17 '17 at 08:54

J Reza
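
On the word frequency question above: one dependency-free way to count words across a column of names is collections.Counter. This is a minimal sketch, assuming whitespace splitting is an acceptable notion of "word". The firm names are copied from the visible rows of the excerpt (the truncated last row is left out), and the helper name word_counts is introduced here for illustration.

```python
from collections import Counter

# Visible rows of the Firm_Name column from the excerpt; the final,
# truncated row is omitted rather than guessed.
firm_names = [
    "R.X. Yah & Co",
    "Big Building Society",
    "St James's Society",
    "The Kensington Society Ltd",
]


def word_counts(names):
    """Count whitespace-separated words across an iterable of strings."""
    counts = Counter()
    for name in names:
        counts.update(name.split())
    return counts


counts = word_counts(firm_names)
print(counts.most_common(3))
```

Because the helper only iterates over its argument, a pandas column such as the excerpt's Firm_Name could be passed in directly, since iterating a Series yields its values.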

7 answers

Improving the extraction of human names with nltk

I am trying to extract human names from text. Does anyone have a method that they would recommend? This is what I tried (code is below): I am using nltk to find everything marked as a person and then generating a list of all the NNP parts of that…

python nlp nltk

asked Nov 29 '13 at 17:33

e h

14 answers

NLTK Lookup Error

While running a Python script using NLTK I got this:

Traceback (most recent call last):
  File "cpicklesave.py", line 56, in
    pos = nltk.pos_tag(words)
  File "/usr/lib/python2.7/site-packages/nltk/tag/__init__.py", line 110, in pos_tag
    …

python python-2.7 nltk

asked Mar 08 '16 at 07:29

Shiv Shankar

2 answers

BeautifulSoup4 get_text still has javascript

I'm trying to remove all the HTML/JavaScript using bs4; however, it doesn't get rid of the JavaScript. I still see it there with the text. How can I get around this? I tried using nltk, which works fine; however, clean_html and clean_url will be removed…

python beautifulsoup nltk

asked Apr 02 '14 at 01:39

KVISH

5 answers

tag generation from a text content

I am curious whether an algorithm/method exists to generate keywords/tags from a given text, using weight calculations, occurrence ratios or other tools. Additionally, I would be grateful if you could point to any Python-based solution / library…

python tags machine-learning nlp nltk

asked Apr 18 '10 at 09:39

Hellnar

3 answers

Tokenize a paragraph into sentence and then into words in NLTK

I am trying to input an entire paragraph into my word processor to be split into sentences first and then into words. I tried the following code but it does not work:

#text is the paragraph input
sent_text = sent_tokenize(text)
…

python nltk

asked Jun 03 '16 at 04:03

Nikhil Raghavendra
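
On the tokenization question above: with NLTK the usual two-step approach is sent_tokenize followed by word_tokenize on each sentence, both of which need NLTK's tokenizer data to be installed. To show the shape of that two-step loop without any dependency, the sketch below swaps in naive regular expressions. It is a stand-in, not NLTK's tokenizer, and it will mis-split abbreviations such as "Dr." that NLTK handles better. The function name tokenize_paragraph and the sample text are assumptions.

```python
import re

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
# Naive word pattern: runs of word characters, or single punctuation marks.
WORD_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize_paragraph(text):
    """Split text into sentences, then each sentence into word tokens."""
    sentences = SENTENCE_SPLIT.split(text.strip())
    return [WORD_PATTERN.findall(sentence) for sentence in sentences if sentence]


# Illustrative input only; not taken from the question.
tokens = tokenize_paragraph("Hello world. How are you?")
print(tokens)
```

The result is a list with one inner list of tokens per sentence, which is the same nesting the NLTK version would produce when word_tokenize is applied to each item returned by sent_tokenize.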

3 answers

Scikit Learn TfidfVectorizer : How to get top n terms with highest tf-idf score

I am working on a keyword extraction problem. Consider the very general case:

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(tokenizer=tokenize, stop_words='english')

t = """Two Travellers, walking in the noonday…

python scikit-learn nlp nltk tf-idf

asked Dec 11 '15 at 20:39

AbtPst

6 answers

NLTK Named Entity Recognition with Custom Data

I'm trying to extract named entities from my text using NLTK. I find that NLTK NER is not very accurate for my purpose and I want to add some more tags of my own as well. I've been trying to find a way to train my own NER, but I don't seem to be…

python nlp nltk named-entity-recognition

asked Jul 04 '12 at 18:24

user1502248

3 answers

Save Naive Bayes Trained Classifier in NLTK

I'm slightly confused about how to save a trained classifier. Re-training a classifier each time I want to use it is obviously really bad and slow, so how do I save it and then load it again when I need it? Code is below, thanks in advance…

python machine-learning classification nltk naivebayes

asked Apr 04 '12 at 18:24

user179169
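
On the classifier persistence question above: Python's pickle module can serialize most Python objects to disk, and that is a common way to avoid retraining. The sketch below shows the save and load round trip only. The dictionary is a stand-in for a trained classifier, and the helper names and file name are assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a model object to a file with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a model previously written by save_model.

    Only unpickle files you created yourself; pickle can execute
    arbitrary code when loading untrusted data.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a trained classifier object (assumption for this sketch).
model = {"label_probs": {"pos": 0.5, "neg": 0.5}}

path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")
save_model(model, path)
restored = load_model(path)
print(restored == model)
```

In a real project the training step would run once, call save_model, and later runs would call load_model instead of training again.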

5 answers

How to create a word cloud from a corpus in Python?

From "Creating a subset of words from a corpus in R", the answerer can easily convert a term-document matrix into a word cloud. Is there a similar function from python libraries that takes either a raw word textfile or NLTK corpus or Gensim…

python nltk corpus gensim word-cloud

asked May 20 '13 at 08:51

alvas

4 answers

Using NLTK and WordNet, how do I convert a simple tense verb into its present, past or past participle form?

Using NLTK and WordNet, how do I convert a simple tense verb into its present, past or past participle form? For example, I want to write a function which would give me the verb in the expected form as follows:

v = 'go'
present = present_tense(v)
print…

python nlp nltk wordnet

asked Sep 20 '10 at 15:36

Software Enthusiastic

5 answers

Docker NLTK Download

I am building a docker container using the following Dockerfile:

FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y python python-dev python-pip
ADD . /app
RUN apt-get install -y python-scipy
RUN pip install -r…

python docker nltk

asked Jun 30 '15 at 15:56

GNMO11
