I found this Python code that performs stemming on text files.

import nltk
import string
from collections import Counter


def get_tokens():
    with open('/Users/MYUSERNAME/Desktop/Test_sp500/A_09.txt', 'r') as shakes:
        text = shakes.read()
        lowers = text.lower()
        no_punctuation = lowers.translate(None,string.punctuation)
        tokens = nltk.word_tokenize(no_punctuation)
        return tokens


tokens = get_tokens()
count = Counter(tokens)
print
count.most_common(10)

from nltk.corpus import stopwords

tokens = get_tokens()
filtered = [w for w in tokens if not w in stopwords.words('english')]
count = Counter(filtered)
print
count.most_common(100)

from nltk.stem.porter import *


def stem_tokens(tokens, stemmer):
    stemmed = []
    for item in tokens:
        stemmed.append(stemmer.stem(item))
    return stemmed


stemmer = PorterStemmer()
stemmed = stem_tokens(filtered, stemmer)
count = Counter(stemmed)
print
count.most_common(100)

When I try to run this program I get the following error:

Traceback (most recent call last):
  File "/Users/MYUSERNAME/Desktop/stemmer.py", line 15, in <module>
    tokens = get_tokens()
  File "/Users/MYUSERNAME/Desktop/stemmer.py", line 10, in get_tokens
    no_punctuation = lowers.translate(None,string.punctuation)
TypeError: translate() takes exactly one argument (2 given)

Now my questions are:

  1. How can I fix this?
  2. When this program works, how could I run this script not only for one .txt file but for all the .txt files in a certain directory?

Note: I rarely program, so I only know the absolute basics of Python.

Johan
paschy96

1 Answer

I assume you're running Python 3 or later.

In Python 2.7, str.translate accepts two arguments (a translation table and a string of characters to delete), but in Python 3 it accepts only one, the table. That is why you're getting the error.

In Python 2.7, passing None as the table means "apply no mapping, just delete the characters given in the second argument", so translate(None, string.punctuation) strips all punctuation. That two-argument form no longer exists in Python 3.

Instead, build a translation table with str.maketrans and pass it to translate. The third argument to maketrans lists the characters to delete:

translator = str.maketrans('', '', string.punctuation)
no_punctuation = lowers.translate(translator)
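
For the second question (running over every .txt file in a directory), one possible approach is to give get_tokens a path parameter and loop over the files with pathlib. This is only a minimal sketch under some assumptions: count_directory is a hypothetical helper name, the files are assumed to be UTF-8, and plain whitespace splitting stands in for nltk.word_tokenize so the example runs without NLTK installed.

```python
import string
from collections import Counter
from pathlib import Path

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def get_tokens(path):
    """Read one text file and return its lowercased, punctuation-free tokens."""
    # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
    text = Path(path).read_text(encoding='utf-8', errors='replace')
    no_punctuation = text.lower().translate(PUNCT_TABLE)
    # Assumption: whitespace splitting stands in for nltk.word_tokenize;
    # swap nltk.word_tokenize(no_punctuation) back in if NLTK is available.
    return no_punctuation.split()


def count_directory(directory):
    """Return {file name: Counter of tokens} for each .txt file in directory."""
    counts = {}
    for path in sorted(Path(directory).glob('*.txt')):
        counts[path.name] = Counter(get_tokens(path))
    return counts
```

You would then call something like count_directory('/Users/MYUSERNAME/Desktop/Test_sp500') and print count.most_common(10) for each entry of the returned dictionary. The stop-word filtering and stemming steps from the question can be applied to the token list inside the loop in the same way.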
Johan