Stemming csv files in Python

Question

I have some Python code that imports two csv files. The first is named "claims" (one column, many rows) and the other is named "sexualHarassment" (one column, many rows). The program currently checks every row of "claims" to see whether it contains any words from "sexualHarassment"; if it does, it writes that row to a new csv file named "output". It also removes certain stopwords that I chose. This part of the program works.

Now I need to stem all of the words to strip the tenses from them, for example "discriminated" to "discriminat", "harassed" to "harass", and so on.

I've downloaded and installed a couple of stemming packages, but so far I can only stem a single word at a time, such as:

    from nltk import PorterStemmer
    PorterStemmer().stem_word('discriminated')
    >>>discriminate

Is there any way that I can run this stemming check on all words in each row of the "sexual harassment" file before it writes the output to the new csv file?

Here is a copy of my code:

    import csv
    with open("claims.csv") as file1, open("masterlist.csv") as file2, \
            open("stopwords.csv") as file3, open("output.csv", "wb+") as file4:
        writer = csv.writer(file4)
        key_words = [word.strip() for word in file2.readlines()]
        stop_words = [' also ', ' although ', ' always ', ' and ', ' any ', ' are ', ' as ', ' at ',
              ' around ', ' be ', ' by ', ' for ', ' from ', ' has ', ' on ', ' that ', ' were ', ' will ',
              ' with ', ' can ', ' cannot ', ' if ', ' it ', ' the ', ' there ', ' which ', ' in ', ' is ',
              ' its ', ' me ', ' of ', ' was ', ' then ', ' a ', ' an ', ' to ', ' when ',
              ' however ', '"', ',', '.', '-', '?', '!', '(', ')']
        for row in file1:
            row = row.strip()
            row = row.lower()
            for stopword in stop_words:
                if stopword in row:
                    row = row.replace(stopword," ")
            for key in key_words:
                if key in row:
                    writer.writerow([key, row])
                    break

iterate over each word in the row and call the stem method on it — Padraic Cunningham, Jul 11 '14 at 19:38
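
The suggestion in the comment above could be sketched roughly as follows. This is only an illustration under some assumptions: Python 3, a current NLTK release (where the method is `PorterStemmer().stem` rather than `stem_word`), and matching on whole words instead of substrings. The names `stem_row`, `filter_claims`, and `main` are hypothetical helpers, not part of the original program, and `stop_words` here holds bare words rather than the space-padded strings used in the question.

```python
import csv
import re


def stem_row(row, stem):
    """Lowercase the row, pull out its alphabetic words, and stem each one."""
    return [stem(word) for word in re.findall(r"[a-z]+", row.lower())]


def filter_claims(claims, key_words, stem, stop_words=frozenset()):
    """Yield (stemmed_key, row) for every claim that contains a stemmed key word."""
    # Stem the key words once so both sides are compared in stemmed form.
    stemmed_keys = {s for key in key_words for s in stem_row(key, stem)}
    for row in claims:
        row = row.strip()
        # Drop stop words before stemming (assumption: bare, lowercase words).
        words = [w for w in re.findall(r"[a-z]+", row.lower()) if w not in stop_words]
        for word in words:
            stemmed = stem(word)
            if stemmed in stemmed_keys:
                yield stemmed, row
                break  # one output line per matching claim, as in the question


def main(claims_path="claims.csv", keys_path="masterlist.csv", output_path="output.csv"):
    # Assumes NLTK is installed; imported here so the helpers above work without it.
    from nltk.stem import PorterStemmer

    stem = PorterStemmer().stem
    with open(claims_path) as claims, open(keys_path) as keys, \
            open(output_path, "w", newline="") as out:
        csv.writer(out).writerows(filter_claims(claims, keys, stem))


# main()  # uncomment to run against the csv files named in the question
```

Passing the stemmer in as a plain callable keeps the row logic independent of any one stemming package, so a different stemmer can be swapped in without touching the loop. Note that a multi-word key is split into words here, so a claim matches if it contains any one of them.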
