I'm writing a simple program that uses the mrjob library to map and reduce rows from a CSV file.

One of the columns in each row is yearID. This column is read in as a str by default, and I need to convert it to an int so that I can compare it. For some reason, the str-to-int conversion fails.

I get the following error when I run it:

ValueError: invalid literal for int() with base 10: 'yearID'

This error is raised in the reducer by the line if int(stat.get("yearID")) > 1990: in the following code:

from mrjob.job import MRJob

class MRPitching(MRJob):

    def mapper(self, _, line):
        row = line.split(",")
    
        playerID = row[0]
    
        whip = {
            "p_H": row[13],
            "p_BB": row[16],
            "p_IPOUTS": row[12],
            "yearID": row[1]
        }
    
        yield playerID, whip
    
    def reducer(self, playerID, pitchingStats):
        pHSum = 0
        pBBSum = 0
        pIPOUTSSum = 0
    
        for stat in pitchingStats:
            if int(stat.get("yearID")) > 1990:
                yield playerID, stat

if __name__ == "__main__":
    MRPitching.run()

For some reason, int() is receiving the literal string 'yearID' as its argument instead of the value returned by stat.get("yearID"). When I print stat.get("yearID"), I see the expected year values, so I don't understand why int() is getting 'yearID'.

1 Answer

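I forgot that the CSV header row was being passed to the mapper along with the data rows. For that first line, row[1] is the literal string 'yearID', which is exactly what int() was rejecting. Oops!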

After adding a check to skip the header, the conversion works as expected.
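
For anyone hitting the same thing, here is a minimal sketch of one way to do the header check. The helper name parse_pitching_line and the sample lines are made up for illustration (they are not from my real data); the only thing taken from the actual file is that the header's second field is 'yearID', as the error message shows.

```python
def parse_pitching_line(line):
    """Return (playerID, stats) for a data row, or None for the header row.

    Hypothetical helper: the name and the None-for-header convention are
    illustrative choices, not part of mrjob.
    """
    row = line.split(",")

    # In the header row the second field is the column name itself,
    # so there is nothing to convert. Skip it.
    if row[1] == "yearID":
        return None

    stats = {
        "p_H": row[13],
        "p_BB": row[16],
        "p_IPOUTS": row[12],
        "yearID": row[1],
    }
    return row[0], stats


# Made-up sample lines with the same column positions as the mapper above.
header = "playerID,yearID,stint,teamID,lgID,W,L,G,GS,CG,SHO,SV,IPouts,H,ER,HR,BB"
data = "examplepl01,1995,1,AAA,NL,10,5,30,30,2,1,0,600,180,70,15,50"

assert parse_pitching_line(header) is None

player_id, stats = parse_pitching_line(data)
# The conversion now only ever sees real year values.
print(player_id, int(stats["yearID"]) > 1990)
```

In the mapper, this would amount to calling the helper on each line and only yielding when the result is not None, so the reducer never receives the header row.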