I'm trying to practice joining data using MapReduce, but when I run this command:

cat join1_File*.txt | ./join1_mapper.py | sort | ./join1_reducer.py

it displays this error, followed by the output:

Traceback (most recent call last):
  File "./join1_mapper.py", line 24, in <module>
    value_in = key_value[1] #value is 2nd item
IndexError: list index out of range
Apr-04 able 13 n-01 5
Dec-15 able 100 n-01 5
Feb-02 about 3 11
Mar-03 about 8 11
Feb-22 actor 3 22
Feb-23 burger 5 15
Mar-08 burger 2 15


I expect the output to look like this, without the error:

Apr-04 able 13 n-01 5
Dec-15 able 100 n-01 5
Feb-02 about 3 11
Mar-03 about 8 11
Feb-22 actor 3 22
Feb-23 burger 5 15
Mar-08 burger 2 15


This is my join1_mapper.py code:

`import sys

for line in sys.stdin:
    line      = line.strip()                # strip surrounding whitespace, including the newline
    key_value = line.split(",")             # split line into key and value, returns a list
    key_in    = key_value[0].split(" ")     # key is the first item in the list
    value_in  = key_value[1]                # value is the 2nd item (the IndexError is raised here)

    #print key_in
    if len(key_in) >= 2:                    # this entry has <date word> in the key
        date = key_in[0]                    # get the date from the key field
        word = key_in[1]
        value_out = date + " " + value_in   # concatenate date, blank, and value_in
        print('%s\t%s' % (word, value_out)) # print word, tab, and value
    else:                                   # key is only <word>, so just pass it through
        print('%s\t%s' % (key_in[0], value_in))

# Note that Hadoop expects a tab to separate key and value,
# but this program assumes the input file has a ',' separating key and value.`
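
One possible cause, assuming at least one input line has no comma (for example a blank line at the end of one of the files): `line.split(",")` then returns a one-element list, so `key_value[1]` raises `IndexError`. Below is a minimal sketch of a guard against that case. The helper name `map_line` and the sample lines are hypothetical, chosen only to illustrate the idea; they are not taken from the actual input files.

```python
def map_line(line):
    """Return 'word<TAB>value' for one input line, or None if it has no comma.

    map_line is a hypothetical helper, not part of the original script.
    """
    line = line.strip()
    key_value = line.split(",")
    if len(key_value) < 2:           # blank line or no comma: nothing to emit
        return None
    key_in = key_value[0].split(" ")
    value_in = key_value[1]
    if len(key_in) >= 2:             # key is "<date> <word>"
        date, word = key_in[0], key_in[1]
        return '%s\t%s %s' % (word, date, value_in)
    return '%s\t%s' % (key_in[0], value_in)   # key is only "<word>"


# Hypothetical sample lines, including a blank one that would crash the original loop.
sample_lines = ["Apr-04 able,13\n", "able,n-01 5\n", "\n"]
for raw in sample_lines:
    out = map_line(raw)
    if out is not None:
        print(out)
```

In the mapper itself the loop would iterate over `sys.stdin` instead of `sample_lines`. Skipping such lines silently is a design choice; writing them to `sys.stderr` instead would make malformed input easier to spot.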
