This should be pretty simple, but I have already put a few hours into it.
Example Data (name, binary, count):
Adam 0 1
Adam 1 1
Adam 0 1
Mike 1 1
Mike 0 1
Mike 1 1
Desired Example Output (name, binary, count):
Adam 0 2
Adam 1 1
Mike 0 1
Mike 1 2
Each name needs to have its own binary key of 0 or 1. For each name, sum the count column separately for each binary key. Notice the "reduce" in the desired output.
I have provided some of my code below. I am trying to do this without lists or dictionaries in the reducer.
""" Reducer takes names with their binaries and partial counts adds them up
Input: name \t binary \t pCount
Output: name \t binary \t tCount
"""
import re
import sys

current_name = None
zero_count, one_count = 0, 0

for line in sys.stdin:
    # parse the input
    name, binary, count = line.split('\t')
    if name == current_name:
        if int(binary) == 0:
            zero_count += int(count)
        elif int(binary) == 1:
            one_count += int(count)
    else:
        if current_name:
            print(f'{current_name}\t{0} \t{zero_count}')
            print(f'{current_name}\t{1} \t{one_count}')
        current_name, binary, count = name, int(binary), int(count)

print(f'{current_name}\t{1} \t{count}')
For some reason, it is not printing properly: the output for the first name that passes through is wrong. I am also not sure of the best way to print both zero_count and one_count for each name along with their binary labels.
Any help would be appreciated. Thanks!
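One way the reducer could be restructured, as a minimal sketch rather than a definitive fix. It assumes the input arrives sorted by name (as it would after a typical shuffle/sort step), that the fields are tab-separated, and that binary is always 0 or 1. The function name reduce_counts and the inline sample are illustrative and not part of the original code. The two changes from the code above are that the counters are reset whenever the name changes, so the first line for each name is still counted, and that the last name is flushed after the loop ends. The aggregation itself uses no lists or dictionaries; the sample list is only there to make the sketch runnable.

```python
def reduce_counts(lines):
    """Yield tab-separated name/binary/total rows from name-sorted input lines."""
    current_name = None
    zero_count, one_count = 0, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, binary, count = line.split('\t')
        if name != current_name:
            # Name changed: emit totals for the previous name, then reset.
            if current_name is not None:
                yield f'{current_name}\t0\t{zero_count}'
                yield f'{current_name}\t1\t{one_count}'
            current_name = name
            zero_count, one_count = 0, 0
        # Every line is counted, including the first line of a new name.
        if int(binary) == 0:
            zero_count += int(count)
        elif int(binary) == 1:
            one_count += int(count)
    # The last name has no following line to trigger a flush, so emit it here.
    if current_name is not None:
        yield f'{current_name}\t0\t{zero_count}'
        yield f'{current_name}\t1\t{one_count}'


# Sample data from the question (assumed already sorted by name).
sample = [
    "Adam\t0\t1\n", "Adam\t1\t1\n", "Adam\t0\t1\n",
    "Mike\t1\t1\n", "Mike\t0\t1\n", "Mike\t1\t1\n",
]
for row in reduce_counts(sample):
    print(row)
```

In a streaming job, passing sys.stdin in place of sample should behave the same way, since the function only iterates over lines. Note that this sketch emits a total of 0 for a binary value that a name never had; if that row is unwanted, each yield could be guarded by a check on its count.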