
This should be pretty simple, but I have already put a few hours into it.

Example Data (name, binary, count):

Adam 0 1
Adam 1 1
Adam 0 1
Mike 1 1
Mike 0 1
Mike 1 1  

Desired Example Output (name, binary, count):

Adam 0 2
Adam 1 1
Mike 0 1
Mike 1 2  

Each name should get one row per binary key (0 or 1), with the count column summed for that key. Notice how the rows are "reduced" in the desired output.

I have included my code below. I am trying to do this without using lists or dictionaries in the reducer.

""" Reducer takes names with their binaries and partial counts adds them up

Input: name \t binary \t pCount

Output: name \t binary \t tCount
"""

import re
import sys

current_name = None
zero_count, one_count = 0,0

for line in sys.stdin:
    # parse the input
    name, binary, count = line.split('\t')

   if name == current_name:
      if int(binary) == 0:
        zero_count += int(count)

    elif int(binary) == 1:
        one_count += int(count)
else:
    if current_name:
        print(f'{current_name}\t{0} \t{zero_count}')
        print(f'{current_name}\t{1} \t{one_count}')
    current_name, binary, count = word, int(binary), int(count)

print(f'{current_name}\t{1} \t{count}')

For some reason, it is not printing properly (the output for the first name that passes through looks wrong). I am also not sure of the best way to print both zero_count and one_count together with their binary labels.

Any help would be appreciated. Thanks!

CP3

2 Answers


I think it is best to use the pandas library.

import pandas as pd
from io import StringIO

# Sample data from the question, space-separated
data = """Adam 0 1
Adam 1 1
Adam 0 1
Mike 1 1
Mike 0 1
Mike 1 1"""

text = StringIO(data)
name, binary, count = [], [], []

# Split each line into its three fields
for line in text.readlines():
    fields = line.strip().split(" ")
    name.append(fields[0])
    binary.append(fields[1])
    count.append(fields[2])

df = pd.DataFrame({'name': name, "binary": binary, "count": count})
df['count'] = df['count'].astype(int)  # counts were read as strings
# Sum the counts for every (name, binary) pair
df = df.groupby(['name', 'binary'])['count'].sum().reset_index()
print(df)

Output:

   name binary  count
0  Adam      0      2
1  Adam      1      1
2  Mike      0      1
3  Mike      1      2

If your data is already in a CSV or text file, it can be read directly with pandas.

df = pd.read_csv('path to your file')
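
For the tab-separated, header-less format described in the question, the read and the grouping can be combined roughly as follows. This is only a sketch: the function name `sum_counts` and the column names are made up for illustration, and a `StringIO` object stands in for a real file path.

```python
from io import StringIO

import pandas as pd


def sum_counts(source):
    """Sum the count column per (name, binary) pair.

    `source` may be a file path or a file-like object. It is assumed to
    hold tab-separated lines with no header row.
    """
    df = pd.read_csv(
        source,
        sep="\t",
        header=None,
        names=["name", "binary", "count"],
    )
    # as_index=False keeps name and binary as ordinary columns
    return df.groupby(["name", "binary"], as_index=False)["count"].sum()


# Stand-in for a real file, built from the sample data in the question
sample = StringIO(
    "Adam\t0\t1\nAdam\t1\t1\nAdam\t0\t1\n"
    "Mike\t1\t1\nMike\t0\t1\nMike\t1\t1\n"
)
result = sum_counts(sample)
print(result.to_string(index=False))
```

Passing a path such as `sum_counts('path to your file')` should work the same way, provided the file really is tab-separated and has no header line.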
Khalil Al Hooti

The indentation was bad and the conditions weren't handled properly.

import sys

current_name = None
zero_count, one_count = 0, 0
i = 0
for line in sys.stdin:
    # parse the input
    name, binary, count = line.split('\t')
    # on the very first line there is no current_name yet
    if i == 0:
        current_name = name
        i = i + 1
    if name == current_name:
        # same name: keep accumulating
        if int(binary) == 0:
            zero_count += int(count)
        elif int(binary) == 1:
            one_count += int(count)
    else:
        # new name: emit the totals for the previous name
        print(f'{current_name}\t{0} \t{zero_count}')
        print(f'{current_name}\t{1} \t{one_count}')
        # reset the counters and count the current line for the new name
        current_name = name
        zero_count, one_count = 0, 0
        if int(binary) == 0:
            zero_count += int(count)
        elif int(binary) == 1:
            one_count += int(count)
# emit the totals for the last name
print(f'{current_name}\t{0} \t{zero_count}')
print(f'{current_name}\t{1} \t{one_count}')

'i' handles the case where there is no 'current_name' yet for the first line of input (that block runs only once).
In the else block, you have to re-initialize 'zero_count' and 'one_count', and also count the current line towards the new 'current_name'.

Output for my code:

Adam    0       2
Adam    1       1
Mike    0       1
Mike    1       2
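
As an alternative sketch of the same streaming idea, `itertools.groupby` from the standard library can take over the "has the name changed?" bookkeeping, so neither the 'i' flag nor the duplicated counting in the else block is needed. The function name `reduce_counts` is made up for illustration. The sketch assumes that all lines for one name arrive consecutively (as in the sample data) and that every binary value is 0 or 1. It keeps only two running counters per name and writes the fields separated by a single tab.

```python
from itertools import groupby


def reduce_counts(lines):
    """Yield two tab-separated output lines per name: totals for 0 and 1.

    Assumes the lines for each name are consecutive in the input.
    """
    # Lazily split every non-empty line into (name, binary, count) fields
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for name, group in groupby(rows, key=lambda fields: fields[0]):
        zero_count = one_count = 0
        for _, binary, count in group:
            if int(binary) == 0:
                zero_count += int(count)
            else:
                one_count += int(count)
        yield f"{name}\t0\t{zero_count}"
        yield f"{name}\t1\t{one_count}"


# Demo with the sample data from the question
sample = "Adam\t0\t1\nAdam\t1\t1\nAdam\t0\t1\nMike\t1\t1\nMike\t0\t1\nMike\t1\t1\n"
for out in reduce_counts(sample.splitlines()):
    print(out)
```

In an actual reducer script, the loop would iterate over `reduce_counts(sys.stdin)` (after `import sys`) instead of the sample string.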
hrishikeshs