
I have the following code in a Jupyter notebook:

import h5py
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_hdf('accounting-2018-10-deid.h5', 'table')
df.columns
Out[4]:
Index(['group', 'owner', 'job_number', 'submission_time', 'start_time',
   'end_time', 'failed', 'exit_status', 'granted_pe', 'slots',
   'task_number', 'maxvmem', 'h_data', 'h_rt', 'highp', 'exclusive',
   'h_vmem', 'gpu', 'pe', 'slot', 'wait_time', 'wtime', 'campus'],
  dtype='object')

The meanings of the columns:

owner: the owner of a job
group: the group an owner belongs to; a group can have one or more owners

The task is: for each group, list the number of users and list all of those users (i.e. the users that share the same “group” field). For example: group 1 (4 users): user2, user32, user41, user56.

I tried to use groupby() but didn't get the right answer. Please help me.
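For reference, here is a small hypothetical DataFrame (made-up owners and groups, not the real accounting data) and the kind of output I am hoping for:

import pandas as pd

# Hypothetical sample data standing in for the real accounting file
sample = pd.DataFrame({
    "owner": ["user2", "user32", "user41", "user56", "user7"],
    "group": ["group1", "group1", "group1", "group1", "group2"],
})

# Desired output, roughly:
# group1 (4 users): user2, user32, user41, user56
# group2 (1 user): user7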

    Please read [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) – Sheldore Jan 18 '19 at 01:15

1 Answer


Does this work for you?

import pandas as pd

df = pd.DataFrame({"owner": ["Allen", "Bob", "Cindy", "David", "Emily", "Frank"],
                   "group": ["A", "C", "B", "C", "B", "B"]})

groups = df.groupby("group")
# iterating over a GroupBy yields (group name, sub-DataFrame) tuples
for group in groups:
    print('There are {} owners in group {}'.format(group[1].shape[0], group[0]))
    print('They are {}.'.format(group[1].owner.to_string(index=False).replace('\n', ', ')))
    print()
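If the same owner can appear on many rows (one row per job) and each user should only be counted once per group, a variant with unique() should work too (a sketch on made-up data, not tested against your real file):

import pandas as pd

df = pd.DataFrame({"owner": ["Allen", "Bob", "Bob", "Cindy", "David", "Emily"],
                   "group": ["A", "C", "C", "B", "C", "B"]})

# unique() collapses repeated rows for the same owner within a group
for name, owners in df.groupby("group")["owner"]:
    users = owners.unique()
    print('group {} ({} users): {}'.format(name, len(users), ', '.join(users)))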
  • Hi keineahnung2345, I did as you advised, but got a MemoryError: `MemoryError Traceback (most recent call last) ... 1 groups = df.groupby("group") ----> 2 for group in groups: ... MemoryError:` – Tal Nur Jan 18 '19 at 16:29
  • @TalNur I guess it's because your dataset is too large. Could you try this method with a smaller dataset and see if it works? You can also try adding `low_memory=False` or `usecols=['group', 'owner']` to `pd.read_hdf()`, as https://stackoverflow.com/questions/17557074/memory-error-when-using-pandas-read-csv/47230263#47230263 and https://stackoverflow.com/questions/26063231/read-specific-columns-with-pandas-or-other-python-module suggest, and see if that helps. – keineahnung2345 Jan 19 '19 at 00:16
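For what it's worth, pandas' read_hdf exposes a columns argument for selecting columns from table-format stores (rather than read_csv's usecols), so restricting the read to just the two needed columns might look like this (a sketch, assuming the store was written in 'table' format):

import pandas as pd

# Read only the two columns the grouping needs; column selection requires
# the HDF5 store to have been written in 'table' format.
df = pd.read_hdf('accounting-2018-10-deid.h5', 'table', columns=['group', 'owner'])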