How to filter data by one column and group by another column

Question

I have a problem counting the values of one column in a dictionary, based on the value of another column.

I wrote the code below, which shows the idea:

1) Extracting the data into lists

2) Taking the unique values for further processing

3) Looping to count the number of males and females only for "Accidental"

Problem:

What is an efficient way to count the values for each category in the unique set? I mean, what if I had 1000 unique categories? I do not want to write 1000 "if"s.

This is my first question on Stack Overflow, so I'm sorry for any mistakes I've made :)

Original data (first 5 rows):
[
['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], 
['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], 
['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], 
['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], 
['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']
]



# Accidents list
accidents_list = [row[3] for row in data] # list of all accidents

print(set(accidents_list)) # unique set

{'Homicide', 'NA', 'Undetermined', 'Accidental', 'Suicide'}

gender_list = [row[5] for row in data]
print(gender_list)

['M', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'F', 'F', 'M', 'M' ....]

# Accidents dict and loop over it
accidents_gender = {}

for i, v in enumerate(gender_list):
    if v not in accidents_gender:
        accidents_gender[v] = 0
    if accidents_list[i] == 'Accidental':
        accidents_gender[v] += 1

print(accidents_gender) # printing only values for accidental

{'M': 1421, 'F': 218}

Is accidents_list as long as gender_list? Otherwise this doesn't make sense, right? accidents_list[i] would give an index error. – iamklaus, Dec 28 '18 at 13:03

Carlos Mermingas · Accepted Answer · 2018-12-29T04:29:01.280

You can use Counter from the collections module in the standard library.

I'd use Pandas (example below), but if that's overkill, here's a way to work it out with Counter:

from collections import Counter

# Exclude header
data = data[1:]

# Filter accidents
accidents = filter(lambda x: x[3] == 'Accidental', data)

# Count by gender
by_gender = Counter(item[5] for item in accidents)
print(by_gender)
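
If you need the counts for every intent at once, rather than only for 'Accidental', one option is to key the Counter on (intent, sex) pairs so that no per-category "if" is needed. This is a minimal sketch that assumes the row layout from the question (intent at index 3, sex at index 5); the sample rows and the name by_intent_and_gender are made up for illustration.

```python
from collections import Counter

# Made-up sample rows in the same layout as the question's data (header first).
data = [
    ['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'],
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'White', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Accidental', '0', 'M', '60', 'White', '100', 'Home', '4'],
    ['4', '2012', '02', 'Accidental', '0', 'M', '64', 'White', '100', 'Home', '4'],
]

# Skip the header row.
rows = data[1:]

# One pass over the rows; each key is an (intent, sex) tuple.
by_intent_and_gender = Counter((row[3], row[5]) for row in rows)

# Look up any combination; a Counter returns 0 for keys it has not seen.
print(by_intent_and_gender[('Accidental', 'M')])
print(by_intent_and_gender[('Accidental', 'F')])
```

The same Counter works unchanged whether the data has 5 distinct intents or 1000, because the categories come from the data rather than from the code.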

Here's a way to do it with Pandas:

import pandas as pd

df = pd.DataFrame.from_records(data=data[1:], columns=data[0])

# Filter 'Accidental', group by sex, get the size of each group
counts = df[df['intent'] == 'Accidental'].groupby('sex').size()

# Print it out
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(counts)
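
To get the counts for all intents in one go, grouping by both columns is one way to do it. The sketch below assumes the same column names as the question's header ('intent' and 'sex'); the sample rows and the name counts_by_intent are invented for illustration.

```python
import pandas as pd

# Made-up sample rows in the same layout as the question's data (header first).
data = [
    ['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'],
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'White', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Accidental', '0', 'M', '60', 'White', '100', 'Home', '4'],
    ['4', '2012', '02', 'Accidental', '0', 'M', '64', 'White', '100', 'Home', '4'],
]

df = pd.DataFrame.from_records(data=data[1:], columns=data[0])

# Group by both columns and count the rows in each group, then pivot
# 'sex' into columns; combinations that never occur are filled with 0.
counts_by_intent = df.groupby(['intent', 'sex']).size().unstack(fill_value=0)

print(counts_by_intent)

# A single cell can be read with .loc[row_label, column_label].
print(counts_by_intent.loc['Accidental', 'M'])
```

The result has one row per intent and one column per sex, so adding more categories to the data needs no change to the code.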

You'd be better off using a Jupyter Notebook for this. The Pandas documentation is superb but also a lot to digest. This SO answer has good, relevant info.

I hope this helps.

edited Dec 29 '18 at 04:29

answered Dec 28 '18 at 12:31


Thank you Carlos for such a quick response! However, your solution only counts the number of unique values in the list. I want to find the values for "Accidental" for males and females only. I have updated the code. The list of accidents corresponds to the gender list: each row in my data records the type of incident and the gender of the person involved. I hope I have made the idea clear :) – MKwiatosz Dec 28 '18 at 13:07
I see. Are you open to using Pandas? It would be easier than counting things yourself. – Carlos Mermingas Dec 28 '18 at 13:45
Sure! I'm still learning Pandas, but I will be glad if you teach me something – MKwiatosz Dec 28 '18 at 13:48
Cool. I won’t be able to look at this again until later today or over the weekend. Can you update your question with a very small example of what your original data looks like? It seems to be a list of lists or something like that. – Carlos Mermingas Dec 28 '18 at 13:54
Yep, I'll do it. – MKwiatosz Dec 28 '18 at 14:10
@MKwiatosz - I updated the answer. I hope it helps. May I suggest rewording your question to describe the actual problem that you're trying to solve (perhaps something like "How to filter data by one column and group by another column"), unless using dictionaries is important for your implementation. – Carlos Mermingas Dec 29 '18 at 04:33
This is what I was looking for, thank you Carlos :) I'm new to the active community, could I somehow reward you? – MKwiatosz Dec 29 '18 at 12:37
