0

I have this data set and I want to display the columns ('Police District Name', 'Number of Crimes') for all crimes with more than 3 victims. However, the column 'Number of Crimes' does not exists and show be created, and it indicates (and the total number of crimes committed in that district). Note: every row indicates 1 crime.

Sample of the data set:

Incident ID Victims Police District Name Beat
0   201087096   1      GERMANTOWN        5N1
1   201087097   1        WHEATON         4K2
2   201087097   1        WHEATON         4K2
3   201087097   1        WHEATON         4K2
4   201087100   1      GERMANTOWN        5M1

Here is my code:

import pandas as pd

crimes_df = pd.read_csv('data/Crime.csv', low_memory=False, dtype={'Incident ID': int, 'Beat':object})
more_than_three_victims = crimes_df[(crimes_df['Victims'] > 3)]
more_than_three_victims.groupby(['Police District Name']).sum()

I have no idea what to do from here, I would appreciate any help.

Abdullah
  • 147
  • 2
  • 12

1 Answers1

1

So to initially read in your data, you don't have to create a df from all of your columns:

crimes_df = pd.read_csv('./Desktop/Crime.csv', usecols=['Police District Name', 'Victims'])
# The above will only read in the columns listed
more_than_three_victims = crimes_df[(crimes_df['Victims'] > 3)] # filter based on 3 crimes
groupby_victims = more_than_three_victims.groupby('Police District Name')['Victims'].agg(['sum']).rename(columns = {'sum': 'Number of Victims'})
print(groupby_victims)

output:

                      Number of Victims
Police District Name                  
BETHESDA                            52
GERMANTOWN                         106
MONTGOMERY VILLAGE                 104
ROCKVILLE                           73
SILVER SPRING                      107
TAKOMA PARK                          4
WHEATON                             78

This groups by Police District Name and sums the number of victims in each distict, and then renames the column 'sum' to Number of Crimes. I believe this is what you were looking for.

If you want to count the number of crimes with > 3 victims:

groupby_victims = more_than_three_victims.groupby('Police District Name')['Victims'].agg(['count']).rename(columns ={'count': 'Number of Crimes'})
# you just change 'sum' to 'count'

output:

                      Number of Crimes
Police District Name                  
BETHESDA                             9
GERMANTOWN                          23
MONTGOMERY VILLAGE                  21
ROCKVILLE                           15
SILVER SPRING                       21
TAKOMA PARK                          1
WHEATON                             18

Again, this will be the number of crimes, not the sum of victims.

d_kennetz
  • 5,219
  • 5
  • 21
  • 44