
I am trying to convert a dataframe in which each row is a specific event and each column holds information about that event. I want to turn it into data in which each row is a country-year, with information about the number and characteristics of the events in that year. In this data set each event is an occurrence of terrorism, and I want to count the number of events where the "target" is a government building. One of the columns is called "targettype" or "targettype_txt", and there are 5 different entries in this column that I want to count (government building, military, police, diplomatic building, etc.). The target type is also coded as a number if that is easier (i.e. there is another column where gov't building is 2, military installation is 4, etc.).
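Since the target type is also available as a numeric code, a minimal sketch of selecting events by code with pandas' `Series.isin` may help. Only codes 2 (government building) and 4 (military installation) come from the question; the column name `targettype` is assumed for the numeric column, `99` is a made-up placeholder for a type you would not count, and any other codes would need to be looked up in the GTD codebook.

```python
import pandas as pd

# Hypothetical numeric target-type column. Per the question, 2 = government
# building and 4 = military installation; 99 is a made-up placeholder for a
# code you do NOT want to count.
cdf = pd.DataFrame({"targettype": [2, 4, 2, 99]})

# Add the other codes you need (look them up in the GTD codebook)
PUBLIC_CODES = [2, 4]

# Boolean Series: True where the event's target code is in PUBLIC_CODES
is_public = cdf["targettype"].isin(PUBLIC_CODES)
print(is_public.tolist())  # [True, True, True, False]
```

Matching on integer codes avoids spelling and capitalization mismatches that can creep in when comparing text labels.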

FYI, this data set covers 16 countries in West Africa over the years 2000-2020, with roughly 8000 events recorded in total. The data comes from the Global Terrorism Database, and this is for a thesis/independent research project (i.e. not a graded class assignment).

Right now my data looks like this (there are a ton of other columns but they aren't important for this):

eventID   iyear  country_txt  nkill  nwounded  nhostages  targettype_txt
10000102  2000   Nigeria      3      10        0          government building
10000103  2000   Mali         1      3         15         military installation
10000103  2000   Nigeria      15     0         0          government building
10000103  2001   Benin        1      0         0          police
10000103  2001   Nigeria      1      3         15         private business

. . .

And I would like it to look like this:

country_txt  iyear  total_nkill  total_nwounded  total_nhostages  total_public_target
Nigeria      2000   200          300             300              15
Nigeria      2001   250          450             15               17

I was able to get the totals for nkill, nwounded, and nhostages using this super simple line:

df2 = cdf.groupby(['country', 'country_txt', 'iyear'])[['nkill', 'nwound', 'nhostkid']].sum()

But this is a little different, because I want to count only certain entries and sum up the number of times they occur. Any thoughts or suggestions are really appreciated!

taraamcl

1 Answer


Try:

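# True where the event's target is one of the "public" target types
cdf['CountCondition'] = ((cdf['targettype_txt'] == 'government building') |
                         (cdf['targettype_txt'] == 'military installation') |
                         (cdf['targettype_txt'] == 'police'))
# Keep only the matching rows, then count non-null values per country-year
df2 = cdf[cdf['CountCondition']].groupby(['country', 'country_txt', 'iyear', 'CountCondition']).count()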

You create a new column, 'CountCondition', that marks each row True or False depending on whether its target type is one of the listed ones. Then you count the rows where CountCondition is True. Hope this makes sense.
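A note on the counting step, as a small sketch on the question's sample rows: after filtering, `.count()` reports non-null values for every remaining column (a DataFrame), while `.size()` reports the number of rows per group (one Series, one number per country-year), which is closer to a single "number of public-target attacks" figure. The sketch also shows a side effect of filtering first: a country-year with no matching events (Nigeria 2001 here) disappears instead of showing 0. If you need those zeros, summing an integer (0/1) version of CountCondition per group keeps every country-year.

```python
import pandas as pd

# A subset of the sample rows from the question
cdf = pd.DataFrame({
    "country_txt": ["Nigeria", "Mali", "Nigeria", "Nigeria"],
    "iyear": [2000, 2000, 2000, 2001],
    "nkill": [3, 1, 15, 1],
    "targettype_txt": ["government building", "military installation",
                       "government building", "private business"],
})
cdf["CountCondition"] = ((cdf["targettype_txt"] == "government building") |
                         (cdf["targettype_txt"] == "military installation") |
                         (cdf["targettype_txt"] == "police"))

matched = cdf[cdf["CountCondition"]].groupby(["country_txt", "iyear"])

# .count(): non-null values in every remaining column -> a DataFrame
per_column = matched.count()
# .size(): number of rows in each group -> one Series
n_public = matched.size()
print(n_public)
```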

It is possible to combine all of this into one statement without creating an additional column, but the statement gets quite convoluted and it is harder to see how it works:

df2 = cdf[(cdf['targettype_txt']=='government building') | 
    (cdf['targettype_txt']=='military installation') | 
    (cdf['targettype_txt']=='police')].groupby(['country','country_txt', 'iyear']).count()
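To get the full table from the question (sums plus a public-target count per country-year) in one pass, a minimal sketch is to turn the condition into a 0/1 column and sum it alongside the casualty columns with pandas named aggregation. Summing, rather than filtering first, keeps country-years with zero public-target attacks. Assumptions: the target labels are the ones in the question's sample, and the column names follow the question's table (the asker's own groupby line uses the GTD names `nwound` and `nhostkid`, so swap those in for the real data). Check `cdf["targettype_txt"].unique()` for the exact label strings, since capitalization must match.

```python
import pandas as pd

# The five sample rows from the question
cdf = pd.DataFrame({
    "iyear": [2000, 2000, 2000, 2001, 2001],
    "country_txt": ["Nigeria", "Mali", "Nigeria", "Benin", "Nigeria"],
    "nkill": [3, 1, 15, 1, 1],
    "nwounded": [10, 3, 0, 0, 3],
    "nhostages": [0, 15, 0, 0, 15],
    "targettype_txt": ["government building", "military installation",
                       "government building", "police", "private business"],
})

# Assumed labels; extend with the other public target types you need
PUBLIC_TARGETS = ["government building", "military installation", "police"]

# 1 if the event's target is in PUBLIC_TARGETS, else 0
cdf["is_public"] = cdf["targettype_txt"].isin(PUBLIC_TARGETS).astype(int)

# Named aggregation: output_name=(input_column, function)
out = (cdf.groupby(["country_txt", "iyear"])
          .agg(total_nkill=("nkill", "sum"),
               total_nwounded=("nwounded", "sum"),
               total_nhostages=("nhostages", "sum"),
               total_public_target=("is_public", "sum"))
          .reset_index())
print(out)
```

Nigeria 2001 stays in the result with `total_public_target` equal to 0, because its only event targeted a private business.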
Galo do Leste
  • This isn't quite right, but it is closer to what I'm looking for. I want a new column in the data set that counts attacks that targeted Government OR Police OR Military, but not attacks on private property. So I want code that looks in the column "targettype_txt" for "Police", "Government", or "Military" and adds up the instances of these attacks in each country-year. So for Nigeria in 2000 there would be (for example) 30 attacks on these public targets even if there were 100 attacks in total (with the other 70 being on private property). – taraamcl Jan 27 '23 at 01:50
  • Thank you for coming back! This strategy definitely makes sense. I was able to add the column with the count condition and I see the True/False values as expected. I'm having trouble with the second line, though: I'm getting an error that ends with "ValueError: Expected a 1D array, got an array with shape (750, 11)". I'm super new to coding, so this might be something on my end. Any tips would be appreciated! Thanks again!! – taraamcl Jan 27 '23 at 02:33
  • Sorry, my bad. I was trying to force multiple-column results into a single Series. I changed it so that the result feeds into a separate dataframe, and I also added a one-statement alternative that you may prefer. – Galo do Leste Jan 27 '23 at 02:38
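If "a new column in the data set" means a column on the original event-level rows (rather than a separate country-year table), one possible sketch uses `groupby(...).transform`, which returns one value per original row, so each event carries its country-year's count of public-target attacks and its total number of attacks. The labels and rows are the question's sample; the new column names `is_public`, `public_attacks`, and `total_attacks` are made up for illustration. For one row per country-year instead, aggregate the same 0/1 column with `.sum()` rather than `.transform("sum")`.

```python
import pandas as pd

# The sample rows from the question (only the columns needed here)
cdf = pd.DataFrame({
    "iyear": [2000, 2000, 2000, 2001, 2001],
    "country_txt": ["Nigeria", "Mali", "Nigeria", "Benin", "Nigeria"],
    "targettype_txt": ["government building", "military installation",
                       "government building", "police", "private business"],
})
# Assumed labels; check cdf["targettype_txt"].unique() for the exact strings
PUBLIC_TARGETS = ["government building", "military installation", "police"]

# 1 if the event's target is public, else 0
cdf["is_public"] = cdf["targettype_txt"].isin(PUBLIC_TARGETS).astype(int)
grouped = cdf.groupby(["country_txt", "iyear"])

# transform() broadcasts each group's result back onto that group's rows
cdf["public_attacks"] = grouped["is_public"].transform("sum")
# is_public is never null, so "count" equals the number of events per group
cdf["total_attacks"] = grouped["is_public"].transform("count")
print(cdf)
```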