How to count text event type and transform it into country-year data using pandas?

Question

I am trying to convert a dataframe where each row is a specific event and each column has information about the event. I want to turn this into data in which each row is a country and year, with information about the number and characteristics of the events in that year.

In this data set, each event is an occurrence of terrorism, and I want to count the number of events where the "target" is a government building. One of the columns is called "targettype" or "targettype_txt", and there are 5 different entries in this column I want to count (government building, military, police, diplomatic building etc). The targettype is also coded as a number if that is easier (i.e. there is another column where gov't building is 2, military installation is 4, etc.).

FYI This data set has 16 countries in West Africa and is looking at years 2000-2020 with a total of roughly 8000 events recorded. The data comes from the Global Terrorism Database, and this is for a thesis/independent research project (i.e. not a graded class assignment).

Right now my data looks like this (there are a ton of other columns but they aren't important for this):

eventID	iyear	country_txt	nkill	nwounded	nhostages	targettype_txt
10000102	2000	Nigeria	3	10	0	government building
10000103	2000	Mali	1	3	15	military installation
10000103	2000	Nigeria	15	0	0	government building
10000103	2001	Benin	1	0	0	police
10000103	2001	Nigeria	1	3	15	private business

. . .

And I would like it to look like this:

country_txt	iyear	total_nkill	total_nwounded	total_nhostages	total_public_target
Nigeria	2000	200	300	300	15
Nigeria	2001	250	450	15	17

I was able to get the totals for nkill, nwounded, and nhostages using this super simple line:

df2 = cdf.groupby(['country', 'country_txt', 'iyear'])[['nkill', 'nwound', 'nhostkid']].sum()

But this is a little different because I want to only count certain entries and sum up the total number of times they occur. Any thoughts or suggestions are really appreciated!

Galo do Leste · Accepted Answer · 2023-01-27T02:46:54.663

0

Try:

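# Flag events whose target is one of the public-target categories
cdf['CountCondition'] = (
    (cdf['targettype_txt'] == 'government building')
    | (cdf['targettype_txt'] == 'military installation')
    | (cdf['targettype_txt'] == 'police')
)
# Keep only the flagged rows, then count them per country-year
df2 = cdf[cdf['CountCondition']].groupby(['country', 'country_txt', 'iyear', 'CountCondition']).count()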

You create a new column 'CountCondition' which marks each row as True or False depending on whether the condition in the statement holds. Then you count the number of rows where CountCondition is True within each country-year group. Hope this makes sense.
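As a rough sketch of how the flag column could feed straight into the country-year table from the question: summing a True/False column counts the True rows, so the flag can be aggregated alongside the casualty totals. The toy rows below are taken from the sample in the question; the column names nwound and nhostkid follow the question's groupby line, and the output names (total_nkill and so on) and the public_targets list are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the real data (rows copied from the sample in the question)
cdf = pd.DataFrame({
    "country_txt": ["Nigeria", "Mali", "Nigeria", "Benin", "Nigeria"],
    "iyear": [2000, 2000, 2000, 2001, 2001],
    "nkill": [3, 1, 15, 1, 1],
    "nwound": [10, 3, 0, 0, 3],
    "nhostkid": [0, 15, 0, 0, 15],
    "targettype_txt": ["government building", "military installation",
                       "government building", "police", "private business"],
})

# Assumed list of categories to treat as "public" targets
public_targets = ["government building", "military installation", "police"]

# isin() is equivalent to chaining the == comparisons with |
cdf["CountCondition"] = cdf["targettype_txt"].isin(public_targets)

# One row per country-year; summing the boolean flag counts the True rows
country_year = (
    cdf.groupby(["country_txt", "iyear"], as_index=False)
       .agg(
           total_nkill=("nkill", "sum"),
           total_nwound=("nwound", "sum"),
           total_nhostkid=("nhostkid", "sum"),
           total_public_target=("CountCondition", "sum"),
       )
)
print(country_year)
```

Because no rows are filtered out before grouping, a country-year with events but no public-target events still appears, with a count of 0.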

It is possible to combine all this into one statement and NOT create an additional column, but the statement gets quite convoluted and it is harder to see how it works:

df2 = cdf[(cdf['targettype_txt']=='government building') | 
    (cdf['targettype_txt']=='military installation') | 
    (cdf['targettype_txt']=='police')].groupby(['country','country_txt', 'iyear']).count()
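One thing to keep in mind with the filter-then-group form: .count() returns a non-null count for every remaining column, whereas .size() returns a single number per group. A minimal sketch of the .size() variant, joined back onto the casualty totals, might look like the following. The toy rows come from the sample in the question; the names public_targets, counts, totals, and total_public_target are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the real data (rows copied from the sample in the question)
cdf = pd.DataFrame({
    "country_txt": ["Nigeria", "Mali", "Nigeria", "Benin", "Nigeria"],
    "iyear": [2000, 2000, 2000, 2001, 2001],
    "nkill": [3, 1, 15, 1, 1],
    "nwound": [10, 3, 0, 0, 3],
    "nhostkid": [0, 15, 0, 0, 15],
    "targettype_txt": ["government building", "military installation",
                       "government building", "police", "private business"],
})

public_targets = ["government building", "military installation", "police"]

# Filter first, then take one count per country-year with size()
counts = (
    cdf[cdf["targettype_txt"].isin(public_targets)]
    .groupby(["country_txt", "iyear"])
    .size()
    .rename("total_public_target")
)

# Casualty totals over ALL events, not just the filtered ones
totals = cdf.groupby(["country_txt", "iyear"])[["nkill", "nwound", "nhostkid"]].sum()

# Left join on the (country_txt, iyear) index; country-years that had no
# public-target events are missing from counts, so fill those with 0
result = (
    totals.join(counts)
          .fillna({"total_public_target": 0})
          .astype({"total_public_target": int})
          .reset_index()
)
print(result)
```

Filtering before grouping drops any country-year that has no matching events, which is why the join and fillna step is needed here if those country-years should show a zero.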

edited Jan 27 '23 at 02:46

answered Jan 27 '23 at 01:40

Galo do Leste


This isn't quite right, but it is closer to what I'm looking for. I want a new column in the data set that counts attacks that targeted either Government OR Police OR Military, but not attacks on private property. So I want to write a program that looks in the column "targettype_txt" for "Police", "Government", or "Military" and adds up the instances of these attacks in each country-year set. So for Nigeria in 2000 there would be (for example) 30 attacks on these public buildings even if there were 100 attacks total (with the other 70 being on private property). – taraamcl Jan 27 '23 at 01:50
Thank you for coming back! This strategy definitely makes sense. I was able to add the column with the count condition and I see the T/F values as expected. I'm having trouble with the second line though-- I'm getting an error that ends with "ValueError: Expected a 1D array, got an array with shape (750, 11)". I'm super new to coding so this might be something on my end. Any tips would be appreciated! thanks again!! – taraamcl Jan 27 '23 at 02:33
Sorry, my bad. I was trying to force multiple-column results into a single Series. Changed it so that the result feeds into a separate dataframe. Also added another single-line answer that you may prefer. – Galo do Leste Jan 27 '23 at 02:38
