I am trying to convert a dataframe where each row is a specific event, and each column has information about the event. I want to turn this into data in which each row is a country and year with information about the number and characteristics about the events in the given year.In this data set, each event is an occurrence of terrorism, and I want to count the number of events where the "target" is a government building. One of the columns is called "targettype" or "targettype_txt" and there are 5 different entries in this column I want to count (government building, military, police, diplomatic building etc). The targettype is also coded as a number if that is easier (i.e. there is another column where gov't building is 2, military installation is 4 etc..)
FYI This data set has 16 countries in West Africa and is looking at years 2000-2020 with a total of roughly 8000 events recorded. The data comes from the Global Terrorism Database, and this is for a thesis/independent research project (i.e. not a graded class assignment).
Right now my data looks like this (there are a ton of other columns but they aren't important for this):
eventID | iyear | country_txt | nkill | nwounded | nhostages | targettype_txt |
---|---|---|---|---|---|---|
10000102 | 2000 | Nigeria | 3 | 10 | 0 | government building |
10000103 | 2000 | Mali | 1 | 3 | 15 | military installation |
10000103 | 2000 | Nigeria | 15 | 0 | 0 | government building |
10000103 | 2001 | Benin | 1 | 0 | 0 | police |
10000103 | 2001 | Nigeria | 1 | 3 | 15 | private business |
. . .
And I would like it to look like this:
country_txt | iyear | total_nkill | total_nwounded | total_nhostages | total public_target |
---|---|---|---|---|---|
Nigeria | 2000 | 200 | 300 | 300 | 15 |
Nigeria | 2001 | 250 | 450 | 15 | 17 |
I was able to get the total number for nkill,nwounded, and nhostages using this super simple line:
df2 = cdf.groupby(['country','country_txt', 'iyear'])['nkill', 'nwound','nhostkid'].sum()
But this is a little different because I want to only count certain entries and sum up the total number of times they occur. Any thoughts or suggestions are really appreciated!