Aggregating count data

Question

I have a dataset (test_data) on civil conflicts between 1989 and 2016. The unit of analysis is _DyadID_, the unique identifier for each pair of actors involved in a civil conflict during this period. The dataset also includes _SideA_ and _SideB_, the names of the actors in a given dyad. Each row is an "event" of armed violence, with a variable for the number of Side A deaths (_deaths_a_) and the number of Side B deaths (_deaths_b_). Lastly, there is a variable (_year_month_) indicating the month-year of each event.

(Image: subset of data)

For my research, I need to know the number of _deaths_a_ and _deaths_b_ per month. Basically, I want to end up with a dataset of monthly death counts for each _DyadID_. I have managed to get the total number of A/B deaths per month across all conflicts using the following code:

    monthly_deaths_a <- aggregate(deaths_a ~ year_month, test_data, sum)
    monthly_deaths_b <- aggregate(deaths_b ~ year_month, test_data, sum)

but I don't know how to get this data disaggregated for each dyad.

If anyone could suggest a way of doing this I would be most grateful! Cheers

Mako212 · Answer 1 · 2017-07-21T17:07:58.653

1

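With data.table, something like this (grouping by `DyadID` as well as `year_month` gives one row per dyad-month):

    library(data.table)
    setDT(test_data)  # convert the data frame to a data.table by reference
    # Sum both death counts within each dyad-month
    summary <- test_data[, .(deaths_a = sum(deaths_a), deaths_b = sum(deaths_b)), by = .(DyadID, year_month)]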
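To see what a per-dyad version of this produces, here is a minimal sketch on toy data. The values are invented and only mirror the columns described in the question; the result name `monthly` is likewise just for illustration.

```r
library(data.table)

# Toy data (invented values) with the columns described in the question
test_data <- data.table(
  DyadID     = c(689, 689, 689, 889, 889, 889),
  year_month = c("2007-04", "2007-04", "2008-04", "2007-06", "2007-06", "2007-07"),
  deaths_a   = c(0, 5, 3, 2, 0, 0),
  deaths_b   = c(10, 0, 3, 4, 3, 3)
)

# One row per dyad-month, with the summed deaths on each side
monthly <- test_data[, .(deaths_a = sum(deaths_a), deaths_b = sum(deaths_b)),
                     by = .(DyadID, year_month)]
print(monthly)
```

Naming the summed columns inside `.()` avoids the default `V1`/`V2` column names, which makes the result easier to merge back later.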

edited Jul 21 '17 at 17:07

answered Jul 21 '17 at 17:00


pyll · Accepted Answer · 2017-07-21T18:18:24.477

0

Note: I'm not sure whether you need totals by month or by month AND year. These are different, so I split the two into separate columns.

    # Start with some sample data
    other_var <- c(1, 2, 2, 1, 2, 2)
    DyadID <- c(689, 689, 689, 889, 889, 889)
    year_month <- c('2007-04', '2007-04', '2008-04', '2007-06', '2007-06', '2007-07')
    deaths_a <- c(0, 5, 3, 2, 0, 0)
    deaths_b <- c(10, 0, 3, 4, 3, 3)

    df <- data.frame(other_var, DyadID, year_month, deaths_a, deaths_b)

    # Use the dplyr and tidyr packages
    library(dplyr)
    library(tidyr)

    # Split year_month into year and month (which is what I assume you mean)
    df <- df %>%
      separate(year_month, c('year', 'month'), sep = "-")

    # Aggregate only deaths_a and deaths_b within each dyad, year and month
    df2 <- aggregate(cbind(deaths_a, deaths_b) ~ DyadID + year + month, df, sum)
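The difference between "by month" and "by month AND year" mentioned in the note can be seen in a small base R sketch. It reuses the sample values above; the `month` column and the result names `by_year_month` and `by_month` are assumptions made for illustration.

```r
# Toy data (invented values), same shape as the sample above
df <- data.frame(
  DyadID     = c(689, 689, 689, 889, 889, 889),
  year_month = c("2007-04", "2007-04", "2008-04", "2007-06", "2007-06", "2007-07"),
  deaths_a   = c(0, 5, 3, 2, 0, 0),
  deaths_b   = c(10, 0, 3, 4, 3, 3)
)

# Calendar month only ("04", "06", ...), taken from characters 6-7 of "YYYY-MM"
df$month <- substr(df$year_month, 6, 7)

# By dyad and year-month: April 2007 and April 2008 stay separate
by_year_month <- aggregate(cbind(deaths_a, deaths_b) ~ DyadID + year_month, df, sum)

# By dyad and calendar month: April 2007 and April 2008 are pooled together
by_month <- aggregate(cbind(deaths_a, deaths_b) ~ DyadID + month, df, sum)

print(by_year_month)
print(by_month)
```

For monthly conflict data over 1989-2016, the year-month grouping is probably the one wanted, since pooling the same calendar month across years mixes unrelated periods.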

edited Jul 21 '17 at 18:18

answered Jul 21 '17 at 17:01


Excellent! That worked perfectly, thank you very much. – Lee Tagziria Jul 21 '17 at 17:27
Hi again! I have tried to reproduce the same code for the more complex version of my data, which has 42 variables in total: `df <- data.frame(brd_ged$DyadID, brd_ged$ConflictID, brd_ged$year_month, brd_ged$LocationInc, brd_ged$SideA, brd_ged$SideA2nd.. etc)` followed by `df2 <- aggregate(.~brd_ged.DyadID+brd_ged.year_month, df, sum)`. But I get "error: no rows to aggregate". I can see why: it doesn't know which two variables I want to sum, i.e. deaths_a and deaths_b. How do I specify this in the formula? – Lee Tagziria Jul 21 '17 at 18:10
`aggregate(.~brd_ged.DyadID+brd_ged.year_month, df, sum)` change to `aggregate(.~brd_ged$DyadID+brd_ged$year_month, df, sum)` – pyll Jul 21 '17 at 18:12
this will aggregate ALL other columns...so if you only want those two columns aggregated, you should create a subset before you aggregate `df <- brd_ged[c('DyadID', 'year_month', 'deaths_a', 'deaths_b')]` – pyll Jul 21 '17 at 18:13
So essentially I could just stick with the original solution you provided, and then merge the resulting dataframe with my full, complex dataset to get the results I am looking for? – Lee Tagziria Jul 21 '17 at 18:17
check out my updated solution....you can use `cbind` with the variables you want...notice that the variable `other_var` is now ignored. – pyll Jul 21 '17 at 18:19
Excellent. I did that, and then merged using: `model_data <- full_join(brd_ged, df2, b = c("DyadID", "year_month", "deaths_a", "deaths_b"))` and it worked a treat. Cheers mate! – Lee Tagziria Jul 21 '17 at 18:34
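The follow-up in the comments can be sketched end to end. In the `. ~` form, every remaining column goes into the aggregation, and the formula method of `aggregate` drops incomplete rows by default (`na.action = na.omit`), so a wide dataset with missing values scattered across columns is a likely cause of "no rows to aggregate". Naming the count columns with `cbind` avoids this. The toy `brd_ged` below is a hypothetical stand-in with invented values, the `monthly_` prefix is an assumption, and base `merge` is used here in place of `full_join`.

```r
# Hypothetical stand-in for brd_ged: a few of its columns plus the counts
brd_ged <- data.frame(
  DyadID      = c(689, 689, 689, 889, 889, 889),
  year_month  = c("2007-04", "2007-04", "2008-04", "2007-06", "2007-06", "2007-07"),
  SideA       = c("Gov X", "Gov X", "Gov X", "Gov Y", "Gov Y", "Gov Y"),
  LocationInc = c(NA, "X", "X", NA, "Y", "Y"),
  deaths_a    = c(0, 5, 3, 2, 0, 0),
  deaths_b    = c(10, 0, 3, 4, 3, 3),
  stringsAsFactors = FALSE
)

# Name the two count columns explicitly; the other columns never enter the formula,
# so their missing values and character types do not matter
monthly <- aggregate(cbind(deaths_a, deaths_b) ~ DyadID + year_month, brd_ged, sum)

# Columns 3 and 4 are the summed counts; rename them so they do not
# collide with the event-level deaths_a and deaths_b
names(monthly)[3:4] <- c("monthly_deaths_a", "monthly_deaths_b")

# Attach the dyad-month totals to every event row, matching on the keys only
model_data <- merge(brd_ged, monthly, by = c("DyadID", "year_month"), all.x = TRUE)
print(model_data)
```

Joining on `DyadID` and `year_month` alone keeps one output row per event; including the death counts in the join keys would only match events whose counts happen to equal the monthly totals.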
