0

I am severly struggling with a data-rearrangement problem. The data below contains agreements (rows) which collapsed or were stable (column "collapse") and feature provisions which were reduced, kept, added or absent (columns "diff.pps_leadership","diff.pps_cabinet", etc.)

I want to rearrange the data so that I get an overview of how many % of those agreements which reduced, kept, or added a specific provision collapsed. The rows should be the provisions (diff.pps_leadership...), the columns should be "reduced, "kept", and "added". And the content of the cells should be the % of those collapsed (only in relation to those which reduced, kept, or added the provision; not the total).

In Excle I would do this in pivot table, but I haven't been able to get there with R. I tried the cast, aggregate, melt, and transpose commands, but haven't succeeded.

Eventually, the result should look similar to this https://docs.google.com/spreadsheets/d/1yhIbvTQTYkkwSFVxWEnPwvSvwTc0vuTYZxa15Eh1lT8/edit?usp=sharing

Hope my question is not too specific. Grateful for any hint/advice.

example <- structure(list(Agreement = structure(c(8L, 4L, 6L, 9L, 2L, 3L, 
7L, 10L, 5L, 1L), .Label = c("Abuja Agreement", "Accra Peace Agreement", 
"Arusha Agreement", "Arusha/Global Ceasefire Agreement", "Comprehensive Peace Agreement", 
"InterabsentCongolese Dialogue", "Lome Agreement", "Lusaka Protocol", 
"Ouagadougou Agreement", "Tansitional Constituion"), class = "factor"), 
    diff.pps_cabinet = structure(c(2L, 1L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L), .Label = c("kept", "reduced"), class = "factor"), 
    diff.pps_leadership = structure(c(1L, 2L, 3L, 3L, 3L, 3L, 
    3L, 3L, 2L, 3L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.mps_milcmd = structure(c(3L, 2L, 3L, 3L, 3L, 3L, 1L, 
    3L, 2L, 3L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.mps_armyint = structure(c(3L, 2L, 2L, 3L, 3L, 3L, 1L, 
    3L, 2L, 3L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.eps_commission = structure(c(1L, 1L, 1L, 1L, 3L, 1L, 
    3L, 1L, 2L, 3L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.eps_company = structure(c(1L, 2L, 1L, 1L, 3L, 1L, 1L, 
    1L, 2L, 3L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.veto_leg = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L), .Label = c("absent", "added"), class = "factor"), 
    diff.tps_devolution = structure(c(2L, 1L, 2L, 3L, 1L, 1L, 
    1L, 2L, 2L, 1L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.ca.psh = structure(c(3L, 2L, 1L, 1L, 4L, 1L, 1L, 1L, 
    4L, 1L), .Label = c("absent", "added", "kept", "reduced"), class = "factor"), 
    collapse = structure(c(1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 
    1L), .Label = c("collapse", "stable"), class = "factor")), .Names = c("Agreement", 
"diff.pps_cabinet", "diff.pps_leadership", "diff.mps_milcmd", 
"diff.mps_armyint", "diff.eps_commission", "diff.eps_company", 
"diff.veto_leg", "diff.tps_devolution", "diff.ca.psh", "collapse"
), class = "data.frame", row.names = c(NA, -10L))
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
zoowalk
  • 2,018
  • 20
  • 33

1 Answers1

1

The following gets the job done.

library(data.table)
setDT(example)

mvs <- c("diff.pps_cabinet", "diff.pps_leadership", 
         "diff.mps_milcmd", "diff.mps_armyint")

vls <- c("reduced", "kept", "added", "absent")

melt(example, c("Agreement", "collapse"), mvs
     )[ , setNames(vapply(
       vls, function(vv) list(paste0(
         s <- sum(collapse[idx <- value == vv] == "collapse"), 
         " out of ", sum(idx), " = ", floor(100 * s / sum(idx)), "% collapsed"),
         paste(Agreement[idx], collapse = "\n")),
       vector("list", 2)),
       paste0(rep(vls, each = 2),
              c(".percent", ".names"))), by = variable]

Current prints NaN when there's nothing; to fix this, replace sum(idx) in the denominator by (if (!any(idx)) 1 else sum(idx)).

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
  • thank you very much for your efforts! this comes already very close to what i am looking for. Unfortunately, the percentages and number of observations in the cells are not what i am looking for. What e.g. in the cell diff.pps_cabinet /reduced.percent is now "9 out of 10" should be "5 out of 9". 9 (out of all 10) reduce and out of those 5 collapse. – zoowalk Jan 06 '16 at 18:52
  • Excellent, many thanks. The only remaining issue is that I want to have only the names of those agreements which collapse, and not all. If I understand correctly this relates to the expression paste(Agreement[idx], collapse = "\n") and requires a condition filtering out only those that collapse. Thought Agreement[idx <- value == vv] == "collapse", collapse = "\n") might be the way forward but unfortunately not working. Apologies if my initial request was not clear enough. – zoowalk Jan 07 '16 at 10:10
  • at this point, the fix for that problem should be clear. I suggest you keep trying. – MichaelChirico Jan 07 '16 at 12:48