3

I have a problem that looks like this and this but I cannot get what I am after.

I have the following sample dataset:

country_code=c('USA','USA','USA','USA','USA','USA','CHN','CHN','CHN','CHN','CHN','CHN')
target_var=c('V1','V1','V1'  ,'V2'  ,'V2' ,'V2' ,'V1' ,'V1' ,'V1','V2' ,'V2' ,'V2')
VAR=       c('X7','X8','X140','X114','X18','X28','X29','X22','X2','X22','X23','X24')
Ranking=   c(1   ,2.5 ,2.5   ,1.5   ,1.5  ,1.5  , 1   ,2    ,3   ,1.5  ,1.5  ,3)
df<-data.frame(country_code,target_var,VAR,Ranking)

And I need to convert from a long to wide format for all combinations of country_code and target_var. The twist I was referring to is that I only want to retain the top X VARs by Ranking (let's say 2 for this example), retaining the ties. So the end result of the example dataset will look like:

Please note that for USA the "ties" are retained so instead of the top 2 I get the top 3. The ties could have occurred in CHN instead.

enter image description here

I have tried with a nested loop and rbind, but I am not able to make it work. I also looked at a number of threads re long to wide but the vast majority only "reshape" numbers, not characters, which is what the VARs are. I suspect a dplyr solution is what would make sense but I can't get it work. Thank you

Community
  • 1
  • 1
J. Doe.
  • 1,255
  • 1
  • 12
  • 25
  • Try `df %>% group_by(country_code, target_var) %>% top_n(2, wt = Ranking) %>% mutate(n = row_number()) %>% select(-Ranking) %>% spread(n, VAR)` – akrun Oct 26 '16 at 09:27
  • 1
    You might want to add this as a solution....!! Thanks a bunch! – J. Doe. Oct 26 '16 at 09:36

1 Answers1

4

We can use top_n to subset the rows, and then spread from 'long' to 'wide'

library(tidyr)
df %>% 
  group_by(country_code, target_var) %>% 
  top_n(2, wt = Ranking) %>%
  mutate(n = row_number()) %>% 
  select(-Ranking) %>% 
  spread(n, VAR, sep="")
akrun
  • 874,273
  • 37
  • 540
  • 662