-1

my goal is to convert a data.frame of the form:

ID 1    ID 2    Value
5        k        7
5        k        2
5        l        4
6        b        2

into a table of the form:

ID 1    k    l    b
5       7    4      
6                 2

I would then like to manipulate the data and go back to the first format again.

I tried it by using the function spread() in the library(tidyr) but do only get the following format (which is not what I want) :

ID 1    k   l   b
5       7       
5       7       
5           4   
6               2

thanks for your help

Frank
  • 66,179
  • 8
  • 96
  • 180
Walde
  • 45
  • 8
  • 3
    There's an issue with the data: combination of ID1 = 5/ID2 = k is not unique. There are two values associated with it. Where did the value "2" go? Or you are interested in max value? Also, after aggregation, there is no way to go back to the original format as you have lost some information in the process. – statespace Oct 12 '16 at 07:12
  • 1
    Concise one liner with data.table package: `library(data.table); dcast(setDT(df)[, Value[1], .(ID1, ID2)], ID1 ~ ID2)` – Colonel Beauvel Oct 12 '16 at 08:14

2 Answers2

2

It is not clear how you want to handle duplicates but here is an attempt,

library(dplyr)
library(tidyr)
df1 <- df[!duplicated(df[c('ID1', 'ID2')]),] %>% 
                                    group_by(ID1) %>% 
                                    spread(ID2, Value, fill = '')

df1
#Source: local data frame [2 x 4]
#Groups: ID1 [2]

#    ID1     b     k     l
#  <int> <chr> <chr> <chr>
#1     5           7     4
#2     6     2     

To go back to the original we need to gather, i.e.

df2 <- df1 %>% 
         gather(ID2, Value, b:l) %>% 
         filter(Value != '') 

df2
#Source: local data frame [3 x 3]
#Groups: ID1 [2]

#    ID1   ID2 Value
#  <int> <chr> <chr>
#1     6     b     2
#2     5     k     7
#3     5     l     4

However, we are missing the duplicates so we rbind them to the gathered data frame, i.e.

rbind(as.data.frame(df2), df[duplicated(df[c('ID1', 'ID2')]),])
#    ID1 ID2 Value
#1    6   b     2
#2    5   k     7
#3    5   l     4
#21   5   k     2

After your clarification, if you don't have duplicates then,

df1 <- df %>% group_by(ID1) %>% spread(ID2, Value, fill = '')

and to go back to the original

df2 <- df1 %>% gather(ID2, Value, b:l) %>% filter(Value != '')
Sotos
  • 51,121
  • 6
  • 32
  • 66
1

One solution to your problem would be following:

x <- data.frame(ID1 = c(5,5,5,6),
                ID2 = c("k","k","l","b"),
                Value = c(7,2,4,2))

x <- dcast(x, ID1 ~ ID2, value.var = "Value", fun.aggregate = max, fill = 0)

> x
  ID1 b k l
1   5 0 7 4
2   6 2 0 0

That's given that I handled the non-unique values of ID1/ID2 combinations with max function. Move in the opposite direction would be with melt function... but there is no way we can recover values lost in aggregation:

melt(x, id.vars = "ID1", variable.name = "ID2")
statespace
  • 1,644
  • 17
  • 25
  • Sorry there was indeed an error in my data provided. please ignore the second line. the initial data.frame should be – Walde Oct 12 '16 at 07:40
  • ID 1 ID 2 Value 5 k 7 5 l 4 6 b 2 – Walde Oct 12 '16 at 07:40
  • sorry, no clue how to edit. Just without the second row in the first data frame and without the second row in the last data frame provided in the initial post. @A. Val. I assume your approach would still be valide just without the max function? – Walde Oct 12 '16 at 07:42
  • Yes, aggregation function is required only when there is collision of non-unique ID values. – statespace Oct 12 '16 at 07:46
  • 1
    @Walde to edit click on [edit below your post or on this link](http://stackoverflow.com/posts/39992353/edit) – zx8754 Oct 12 '16 at 07:56