
I have a data frame like this:

  id key value
1  x   a     1
2  x   b     2
3  y   a     3
4  y   b     4

df <- read.table(text = "id key value
x a 1
x b 2
y a 3
y b 4", header = TRUE)

I would like to get a list with one element per id, each containing a sublist per key.

With my example, the expected output would be:

$x
$x$a
$x$a$value
[1] 1

$x$b
$x$b$value
[1] 2

$y
$y$a
$y$a$value
[1] 3

$y$b
$y$b$value
[1] 4

list(
  x = list(
    a = list(value = 1), 
    b = list(value = 2)
  ), 
  y = list(
    a = list(value = 3), 
    b = list(value = 4)
  )
)

I can achieve this with nested lapply and split, but I think there should be a more straightforward way to do it.

Any help would be appreciated.

Julien Navarre
  • Generally, you could approach this with a recursive `split`/`lapply`. E.g. (at least here): `ff = function(x) if(is.data.frame(x)) lapply(split(x[, -1], x[[1]]), ff) else x; ff(dat); ff(cbind(dat[1:2], "value", dat[3]))` seems to work – alexis_laz Mar 02 '18 at 14:31
  • Thanks for the tip, it confirms to me that there should definitely be a nice `purrr` solution – Julien Navarre Mar 02 '18 at 14:33
  • How would you do it if I had an additional column `value2` that should sit at the same list level as `value`, e.g. `list(x = list(a = list(value1 = 1, value2 = 2)))`? – Julien Navarre Mar 02 '18 at 15:02
  • You could try specifying as argument the number of columns to recurse over: `ff = function(x, nrec) if(nrec) lapply(split(x[-1], x[[1]]), ff, nrec - 1L) else as.list(x); ff(dat, 2); dat2 = cbind(dat, value2 = c(10, 20, 30, 40)); ff(dat2, 2)`. I'm not sure how generalizable this could be though – alexis_laz Mar 03 '18 at 09:24
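
The recursive `split`/`lapply` idea from the comments above can be written out in a more readable form. This is only a sketch: the function name `nest_by_leading` is hypothetical, and it assumes the grouping columns come first in the data frame, in the order they should be nested.

```r
# Sample data from the question (whitespace-separated).
dat <- read.table(text = "id key value
x a 1
x b 2
y a 3
y b 4", header = TRUE)

# Split on the leading column `nrec` times, dropping that column at each
# level, then turn whatever columns remain into a plain list.
# `nest_by_leading` is a hypothetical name, not an existing function.
nest_by_leading <- function(x, nrec) {
  if (nrec > 0L) {
    lapply(split(x[-1], x[[1]]), nest_by_leading, nrec = nrec - 1L)
  } else {
    as.list(x)
  }
}

res <- nest_by_leading(dat, 2L)
str(res)

# An extra value column ends up at the same level as `value`.
dat2 <- cbind(dat, value2 = c(10, 20, 30, 40))
res2 <- nest_by_leading(dat2, 2L)
str(res2)
```

Passing the recursion depth explicitly keeps the helper independent of column names, at the cost of relying on column order.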

2 Answers


Overview

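Two methods, one using base R and the other using plyr, to split your data frame by group, apply a function over each group, and return the results in a list. Note that both return a flat list with one element per id.key pair rather than the nested list shown in the question.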

Use base::split.data.frame() followed by an lapply() to extract the value element for each unique id-key pair.

# split data frame
# based on 'id' and 'key' pairs
df.split <-
    split.data.frame(
        x = df
        , f = list( df$id, df$key )
    )
# keep only the value
# element within each list
df.split <-
    lapply(
        X = df.split
        , FUN = function( i )
            i[["value"]]
    )

# view results
df.split
# $x.a
# [1] 1
# 
# $y.a
# [1] 3
# 
# $x.b
# [1] 2
# 
# $y.b
# [1] 4

# end of script #

Use plyr::dlply() to do the same thing, without the need for lapply().

# load necessary packages
library( plyr )

# splits df by the 'id' and 'key' variables
# and return the 'value' for each pairing
df.split <-
    dlply( 
        .data = df
        , .variables = c( "id", "key" )
        , .fun = function(i) i[["value"]]
    )

# view results
df.split
# $x.a
# [1] 1
# 
# $x.b
# [1] 2
# 
# $y.a
# [1] 3
# 
# $y.b
# [1] 4
# 
# attr(,"split_type")
# [1] "data.frame"
# attr(,"split_labels")
# id key
# 1  x   a
# 2  x   b
# 3  y   a
# 4  y   b

# end of script #
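
The flat `id.key` names shown in the outputs above can be regrouped into the nested shape requested in the question. The following is a minimal sketch, assuming that neither the ids nor the keys contain a dot; the `flat` list is typed in by hand here to mirror the printed result, and the variable names are illustrative.

```r
# Flat result in the same shape as the output above.
flat <- list(x.a = 1, y.a = 3, x.b = 2, y.b = 4)

# Recover the id and key from each "id.key" name
# (assumes no dots inside ids or keys).
parts <- strsplit(names(flat), ".", fixed = TRUE)
ids   <- vapply(parts, `[`, character(1), 1)
keys  <- vapply(parts, `[`, character(1), 2)

# Group element positions by id, then wrap each value in list(value = ...)
# and name the entries by key.
nested <- lapply(split(seq_along(flat), ids), function(i) {
  setNames(lapply(flat[i], function(v) list(value = v)), keys[i])
})

str(nested)
```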

@Colonel Beauvel's answer to the SO post "Emulate split() with dplyr group_by: return a list of data frames" was helpful in answering this question.

Cristian E. Nuno

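One solution with a single split and nested *apply calls:

# iterate over the value column directly instead of apply() over rows,
# which would coerce each row to character and return "1" instead of 1
lapply(split(df, df$id), function(x) setNames(lapply(x$value, function(v) list(value = v)), x$key))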

Alternative with nested lapply and split:

lapply(split(df, df$id), function(x) lapply(split(x["value"], x$key), as.list))
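
To check a result against the expected output from the question, something like the following sketch may help. It uses `all.equal()` rather than `identical()` because `read.table()` reads the `value` column as integer while the literals in the expected list are doubles; the names `expected` and `nested` are illustrative.

```r
# Sample data from the question (whitespace-separated).
df <- read.table(text = "id key value
x a 1
x b 2
y a 3
y b 4", header = TRUE)

# Expected structure, copied from the question.
expected <- list(
  x = list(a = list(value = 1), b = list(value = 2)),
  y = list(a = list(value = 3), b = list(value = 4))
)

# Nested lapply/split alternative from above.
nested <- lapply(split(df, df$id), function(x) {
  lapply(split(x["value"], x$key), as.list)
})

# Compare structure and values, tolerating the integer/double difference.
print(all.equal(nested, expected))
```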

Improvements are welcome!

Julien Navarre