
I am working with several models, each with its own set of parameters. It is convenient for me to store them in a database. When I query it, the result comes back as a data frame that I call df.

In df, several columns together distinguish each parameter from the others, so that each (entire) row is unique.

For example:

col_1 <- c("model_1", "model_1", "model_1", "model_1", "model_2", "model_2", "model_2", "model_2")
col_2 <- c("category_1", "category_1", "category_2", "category_2", "category_1", "category_1", "category_2", "category_2")
col_3 <- c("type_1", "type_2", "type_1", "type_2", "type_1", "type_2", "type_1", "type_2")
col_4 <- c("name_1", "name_2", "name_3", "name_4", "name_5", "name_6", "name_7", "name_8")
col_5 <- c("value_1", "value_2", "value_3", "value_4", "value_5", "value_6", "value_7", "value_8")
mat <- matrix(c(col_1, col_2, col_3, col_4, col_5), ncol = 5)
df <- data.frame(mat)
names(df) <- c("model", "category", "type", "name", "value")

I would like to transform df into a nested list (a list of lists of lists, and so on), called deep_list, so that each parameter value can be accessed like

parameter <- deep_list$model_1$category_2$type_2$name_4

and it should give me value_4.
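For reference, here is a hand-built fragment of that target structure. Only model_1/category_2 is shown, and the other branches would follow the same pattern. Every level except the last is a named list, and the last level maps each name to its value:

```r
# Hand-built fragment of the desired nested list, for illustration only
deep_list <- list(
  model_1 = list(
    category_2 = list(
      type_1 = list(name_3 = "value_3"),
      type_2 = list(name_4 = "value_4")
    )
  )
)

deep_list$model_1$category_2$type_2$name_4  # returns "value_4"
```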

I've read the thread "Converting a data.frame to a list of lists" and tried to make the best use of the dlply() function from {plyr}:

not_deep_list <- dlply(df,1,c)

or, with {dplyr},

not_list <- df %>% group_by(model)

I think this is a very similar problem (hence the similar title).

However, it differs in that it requires handling more "layers" of information (i.e., more columns), hence the name deep_list and the title.
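To illustrate why a single grouping step is not enough, here is a base-R sketch. Splitting value by all four key columns at once does find every combination, but it returns a flat list whose names are the keys joined with "." (the default separator of interaction(), which split() uses when given a list of grouping variables), not a nested list. The data frame is rebuilt compactly here. It contains the same data as the question:

```r
# Same data as in the question, built more compactly
df <- data.frame(
  model    = rep(c("model_1", "model_2"), each = 4),
  category = rep(rep(c("category_1", "category_2"), each = 2), times = 2),
  type     = rep(c("type_1", "type_2"), times = 4),
  name     = paste0("name_", 1:8),
  value    = paste0("value_", 1:8),
  stringsAsFactors = FALSE
)

# One split on all key columns: drop = TRUE keeps only combinations that occur
flat <- split(df$value, df[c("model", "category", "type", "name")], drop = TRUE)

length(flat)                                # returns 8
flat[["model_1.category_2.type_2.name_4"]]  # returns "value_4"
```

The value is reachable only through the joined name, not through the $model_1$category_2$... chain, so the nesting has to be built one column at a time.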

Any suggestions are welcome (recursion, loops, vectorized solutions, functions from packages I have never heard of, ...).

Thanks!

Asked by maraboule, edited by Jaap


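First, I specified stringsAsFactors=FALSE when creating your data.frame. This is important because I use split(...), which creates one group per factor level, including levels that do not occur in the data being split, rather than one group per value actually present. (Since R 4.0.0, stringsAsFactors = FALSE is the default in data.frame().) To see what I mean, run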

vec <- factor(c("apple"), levels=c("apple","banana"))
split(vec, vec)

# $apple
# [1] apple
# Levels: apple banana
# $banana
# factor(0)
# Levels: apple banana
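If you would rather keep factor columns, a minimal alternative sketch is to drop the unused levels at split time, either with split()'s drop argument or with droplevels() beforehand. Either way, only the levels that actually occur become groups. split() on a data frame accepts the same drop argument.

```r
vec <- factor(c("apple"), levels = c("apple", "banana"))

# drop = TRUE discards factor levels that do not occur in the data
by_drop <- split(vec, vec, drop = TRUE)

# droplevels() removes the unused level before splitting
by_droplevels <- split(vec, droplevels(vec))

names(by_drop)        # returns "apple"
names(by_droplevels)  # returns "apple"
```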

OK, so create the data frame with strings kept as character vectors rather than factors:

df <- data.frame(mat, stringsAsFactors = FALSE)
names(df) <- c("model", "category", "type", "name", "value")

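Try this custom function. It is recursive: it splits the data frame by its first column with split(L, L[,1]), and if that produces more than one group (length(L1) > 1), it calls itself on each group with the first column removed (i[,-1]). When the split produces a single group, it removes the first column from that group and returns the result without recursing further.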

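recursive_split <- function(L) {
    # Split the data frame into groups by the values of its first column
    L1 <- split(L, L[,1])
    if (length(L1) == 1) {
        # Single group: drop the first column and stop recursing
        L2 <- lapply(L1, function(i) i[,-1])
        return(L2)
    } else {
        # Several groups: drop the first column and recurse into each group
        lapply(L1, function(i) recursive_split(i[,-1]))
    }
}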

deep_list <- recursive_split(df)

# $model_1
# $model_1$category_1
# $model_1$category_1$type_1
# $model_1$category_1$type_1$name_1
# [1] "value_1"

# $model_1$category_1$type_2
# $model_1$category_1$type_2$name_2
# [1] "value_2"

# $model_1$category_2
# $model_1$category_2$type_1
# $model_1$category_2$type_1$name_3
# [1] "value_3"
# etc

deep_list$model_1$category_2$type_2$name_4
# [1] "value_4"
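One thing to be aware of: recursive_split() stops as soon as a split yields a single group. In this example, that happens only at the name column. If, say, a model had only one category, the recursion would stop at that level and return a data frame instead of deeper lists. A possible variant, sketched below with a hypothetical helper nest_by_columns() that is not part of the answer, stops when only the key and value columns remain, and splits on character keys so that factor columns cannot create empty groups:

```r
# Hypothetical helper (not from the answer): recursion stops when only the
# key and value columns remain, rather than when a split yields one group.
nest_by_columns <- function(d) {
  if (ncol(d) == 2) {
    # Leaf level: names come from the first column, values from the second
    return(setNames(as.list(as.character(d[[2]])), as.character(d[[1]])))
  }
  # Splitting on character keys means unused factor levels never create empty groups
  groups <- split(d[, -1, drop = FALSE], as.character(d[[1]]))
  lapply(groups, nest_by_columns)
}

# Same data as the question, plus a model_3 that has only one category
df <- data.frame(
  model    = c(rep(c("model_1", "model_2"), each = 4), "model_3"),
  category = c(rep(rep(c("category_1", "category_2"), each = 2), times = 2), "category_1"),
  type     = c(rep(c("type_1", "type_2"), times = 4), "type_1"),
  name     = paste0("name_", 1:9),
  value    = paste0("value_", 1:9),
  stringsAsFactors = FALSE
)

deep_list <- nest_by_columns(df)
deep_list$model_1$category_2$type_2$name_4  # returns "value_4"
deep_list$model_3$category_1$type_1$name_9  # returns "value_9"
```

The trade-off is that this version assumes the last column always holds the values and every other column is a key, which matches the layout in the question.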
Answer by CPak
  • Thanks for your answer @CPak. However, it seems that the factor levels of the original data.frame are kept in the sub-layers of the list. This causes problems when I apply the function to my actual data.frame because I get empty sub-lists. To avoid that, I added `L <- data.frame(as.matrix(L), stringsAsFactors = FALSE)` as the first line of your function, and it seems to do the trick – maraboule May 10 '18 at 10:40
  • But I love recursive functions so I'm just going to accept that answer ;) – maraboule May 10 '18 at 10:51
  • @maraboule - in the future, try to post an example that mimics your real data as much as possible - potential answers are designed with the example data provided in mind – CPak May 10 '18 at 12:57
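The fix in the first comment converts the data inside every recursive call. A possible alternative, sketched below under the assumption that the input may arrive with factor columns (as data.frame() produced by default before R 4.0.0), is to convert the factor columns to character once, up front, and then call recursive_split() unchanged from the answer:

```r
# recursive_split() copied unchanged from the answer
recursive_split <- function(L) {
  L1 <- split(L, L[, 1])
  if (length(L1) == 1) {
    L2 <- lapply(L1, function(i) i[, -1])
    return(L2)
  } else {
    lapply(L1, function(i) recursive_split(i[, -1]))
  }
}

# The question's data, deliberately created with factor columns
df <- data.frame(
  model    = rep(c("model_1", "model_2"), each = 4),
  category = rep(rep(c("category_1", "category_2"), each = 2), times = 2),
  type     = rep(c("type_1", "type_2"), times = 4),
  name     = paste0("name_", 1:8),
  value    = paste0("value_", 1:8),
  stringsAsFactors = TRUE
)

# Convert every factor column to character once, before recursing
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

deep_list <- recursive_split(df)
deep_list$model_1$category_2$type_2$name_4  # returns "value_4"
```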