I mainly use tables in the tibble format from the tidyverse, but for some steps I use the data.table package. What is the best way to convert a data.table back to a tibble?

I understand that data.table has the functions setDT() and setDF(), which convert from data.frame to data.table (and vice versa) by reference, i.e. without making a copy.

But what if I want to convert back to a tibble? Am I copying the data when I call as_tibble() on the data.frame resulting from setDF()? Is there a clever way to avoid this, maybe using setattr() from data.table?

library(data.table)
library(tidyverse)

iris_tib <- as_tibble(iris)

## some data.table operation
setDT(iris_tib)
setkey(iris_tib, Species)
iris_tib[, Sepal.Length.Mean := mean(Sepal.Length), by = Species]



## How to convert back to tibble efficiently?
setDF(iris_tib)
iris_tib_back <- as_tibble(iris_tib)

## it looks like we were able to update by reference? Only rownames were (shallow) copied?
changes(iris_tib, iris_tib_back)
Matifou
    Your "some data.table operation" is very straightforward in the tidyverse right? Hadley has an interface to data.table that retains dplyr syntax if that's your preference: https://github.com/hadley/dtplyr Re your main question, maybe this answers it? (I have not tested.) https://github.com/Rdatatable/data.table/issues/1877#issuecomment-253864899 – Frank Sep 20 '18 at 19:12
    two great refs, thanks! The first is interesting and the second is... I think pretty much the answer I was looking for indeed! :-) – Matifou Sep 20 '18 at 19:38
    Ok, cool :) You can self-answer it (I won't answer since I don't have tibble installed and don't know how to confirm that the setattr trick achieves the desired result) – Frank Sep 20 '18 at 20:28
    The issue is that I don't know myself that well how to ascertain result is correct haha, not sure to understand fully what `as_tibble()` does with the row.names. But I guess if I don't care too much about rownames, that should be fine – Matifou Sep 21 '18 at 18:42

1 Answer

As @Frank mentioned, this was discussed in a data.table GitHub issue (https://github.com/Rdatatable/data.table/issues/1877#issuecomment-253864899). One possibility is to use the setattr() function, which sets attributes by reference. Specifically:

setattr(x, "class", c("tbl_df", "tbl", "data.frame"))

And if there's a doubt about the original class:

old_class <- class(iris_tib)
setDT(iris_tib)
# ... bunch of data.table operations
setDF(iris_tib)
setattr(iris_tib, "class", old_class)

This seems to be enough to convert the object back to a tibble without copying the data.
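A minimal sketch of how one might check this, assuming the tibble and data.table packages are installed. It uses `address()` from data.table to compare the memory address of a column before and after the round trip, and `is_tibble()` to confirm the class. The variable names `addr_before`, `addr_after`, and `addr_as_tibble` are only illustrative.

```r
library(data.table)
library(tibble)

iris_tib <- as_tibble(iris)

# Remember the original class and where one column lives in memory
old_class   <- class(iris_tib)
addr_before <- address(iris_tib$Sepal.Length)

# Switch to data.table by reference and add a column by reference
setDT(iris_tib)
iris_tib[, Sepal.Length.Mean := mean(Sepal.Length), by = Species]

# Switch back: plain data.frame first, then restore the tibble class
setDF(iris_tib)
setattr(iris_tib, "class", old_class)

addr_after <- address(iris_tib$Sepal.Length)

print(is_tibble(iris_tib))                # is the result a tibble again?
print(identical(addr_before, addr_after)) # was the column left in place?

# For comparison: does as_tibble() move the column vector?
addr_as_tibble <- address(as_tibble(iris_tib)$Sepal.Length)
print(identical(addr_after, addr_as_tibble))
```

If the second check prints TRUE, the column vector was not copied during the round trip. The last check only reports what `as_tibble()` does on the installed versions, so it is worth running rather than assuming.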

Matifou