I am new to R. I discovered data frames, rownames() and colnames() and liked using them to index into arrays as I find it makes my code more readable as in mtcars["Mazda RX4","mpg"]

Now I learn that data.frames, and hence rownames(), are deprecated in favor of tibbles and key columns. I can't find any documentation for key columns in R tibbles. I am sure it is there, but neither google( key columns in tibbles ) nor google( key columns in R tibbles ) leads me anywhere useful.

I am not trying to do anything fancy. I just want to be a good sport, play by the rules and not put deprecated concepts into my very first code. Hence the question: "Where are key columns in R tibbles documented?"

NelsonGon
  • What happens if you type `?tibble`? Also, where is it written that `data.frame` objects are deprecated? Is this specific to `dplyr` and `tibble`, or to R? – NelsonGon Mar 09 '19 at 17:43
  • Please read this about tibbles. `tibbles` are data.frame objects with different printing. https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html – NelsonGon Mar 09 '19 at 17:47
  • Look at `?tbl_df` for a one minute read. – NelsonGon Mar 09 '19 at 17:51
  • Also isn't `key` specific to `data.table`?! – NelsonGon Mar 09 '19 at 17:53
  • Are you mistaking `tibble` with [keys in package `data.table`](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-keys-fast-subset.html)? – Rui Barradas Mar 09 '19 at 17:53
  • data frames *are not* deprecated. They are part of the core R base and are a stable and reliable tabular data store. "Tibbles" are part of an optional third-party add-on package that **is not supported by R Core** and needs to be installed before you can use them. – Spacedman Mar 09 '19 at 18:06
  • Once this gets closed, should it also be deleted? The statements by the OP are factually wrong and I don't see value in propagating *fake news*? – Chase Mar 09 '19 at 18:10
  • When I followed NelsonGon's advice to enter ?tbl_df, I was given two options. When I followed the dplyr option, I got this help page, which seemed to indicate that data frames are deprecated: "tbl_df {dplyr}, R Documentation. Create a data frame tbl. Description: Deprecated: please use tibble::as_tibble() instead. Usage: tbl_df(data). Arguments: data, a data frame." – user3135871 Mar 10 '19 at 19:25

1 Answer

data.frames are not deprecated and remain the workhorse of much work in R. With the emergence of "big data" and larger datasets, data.tables (from the data.table package) have become quite useful. Their key advantage is that they can carry a key: the table is sorted by one or more columns, which makes searching by key values and joining (merging) datasets much faster. tibbles are a class of objects that inherits from data.frame (an affirmation that data.frames are not deprecated, since tibbles are in fact special forms of data frames).
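As a rough sketch of what a data.table key looks like in practice, the snippet below keys the built-in mtcars data by model name. It assumes the data.table package is installed; the column name `model` and the variable names are chosen here for illustration only.

```r
library(data.table)

# Copy mtcars into a data.table, moving the row names into a "model" column
dt <- as.data.table(mtcars, keep.rownames = "model")

# Declare "model" as the key; this sorts the table by that column
setkey(dt, model)

# Look up a row by key value and return its mpg column
mpg_rx4 <- dt[.("Mazda RX4"), mpg]
print(key(dt))
print(mpg_rx4)
```

This is the closest thing in the R ecosystem to a "key column", and it belongs to data.table rather than to tibble.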

To illustrate this:

library(tibble)  # tibble() is not part of base R

df <- data.frame(a = runif(5), b = runif(5))
tbl <- tibble(a = runif(5), b = runif(5))

The calls to "class" yield:

> class(df)
[1] "data.frame"

> class(tbl)
[1] "tbl_df"     "tbl"        "data.frame"

This demonstrates that a tibble is itself a data.frame, so functions that dispatch on the data.frame class will generally accept tibbles as well (generally, but not always, with a nod to @Spacedman's important clarifications in the comments). The reasons for tibbles are explained in this article: https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html. In essence, tibbles print in a friendlier way on-screen (which is generally irrelevant for embedded/finalized code) and make some behaviors more consistent; whether that is a benefit depends on which behaviors you want and prefer.
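To make the "generally, but not always" caveat concrete, here is a small sketch of one well-known difference: selecting a single column with `[`. It assumes the tibble package is installed; the variable names are illustrative.

```r
library(tibble)

df  <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
tbl <- as_tibble(df)

# A tibble passes the usual data.frame class check
is_df <- inherits(tbl, "data.frame")

# Single-column subsetting with `[`:
col_df  <- df[, "a"]   # a data.frame drops to a plain numeric vector
col_tbl <- tbl[, "a"]  # a tibble stays a one-column tibble

print(is.numeric(col_df))
print(is.data.frame(col_tbl))

# `[[` returns a plain vector for both classes
vec_tbl <- tbl[["a"]]
```

Code written for data frames that expects `x[, "a"]` to be a vector is one place where a tibble can cause surprises; `[[` or `$` behaves the same for both.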

From the documentation: "Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating (i.e. converting character vectors to factors)." In summary, they are data.frames that offer some convenient shortcuts. (Personally, I don't see an incentive to use them in my code. Since so many base-R functions and the myriad of libraries out there return data.frames, it's certain that you will need to use data.frames anyway. A preference for tibbles therefore means deliberately converting data.frames created elsewhere into tibbles and having to manage both in your mental space. For me, that's a lot of overhead to achieve some presumed shortcuts.)

For the other aspect of your question, you can use attributes() to see which attributes are stored on the object:

> attributes(tbl)
$names
[1] "a" "b"

$row.names
[1] 1 2 3 4 5

$class
[1] "tbl_df"     "tbl"        "data.frame"
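Note that the row.names of the tibble above are just integers. If you want the readability of `mtcars["Mazda RX4","mpg"]` with a tibble, one possible approach is to move the row names into an ordinary column and filter on it. This is a sketch that assumes the tibble package is installed; the column name `model` is an arbitrary choice.

```r
library(tibble)

# Convert mtcars to a tibble, keeping the row names as a regular column
cars <- as_tibble(mtcars, rownames = "model")

# The tibble itself carries no character row names
print(has_rownames(cars))

# Look up a value by filtering on the "model" column
mpg_rx4 <- cars$mpg[cars$model == "Mazda RX4"]
print(mpg_rx4)
```

With a plain data.frame, the original `mtcars["Mazda RX4", "mpg"]` keeps working as before.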
Soren
  • Note tibbles may have the data.frame class but they are *not* drop-in replacements for true data.frames. Code designed for data frames can, and does, break when fed a tibble. Use at your own risk. – Spacedman Mar 09 '19 at 18:08
  • I am delighted to hear that data.frames are not deprecated in R. I drew that conclusion from: https://rdrr.io/cran/tibble/man/deprecated.html – user3135871 Mar 10 '19 at 18:58
  • Thank you for all the comments. They do help me to understand data.frames, tibbles and R. – user3135871 Mar 10 '19 at 19:00
  • The link you referenced is specific to what's deprecated within the tibble library, and it marks an evolution in how _that_ package works. It doesn't affect other libraries (base R's data.frame in this case). – Soren Mar 10 '19 at 19:02