
All, I am a beginner in R. I am not too familiar with how classes are organized in R. I have noticed that some class() calls return one class-type, while others return multiple class names.

Example 1

{My object name is "sassign"} Here's my data:

 acctnum gender state   zip zip3 first last book_ nonbook_ total_ purch child youth cook do_it refernce art geog buyer
1   10001      M    NY 10605  106    49   29   109      248    357    10     3     2    2     0        1   0    2    no
2   10002      M    NY 10960  109    39   27    35      103    138     3     0     1    0     1        0   0    1    no
3   10003      F    PA 19146  191    19   15    25      147    172     2     0     0    2     0        0   0    0    no
4   10004      F    NJ 07016  070     7    7    15      257    272     1     0     0    0     0        1   0    0    no
5   10005      F    NY 10804  108    15   15    15      134    149     1     0     0    1     0        0   0    0    no
6   10006      F    NY 11366  113     7    7    15       98    113     1     0     1    0     0        0   0    0   yes

Now, if I do class(object) above, I get:

class(sassign)
[1] "data.frame"

I am good with this. I understand that this data structure is of type data frame.

Example 2 Now, I recently came across Wickham's tibbleR package. Here's how I converted data frame to Tibble:

tib_sassign<-as_data_frame(sassign)
class(tib_sassign)
[1] "tbl_df"     "tbl"        "data.frame"

This is where I was lost. I do not know the differences between tbl_df and tbl. However, my hypothesis is that Tibble package makes our life easier by returning objects (similar to abstract classes) that can be used as a tibble ("tbl"), data frame ("data.frame") or tbl_df (I have no clue what tbl_df means). I read through dplyr package's online pdf, but I don't think they have explained this. I believe they assume that people know what above would mean.

I read RStudio's blog post at https://blog.rstudio.org/2016/03/24/tibble-1-0-0/ but I don't think it describes what the above output means. I also read Norman Matloff's book, but I don't think this is covered. I also googled "tbl_df" "tbl" "data.frame", but most of the results were about some piece of code not working. I couldn't find an explanation of what the above output means.

Example 3 I have now started to look at Time Series in R. This is where I got to a point that I have to start this thread. Here's what I did:

t_sassign <-data.frame(group_by(sassign,last))
t_sassign<-ts(t_sassign,start = c(2014,1),frequency = 12)
class(t_sassign)
[1] "mts"    "ts"     "matrix"

Here, "last" is the number of months. I believe I will somehow manage to do what I need to do, but I still don't get what the above result means.

I also searched through StackOverflow, but most of the results talk about returning a Class in Java.

I have three questions:

Question 1) It will be awesome if someone could provide an example so that I can understand the output from class()

Question 2) I'd also appreciate if someone could provide a snippet with an application of concept discussed in question 1. This way, I can register this concept in my brain forever.

Question 3) If you know a book that goes into such concepts, I'd appreciate it. I am following R in Action by Kabacoff, R by Norman Matloff and StackOverflow.

Many thanks in advance for your help.


(Added) Here's another confusing thing: When I did:

AP<-AirPassengers
class(AP)
[1] "ts"

I got "ts" as class type. Inherited classes were not shown. I am really lost. Please help me!

watchtower
  • I think this is simply off topic here because this is a broad question about a fundamental concept of R; however, you can read up on what `class` returns here: http://adv-r.had.co.nz/S3.html – Konrad Rudolph Aug 05 '16 at 17:57
  • Not a bad question though. Incomprehensible downvote. – Konrad Rudolph Aug 05 '16 at 18:04
  • Thank you so much, Konrad. I did go through the link you have posted. However, I believe Wickham focuses more on threads and Nesting. I am not quite sure whether he talks about the three classes in class() function. Please correct me if I am wrong. I have gone through tonnes of material on the web, but I couldn't find any resource. If you can explain it, I'd appreciate it. – watchtower Aug 05 '16 at 18:05
  • You must have looked at the wrong link! — The link I posted doesn’t mention threading and nesting (?) at all. It does talk about the `class` function and explains what it means. – Konrad Rudolph Aug 05 '16 at 18:07
  • The idea behind the `class` function in R is to offer an object-oriented style of programming. Calling the `class` function on an R object returns all the classes that object is made of (i.e. its immediate class along with all the classes that immediate class `inherits` from). For instance, when you coerce a normal `data.frame` (a native R object) into a `tbl_df`, you are changing its immediate class from `data.frame` to `tbl_df`. You can check if an R object inherits from a class by using the `inherits` function. – Abdou Aug 05 '16 at 18:07
  • For example: `inherits(as_data_frame(sassign), what = 'data.frame')` should return `TRUE`, just like `inherits(ts(sassign), what = 'matrix')` should return `TRUE` too. – Abdou Aug 05 '16 at 18:07
  • Thanks Abdou. So, if I understand you correctly, here's how the inheritance is done: "tbl_df" -> "tbl" -> "data.frame"? I searched for the inheritance diagram but I couldn't find it. So, I thought of asking you about it. – watchtower Aug 05 '16 at 18:31
  • In that case, yes. `data.frame` is a native R object, on top of which `tbl` and `tbl_df` are built. I did not design the `dplyr` package, but I am going to go out on a limb and say that the idea behind it was to provide an object that has the basic properties of a `data.frame` with additional enhancements. Both `tbl_df` and `data.table` inherit from `data.frame`. `time-series` objects inherit from `matrix` object. – Abdou Aug 05 '16 at 18:39
  • Consider [polymorphism](https://en.wikipedia.org/wiki/Polymorphism_(computer_science)). For example, when you dump an object to the console, R first looks for `print.class` where `.class` is the appropriate class. In your 2nd example, it finds `print.tbl_df` and prints something less screen-hogging. However, there may be other polymorphic functions that don't know about `tbl_df`, so R then looks for (say) `print.tbl` and `print.data.frame`, using the first it finds. If nothing is found, it uses `print.default`. By having multiple classes, all will be searched for appropriate methods. – r2evans Aug 05 '16 at 18:50
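
The method lookup described in the comments above can be tried out in base R alone. The following is only a minimal sketch: the generic `describe` and the class names `my_tbl_df` and `my_tbl` are made up for illustration and are not part of dplyr or tibble. It shows that R walks the class vector from left to right and uses the first method it finds, and that `NextMethod()` hands over to the next class in the vector.

```r
# Hypothetical generic and methods, for illustration only.
describe <- function(x, ...) UseMethod("describe")

describe.default    <- function(x, ...) "default method"
describe.data.frame <- function(x, ...) "data.frame method"
describe.my_tbl     <- function(x, ...) {
  # Delegate to the next class in the class vector ("data.frame" here).
  nxt <- NextMethod()
  paste("my_tbl method, then", nxt)
}

# A data frame with a class vector shaped like the one in the question.
df  <- data.frame(a = 1:3)
obj <- structure(df, class = c("my_tbl_df", "my_tbl", "data.frame"))

class(obj)                    # full class vector, most specific class first
describe(obj)                 # no describe.my_tbl_df exists, so describe.my_tbl runs
describe(df)                  # plain data frame: describe.data.frame
describe(1:3)                 # no class-specific method: describe.default
inherits(obj, "data.frame")   # TRUE: "data.frame" is in the class vector
```

Since there is deliberately no `describe.my_tbl_df`, the first call falls through to `describe.my_tbl`, which is the same fallback that r2evans describes for `print`.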

1 Answer


This isn't something from base R but rather a feature of what is often referred to as the 'hadleyverse'. Hadley has designed the dplyr package to work with a special version of data frames. See: http://www.rdocumentation.org/packages/tibble/versions/1.1/topics/tibble-package for a description of the tbl_df class. That class has its own versions of print, "[", and "[[" that differ from the base-R functions that would normally handle data frames. As described there: printing uses a different format and rules, $ and [[ never do partial name matching, and subsetting always returns a data.frame.
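
To make the "own versions of the methods" idea concrete without loading tibble, here is a rough base-R sketch. The class name `strict_df` is invented for this example, and the method is not how tibble actually implements `$`; it only shows how putting an extra class in front of "data.frame" lets a package override one behaviour (here, partial name matching) while inheriting everything else.

```r
# Hypothetical class "strict_df": a data frame whose `$` rejects partial names.
# This is an illustration, not tibble's real implementation.
`$.strict_df` <- function(x, name) {
  if (!name %in% names(x)) {
    stop("unknown column '", name, "'")
  }
  .subset2(x, name)   # plain list-style extraction by exact name
}

plain  <- data.frame(total = c(357, 138, 172))
strict <- structure(plain, class = c("strict_df", "data.frame"))

plain$tot                            # base data frame: "tot" partially matches "total"
strict$total                         # exact names still work
res <- try(strict$tot, silent = TRUE)
inherits(res, "try-error")           # TRUE: the partial name is rejected
nrow(strict)                         # everything else is inherited from data.frame
```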

Re: a separate description for the tbl class. What I have found so far suggests to me that the dplyr package docs are the place to look, since they have as.tbl and descriptions of different methods for different kinds of data sources such as SQL servers.

A correction: that package is NOT named tibbleR; it is named tibble.

For your last question (noting that multipart questions are frowned on in SO): you can see that inherits (see ?inherits) will sometimes, but not always, tell you whether an object is a member of an "implicit" class, and that you may need to use an is. function such as is.numeric to test for 'numeric':

> AP<-AirPassengers
> class(AP)
[1] "ts"
> inherits(AP, "matrix")
[1] FALSE
> inherits(AP, "numeric")
[1] FALSE
> str(AP)
 Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
> inherits( as.matrix(AP), "numeric")
[1] FALSE
> inherits( as.matrix(AP), "matrix")
[1] TRUE
> str( as.matrix(AP) )
 num [1:144, 1] 112 118 132 129 121 135 148 148 136 119 ...
> inherits( as.matrix(AP), "integer")
[1] FALSE
> is.numeric( as.matrix(AP) )
[1] TRUE
> ?inherits
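
The transcript above can be extended with a short sketch of why AirPassengers reports only "ts" while the multi-column series in the question reports several classes. The assumption here is that the difference comes down to whether the underlying data has a dim attribute; the object names `m` and `multi` are just placeholders, and the exact class vector printed for `multi` depends on the R version.

```r
AP <- AirPassengers                   # univariate series from the datasets package
attr(AP, "class")                     # the stored class attribute is just "ts"
is.null(dim(AP))                      # TRUE: a plain vector underneath, not a matrix
inherits(AP, "matrix")                # FALSE, as in the transcript above

# A bare matrix stores no class attribute; "matrix" is an implicit class
# that class() and inherits() derive from the dim attribute.
m <- matrix(as.numeric(AP), ncol = 1)
attr(m, "class")                      # NULL
inherits(m, "matrix")                 # TRUE
inherits(m, "numeric")                # FALSE, even though the values are numbers
is.numeric(m)                         # TRUE: use an is.* test for the storage type

# A multi-column series is stored as a matrix, so ts() records more classes.
multi <- ts(matrix(1:24, ncol = 2), start = c(2014, 1), frequency = 12)
class(multi)                          # starts with "mts", "ts", "matrix"
inherits(multi, "ts")                 # TRUE
inherits(multi, "matrix")             # TRUE
```

So `class()` is not hiding anything for AirPassengers: a single series is a vector with a "ts" class attribute, and only the multi-column case adds "mts" and "matrix".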
IRTFM