RStudio provides a handy function, View (with an uppercase V), for looking at the data, but in R it is still awkward to get oriented in a large data set. The most common options are:

  • names(df)
  • str(df)
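For readers who have not used these two functions, here is a minimal sketch of what they report. The data frame `df` and its columns are made up for illustration and are not from the original question.

```r
# Hypothetical example data (not from the original post)
df <- data.frame(
  id    = 1:3,
  group = factor(c("a", "b", "a")),
  score = c(2.5, NA, 4.0)
)

# Column names only
names(df)

# Compact structure: one line per column with its type and first values
str(df)
```

Both give a quick overview, but neither shows variable comments or a count of missing values per column.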

If you're coming from SPSS, R seems like a downgrade in this respect. I wondered whether there is a more user-friendly option. I did not find a ready-made one, so I'd like to share my solution with you.

BurninLeo
  • There's also `tibble::glimpse`. – Axeman Oct 13 '16 at 13:39
  • I did not know that, yet. Thanks! It looks a bit like a compressed version of `str`, but does not show comments (in SPSS this would be "Variable Labels"), does it? – BurninLeo Oct 13 '16 at 13:42
  • I don't think so, but this might be because comments aren't used by most people (as far as I know). I had never heard of that function in R, actually! – Axeman Oct 13 '16 at 13:53
  • Oh, `comment()` is incredibly helpful. Especially, if multiple people work with the same data. Probably it's a question of individual R-style :) – BurninLeo Oct 13 '16 at 13:59
  • you might be interested in looking at `describe`, `label`, `units` from the `Hmisc` package ... – Ben Bolker Oct 13 '16 at 13:59

1 Answer

Using RStudio's built-in function View, it's quite simple to get a variable listing for a data.frame similar to the one in SPSS. The function below creates a new data.frame with the variable information and displays it in the RStudio GUI via View.

# Better variables view
Varlist = function(sia) {
  # Init varlist output
  varlist = data.frame(row.names = names(sia))
  varlist[["comment"]] = NA
  varlist[["type"]] = NA
  varlist[["values"]] = NA
  varlist[["NAs"]] = NA
  # Fill with meta information
  for (var in names(sia)) {
    if (!is.null(comment(sia[[var]]))) {
        # comment() may hold several strings; collapse them into one cell
        varlist[[var, "comment"]] = paste(comment(sia[[var]]), collapse=" ")
    }
    varlist[[var, "NAs"]] = sum(is.na(sia[[var]]))
    if (is.factor(sia[[var]])) {
      varlist[[var, "type"]] = "factor"
      varlist[[var, "values"]] = paste(levels(sia[[var]]), collapse=", ")
    } else if (is.character(sia[[var]])) {
      varlist[[var, "type"]] = "character"
    } else if (is.logical(sia[[var]])) {
      varlist[[var, "type"]] = "logical"
      n = sum(!is.na(sia[[var]]))
      if (n > 0) {
        varlist[[var, "values"]] = paste(round(sum(sia[[var]], na.rm=TRUE) / n * 100), "% TRUE", sep="")
      }
    } else if (is.numeric(sia[[var]])) {
      varlist[[var, "type"]] = typeof(sia[[var]])
      n = sum(!is.na(sia[[var]]))
      if (n > 0) {
        varlist[[var, "values"]] = paste(min(sia[[var]], na.rm=TRUE), "...", max(sia[[var]], na.rm=TRUE))
      }
    } else {
      varlist[[var, "type"]] = typeof(sia[[var]])
    }
  }
  View(varlist)
}

My recommendation is to store the function in a file (e.g., Varlist.R) and, whenever you need it, just type:

source("Varlist.R")
Varlist(df)

Again, please note the uppercase V in the function name.
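The "comment" column of the listing is only filled if comments have been attached to the variables. A minimal sketch of how that might look, using base R's comment(); the data frame and its labels are invented for illustration:

```r
# Hypothetical example data (not from the original post)
df <- data.frame(
  age    = c(23L, 35L, NA),
  smoker = c(TRUE, FALSE, NA),
  id     = 1:3
)

# Attach a comment (the SPSS "variable label") to individual columns
comment(df$age)    <- "Age in years"
comment(df$smoker) <- "Currently smokes"

# Read the comments back, one per column; columns without a comment give NA
labels <- vapply(df, function(x) {
  cm <- comment(x)
  if (is.null(cm)) NA_character_ else paste(cm, collapse = " ")
}, character(1))

print(labels)
```

With comments set like this, Varlist(df) would show them next to the type, value range and NA count of each variable.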

Limitation: The listing is a snapshot. If the data.frame changes afterwards, the listing is not updated until Varlist(df) is run again.

Note: Outside RStudio, print does the job: just replace View(varlist) with print(varlist). Depending on screen size, though, Hmisc::describe() could be a better option for the console.
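If the same script is used both inside and outside RStudio, the choice between View and print could be made at run time. This is only a sketch: `show_listing` is a hypothetical helper that is not part of the answer above, and it assumes that RStudio sets the environment variable RSTUDIO to "1" in its sessions.

```r
# Hypothetical helper: open the viewer in RStudio, print to the console otherwise.
# Assumption: RStudio sets the environment variable RSTUDIO to "1".
show_listing <- function(listing) {
  if (interactive() && Sys.getenv("RSTUDIO") == "1") {
    View(listing)
  } else {
    print(listing)
  }
  # Return the listing invisibly so it can also be stored or processed further
  invisible(listing)
}

# Made-up listing, shaped like the output of the function above
listing <- data.frame(
  type = c("integer", "logical"),
  NAs  = c(1, 1),
  row.names = c("age", "smoker")
)
show_listing(listing)
```

Inside Varlist, the final View(varlist) call could then be replaced by show_listing(varlist).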

BurninLeo