53

Can someone please help me get the list of built-in data sets and their dependency packages?

Jaap
mockash
  • 13
    Try with `data()` – akrun Nov 19 '15 at 07:33
  • 5
    You might want `ls("package:datasets")` for the names of all "built-in" data sets in the `datasets` package. – Rich Scriven Nov 19 '15 at 07:33
  • 1
    Thanks @akrun... this worked... data() returns the data frames from the package 'datasets' and 'data(package = .packages(all.available = TRUE))' returns built-in dataframes from all packages. – mockash Nov 19 '15 at 09:07

4 Answers

59

There are several ways to find the included datasets in R:

1: `data()` lists the datasets of all attached packages (not only those from the datasets package), grouped by package.

2: `data(package = .packages(all.available = TRUE))` lists the datasets of all packages installed on your computer, including those that are not loaded.

3: `data(package = "packagename")` lists the datasets of that specific package; for example, `data(package = "plyr")` lists the datasets in the plyr package.
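The three calls can be compared side by side. This is a minimal sketch that assumes only base R; the variable names are illustrative, and the results depend on which packages are installed. The all-packages call may emit warnings about datasets that were moved between packages, so it is wrapped in `suppressWarnings()` here.

```r
# data() returns an object of class "packageIQR"; its $results matrix
# has the columns Package, LibPath, Item and Title.
attached  <- data()$results                      # packages on the search path
installed <- suppressWarnings(
  data(package = .packages(all.available = TRUE))$results
)                                                # every installed package
one_pkg   <- data(package = "datasets")$results  # a single package

# Number of datasets listed per package, largest first
head(sort(table(installed[, "Package"]), decreasing = TRUE))

# Names of the first few datasets in the datasets package
head(one_pkg[, "Item"])
```

Working with `$results` directly avoids opening the pager that printing the `data()` result would trigger in an interactive session.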


If you want to know in which package a dataset is located (e.g. the acme dataset), you can do:

dat <- as.data.frame(data(package = .packages(all.available = TRUE))$results)
dat[dat$Item=="acme", c(1,3,4)]

which gives:

    Package Item                  Title
107    boot acme Monthly Excess Returns
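The lookup above can be wrapped in a small function. This is only a sketch: `find_dataset` is a hypothetical helper name, not part of any package, and it additionally strips the parenthesised file name that some entries carry (for example `beaver1 (beavers)`) so that such datasets can be matched by object name.

```r
# Hypothetical helper: which installed packages ship a dataset called `name`?
find_dataset <- function(name) {
  # The all-packages listing may warn about moved datasets; silence that here
  res <- suppressWarnings(
    data(package = .packages(all.available = TRUE))$results
  )
  dat <- as.data.frame(res, stringsAsFactors = FALSE)
  # Keep only the object name, dropping anything after the first space
  dat$Item <- sub(" .*$", "", dat$Item)
  dat[dat$Item == name, c("Package", "Item", "Title")]
}

find_dataset("iris")
```

Selecting columns by name rather than by position keeps the result stable if the layout of the results matrix ever changes.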
Jaap
  • And how do I find package of a dataframe? In the sense if I know a dataframe how do I know in which package it is created. – mockash Nov 26 '15 at 09:16
  • 6
    For some datasets, you can use the 'help'-function, it shows the package the set came from. For example: '?iris'. – Heroka Nov 26 '15 at 09:19
7

I often also need to know the structure of the available datasets, so I created dataStr in my misc package.

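dataStr <- function(package="datasets", ...)
  {
  # names of all datasets in the package (the "Item" column of the listing)
  d <- data(package=package, envir=new.env(), ...)$results[,"Item"]
  # entries such as "beaver1 (beavers)": keep only the object name before the space
  d <- sapply(strsplit(d, split=" ", fixed=TRUE), "[", 1)
  # sort case-insensitively
  d <- d[order(tolower(d))]
  # print class and structure; get() only finds datasets of attached packages
  for(x in d){ message(x, ":  ", class(get(x))); message(str(get(x)))}
  }
dataStr()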

Please note that the console output is quite long.

This is the type of output:

[...]

warpbreaks:  data.frame
'data.frame':   54 obs. of  3 variables:
 $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
 $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...

WorldPhones:  matrix
 num [1:7, 1:7] 45939 60423 64721 68484 71799 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:7] "1951" "1956" "1957" "1958" ...
  ..$ : chr [1:7] "N.Amer" "Europe" "Asia" "S.Amer" ...

WWWusage:  ts
 Time-Series [1:100] from 1 to 100: 88 84 85 85 84 85 83 85 88 89 ...

Edit: To get more informative output and use it for unloaded packages or all the packages on the search path, please use the revised online version with

source("https://raw.githubusercontent.com/brry/berryFunctions/master/R/dataStr.R")
Berry Boessenkool
  • Nice, though this needs some modification if you want it to work with other packages. `dataStr("colorspace") # Error in get(x) : object 'USSouthPolygon' not found` (I see this even though `colorspace::USSouthPolygon` works.) – Frank Jul 21 '17 at 16:15
  • 2
    Fast solution: first `library(colorspace)`. Better solution is now online, but code got too long to copypaste here. https://github.com/brry/berryFunctions/blob/master/R/dataStr.R – Berry Boessenkool Jul 31 '17 at 15:08
6

Here is a comprehensive list of datasets from R packages, maintained by Prof. Vincent Arel-Bundock: https://vincentarelbundock.github.io/Rdatasets/

Rdatasets is a collection of 1892 datasets that were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more broadly accessible for teaching and statistical software development.

Ayşe Nur
2

Run

help(package = "datasets")

in the RStudio console and you'll get an index of all datasets in the datasets package in the Help tab on the right.
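Outside RStudio, or when the names are needed as a vector rather than as a help page, `ls("package:datasets")` (mentioned in the comments on the question) returns them directly. A minimal sketch, assuming the datasets package is attached, which is the default; `objs` and `classes` are illustrative names.

```r
# Names of all objects exported by the attached datasets package
objs <- ls("package:datasets")

# First class of each dataset, looked up in the package environment
classes <- vapply(
  objs,
  function(x) class(get(x, envir = as.environment("package:datasets")))[1],
  character(1)
)

head(classes)
table(classes)   # how many data frames, time series, matrices, ...
```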

Igor Micev