When I try to answer a question in Stack Overflow about R, a good part of my time is spent trying to rebuild the data given as example (unless the question author has been nice enough to provide them as R code).

So my question is: if somebody asks a question and gives their sample data frame in the following way:

a  b   c
1 11 foo
2 12 bar
3 13 baz
4 14 bar
5 15 foo

Do you have a tip or a function to import this easily into an R session, without having to type the entire data.frame() call?

Thanks in advance for any hint!

PS: sorry if the term "query" is not really nice in my question title, but it seems you can't use the word "question" in a question title on Stack Overflow :-)

juba

4 Answers

Maybe textConnection() is what you want here:

R> zz <- read.table(textConnection("a  b   c
1 11 foo
2 12 bar
3 13 baz
4 14 bar
5 15 foo"), header=TRUE)
R> zz
  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo
R> 

It allows you to treat the text as a "connection" from which to read. You can also just copy and paste, but access from the clipboard is more dependent on the operating system and hence less portable.
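
One detail worth knowing, sketched here under the assumption that you are working in a script rather than at the console: a textConnection() is already open when it is created, so read.table() does not close it for you. Holding the connection in a variable (txt and con are just illustrative names) lets you close it explicitly, which avoids a later "closing unused connection" warning.

```r
# Sample data pasted from the question, kept as a single string.
txt <- "a  b   c
1 11 foo
2 12 bar
3 13 baz
4 14 bar
5 15 foo"

con <- textConnection(txt)            # the connection is opened on creation
zz <- read.table(con, header = TRUE)  # read from it like from a file
close(con)                            # release the connection explicitly

str(zz)
```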

Dirk Eddelbuettel
  • Ah, yes, that's great, thanks! Access from the clipboard works nicely on my system with `zz <- read.table(file(description="clipboard"), header=TRUE)` – juba Jun 01 '12 at 11:29
  • @juba, `read.table` also offers a `text` argument, so that creating the connection isn't necessary. `read.table(text = "a b c...` should work, too. Nice with the clipboard! – BenBarnes Jun 01 '12 at 12:05
  • @juba Note that the clipboard afaik doesn't work on all systems. I believe (but correct me if I'm wrong) that it is Windows-specific. – Joris Meys Jun 01 '12 at 12:21
  • It works well on my Linux system. There is a "clipboard" section in the `connections` man page, with detailed information: if I understand correctly, reading and writing to the clipboard works under Windows, reading works out of the box under Linux, whereas writing on Linux and reading/writing on Mac OS X require specific calls to `pipe()`. – juba Jun 01 '12 at 13:05
  • For the Mac (where `?clipboard` takes you to the connections help page): "Mac OS X users can use pipe("pbpaste") and pipe("pbcopy", "w") to read from and write to that system's clipboard." Effective use would be: `zz<- read.table(file=pipe("pbpaste"), header=TRUE)` – IRTFM Jun 01 '12 at 13:08
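
Pulling the comments above together, a small helper could hide the platform differences. This is only a sketch: clipboard_source() and read_clipboard_table() are hypothetical names, it assumes Sys.info() reports "Darwin" on macOS, and on Linux it relies on R being able to reach the X11 clipboard.

```r
# Hypothetical helper: decide where clipboard text comes from on this OS.
clipboard_source <- function(sysname = Sys.info()[["sysname"]]) {
  if (identical(sysname, "Darwin")) "pbpaste" else "clipboard"
}

# Hypothetical helper: read the clipboard contents as a table.
# Extra arguments (header, sep, ...) are passed on to read.table().
read_clipboard_table <- function(..., sysname = Sys.info()[["sysname"]]) {
  src <- clipboard_source(sysname)
  con <- if (src == "pbpaste") pipe("pbpaste") else file("clipboard")
  # read.table() opens an unopened connection itself and closes it afterwards.
  read.table(con, ...)
}
```

After copying a printed data frame, usage would be `zz <- read_clipboard_table(header = TRUE)`.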

Recent versions of R offer an even lower-keystroke option than the textConnection route for entering columnar data into read.table and friends. Faced with this:

zz
  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo

One can simply insert `<- read.table(text="` after the zz, delete the carriage return, then insert `", header=TRUE)` after the last foo and hit [enter].

zz<- read.table(text="  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo", header=TRUE)
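
This works on printed output because of how read.table() treats the header: when the header line has one field fewer than the data rows, the first column is used as row names. A quick check, as a sketch using the same text:

```r
zz <- read.table(text = "  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo", header = TRUE)

ncol(zz)       # number of data columns (a, b, c)
rownames(zz)   # taken from the leading, unnamed column
```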

One can also use scan to efficiently enter long sequences of pure numbers or pure character vector entries. Faced with: 67 75 44 25 99 37 6 96 77 21 31 41 5 52 13 46 14 70 100 18, one can simply type zz <- scan() and hit [enter]. Then paste the selected numbers and hit [enter] again, and perhaps a second time to produce an empty line; the console should respond "Read 20 items".

> zz <- scan()
1: 67  75  44  25  99  37   6  96  77  21  31  41   5  52  13  46  14  70 100  18
21: 
Read 20 items
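
The interactive scan() prompt is not available in a script, so a non-interactive variant can be handy. A minimal sketch, assuming the numbers are pasted into a string and passed through the text argument of scan():

```r
zz <- scan(text = "67 75 44 25 99 37 6 96 77 21 31 41 5 52 13 46 14 70 100 18",
           quiet = TRUE)   # quiet = TRUE suppresses the "Read ... items" message
length(zz)
```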

The "character" task works the same way. After pasting to the console, editing out extraneous line feeds, and adding quotes, hit [enter]:

> countries <- scan(what="character")
1:     'republic of congo'
2:     'republic of the congo'
3:     'congo, republic of the'
4:     'congo, republic'
5: 'democratic republic of the congo'
6: 'congo, democratic republic of the'
7: 'dem rep of the congo'
8: 
Read 7 items
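
If adding the quotes by hand is tedious, one possible alternative (a sketch, shown here on a shortened list) is to split on newlines instead of whitespace, so that each line becomes one element. strip.white trims the padding around each entry:

```r
countries <- scan(text = "republic of congo
    republic of the congo
congo, republic of the
  dem rep of the congo",
                  what = "character", sep = "\n",
                  strip.white = TRUE, quiet = TRUE)
countries
```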
IRTFM

You can also ask the questioner to use the dput function, which dumps any data structure in a form that can be copy-pasted straight into R, e.g.:

> zz
  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo

> dput(zz)
structure(list(a = 1:5, b = 11:15, c = structure(c(3L, 1L, 2L, 
1L, 3L), .Label = c("bar", "baz", "foo"), class = "factor")), .Names = c("a", 
"b", "c"), class = "data.frame", row.names = c(NA, -5L))

> xx <- structure(list(a = 1:5, b = 11:15, c = structure(c(3L, 1L, 2L, 
+ 1L, 3L), .Label = c("bar", "baz", "foo"), class = "factor")), .Names = c("a", 
+ "b", "c"), class = "data.frame", row.names = c(NA, -5L))
> xx
  a  b   c
1 1 11 foo
2 2 12 bar
3 3 13 baz
4 4 14 bar
5 5 15 foo
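
For a large data frame, the dput() output of a few rows is usually enough for a question. A minimal sketch, assuming a data frame zz like the one above; the temporary file and the use of dget() are just one way to check that the text really rebuilds the same object:

```r
zz <- data.frame(a = 1:5, b = 11:15,
                 c = c("foo", "bar", "baz", "bar", "foo"))

dput(head(zz, 3))               # compact text to paste into a question

tmp <- tempfile(fileext = ".R")
dput(zz, file = tmp)            # write the text representation to a file
xx <- dget(tmp)                 # parse it back into an R object
all.equal(zz, xx)               # compare the rebuilt object with the original
unlink(tmp)
```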
huon

Just want to add this because I now use it regularly and I think it's quite useful. There is a package, overflow (install instructions below), that has a function to read copied data frames. Say I begin with an SO post that contains the data shown below, but with no dput output.

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Now if I copy that data directly, and then run the following

library(overflow)
soread()
# data.frame “mydf” created in your workspace
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

I now have a data frame named mydf in my global environment, identical to the one I copied, so I don't have to wait for the OP to post a dput of their data frame. I can change the name of the data frame with the out argument, which (obviously) defaults to mydf. There are also a few other useful functions for working with SO posts in the package (like sopkgs(), which installs a package temporarily so you can help with a question about a package that you have not previously installed).

If you leave library(overflow) in your .Rprofile, then soread() makes pretty quick work of importing data from SO posts.
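
A guarded version of that .Rprofile line might look like the sketch below. The guard is my assumption rather than something the package requires: it attaches overflow only in interactive sessions and only if the package is installed, so scripts and fresh machines are unaffected.

```r
# ~/.Rprofile (sketch): attach overflow only when it makes sense
if (interactive() && requireNamespace("overflow", quietly = TRUE)) {
  suppressPackageStartupMessages(library(overflow))
}
```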

overflow is available from GitHub, and can be installed with

library(devtools)
# current devtools expects the repository as "user/repo"
install_github("sebastian-c/overflow")
Rich Scriven