Reading multiple .dat files as a list and saving as .RDATA files in R

Question

I want to import multiple .DAT files from a directory into a list, with one list element per file, and then save each of them as an .RDATA file.

I tried the following code

files <- dir(pattern = "*.DAT")
library(tidyverse)
Data1 <- 
  files %>%
    map(~ read.table(file = ., fill = TRUE))

which works for some files and fails for others. The files are also available on this link. I want to read all the files and then save them as .RDATA files with the same names.

Have you tried `safely` for error handling? See [the docs](https://purrr.tidyverse.org/reference/safely.html). — Rich Pauloo, Apr 15 '19 at 15:31
Which specific files fail? What is the error that you get? Did you open the files that failed in an editor to see if they have the structure that you expect? — Jan van der Laan, Apr 18 '19 at 09:28
I think a more important incentive for getting answers than a bounty is to provide more specific information about what "works sometimes and fails others." Particularly since you're pointing us to a page with 510 files of unknown size we'd need to download in order to see those details ourselves without a clear description in the question. Reading the files fails? Or writing them? — camille, Apr 18 '19 at 13:39
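
To make the `safely` suggestion concrete, here is a minimal sketch of how it could be used to find out which files fail and why. The demo directory and the two files in it are made up for illustration (an empty file is used because `read.table` is known to error on it); with the real data you would point `dir()` at your own folder.

```r
library(purrr)

# Hypothetical demo directory: one readable file and one empty file
demo_dir <- file.path(tempdir(), "safely_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1 2 3", "4 5 6"), file.path(demo_dir, "good.DAT"))
file.create(file.path(demo_dir, "empty.DAT"))  # read.table() errors on an empty file

files <- dir(demo_dir, pattern = "\\.DAT$", full.names = TRUE)

# safely() wraps the reader so each call returns list(result = , error = )
safe_read <- safely(function(f) read.table(file = f, fill = TRUE))
results <- set_names(map(files, safe_read), basename(files))

# Split into failures (non-NULL error) and successfully read data frames
failed <- keep(map(results, "error"), Negate(is.null))
Data1  <- compact(map(results, "result"))

# Error message for each failing file, named by file
map_chr(failed, conditionMessage)
```

The names of `failed` tell you which files to open in an editor, and `Data1` holds everything that could be read.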

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

Since some of the data at the link are not clean, I'll demonstrate the solution to the core problem of your question using this example data:

(name1 <- name2 <- name3 <- name4 <- name5 <- data.frame(matrix(1:12, 3, 4)))
#   X1 X2 X3 X4
# 1  1  4  7 10
# 2  2  5  8 11
# 3  3  6  9 12

We save the data into a sub directory of your working directory named "test".

l <- mget(ls(pattern="^name"))
DIR <- "test"
# dir.create(DIR)  # leave out if dir already exists
# write each data frame to "<DIR>/<name>.dat"
for (nm in names(l)) {
  write.table(l[[nm]], file=file.path(DIR, paste0(nm, ".dat")), row.names=FALSE)
}

Now let's look at what's inside "test".

dir(DIR)
# [1] "name1.dat" "name2.dat" "name3.dat" "name4.dat" "name5.dat"

Now we import the files in the directory by pattern. I use rio::import_list, which conveniently imports the files into a list and uses data.table::fread internally. Your own code would also work fine.

# rm(list=ls())  # commented out for user safety
L <- rio::import_list(paste0(DIR, "/", dir(DIR, pattern="\\.dat$")), format="tsv")
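
If you would rather stay in base R, a comparable named list can be built with read.table. This is only a sketch, assuming the files were written by write.table as above (space-separated, with a header row); `demo_dir`, `paths` and `L2` are names made up for this example.

```r
# Hypothetical demo directory holding two small .dat files
demo_dir <- file.path(tempdir(), "import_demo")
dir.create(demo_dir, showWarnings = FALSE)
df <- data.frame(matrix(1:12, 3, 4))
for (nm in c("name1", "name2")) {
  write.table(df, file = file.path(demo_dir, paste0(nm, ".dat")), row.names = FALSE)
}

# Full paths of all .dat files in the directory
paths <- dir(demo_dir, pattern = "\\.dat$", full.names = TRUE)

# Read each file and name the list elements after the files (without extension)
L2 <- lapply(paths, read.table, header = TRUE)
names(L2) <- tools::file_path_sans_ext(basename(paths))

names(L2)
str(L2$name1)
```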

To save them as .Rdata files, we need to assign the object names dynamically, which we achieve with the list argument of save().

sapply(seq_along(L), function(x) {
  tmp <- L[[x]]
  assign(names(L)[x], tmp)
  save(list=names(L)[x], file=paste0(DIR, "/", names(L)[x], ".Rdata"))
})

When we list the directory, we see that the files were created.

dir(DIR)
# [1] "name1.dat"   "name1.Rdata" "name2.dat"   "name2.Rdata" "name3.dat"   "name3.Rdata"
# [7] "name4.dat"   "name4.Rdata" "name5.dat"   "name5.Rdata"

Now let's check whether the object names were also created correctly:

# rm(list=ls())  # commented out for user safety
load("test/name1.Rdata")
ls()
# [1] "name1"
name1
#   X1 X2 X3 X4
# 1  1  4  7 10
# 2  2  5  8 11
# 3  3  6  9 12

Which is the case.
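
If you don't want to clear your workspace for this check, one option is to load the file into a fresh environment and inspect that instead. A small sketch, using a temporary file and a made-up `rdata_file` path rather than the "test" directory:

```r
# Hypothetical demo file: save an object called name1 to a temporary .Rdata file
name1 <- data.frame(matrix(1:12, 3, 4))
rdata_file <- file.path(tempdir(), "name1.Rdata")
save(name1, file = rdata_file)

# Load into a fresh environment instead of the global workspace
e <- new.env()
loaded <- load(rdata_file, envir = e)  # load() returns the names of the restored objects

loaded
identical(e$name1, name1)
```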

Add-on option

Alternatively, we could attempt a more direct approach using rvest. First we fetch the file names from the links on the page:

library(rvest)
dat.names <- html_attr(html_nodes(read_html(
  "https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/Hand.html"),
  "a"), "href")

and create individual links:

links <- as.character(sapply(dat.names, function(x)
  paste0("https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/", x)))
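
As a side note, paste0() is vectorised, so the links can also be built without sapply(), and it may be worth keeping only the hrefs that actually point to .dat files. A sketch under the assumption that the page contains other links as well; the `hrefs` vector is a made-up stand-in for the scraped values so that the example runs without network access:

```r
base_url <- "https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/"

# Made-up stand-in for the scraped href values
hrefs <- c("clinical.dat", "Hand.html", "example.DAT")

# Keep only links that end in .dat, ignoring case
is_dat <- grepl("\\.dat$", hrefs, ignore.case = TRUE)

# paste0() is vectorised, so no sapply() is required
dat_links <- paste0(base_url, hrefs[is_dat])
dat_links
```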

The remainder is basically the same as above:

DIR <- "test"
# dir.create(DIR)  # leave out if dir already exists

library(rio)
system.time(L <- import_list(links, format="tsv") ) # this will take a minute
sapply(seq_along(L), function(x) {
  tmp <- L[[x]]
  assign(names(L)[x], tmp)
  save(list=names(L)[x], file=paste0(DIR, "/", names(L)[x], ".Rdata"))
})

# rm(list=ls())  # commented out for user safety
load("test/clinical.Rdata")  # test a data set
clinical
#    V1  V2  V3
# 1  26  31  57
# 2  51  59 110
# 3  21  11  32
# 4  40  34  74
# 5 138 135 273

However, as noted at the beginning, some of the data are not clean, so you will probably have to handle those files individually and adapt the code case by case.

The code `dat.names <- html_attr(html_nodes(read_html( "https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/Hand.html"), "a"), "href")` throws the following error: `Error in open.connection(x, "rb") : server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none`. Any thoughts, please. — MYaseen208, Apr 20 '19 at 06:57
I can't reproduce that issue on a Windows machine. You might want to try Linux specific suggestions of [this gitlab thread](https://gitlab.com/gitlab-org/gitlab-ce/issues/38292) and report back if it had helped. — jay.sf, Apr 20 '19 at 07:06

score 0 · Answer 2 · answered Apr 18 '19 at 17:23

This should get you close. It reads all the .dat files from your directory and saves them as .RData files in the same directory with the appropriate names. One downside is that when you load them in R, the object keeps the name "temp.file", so you have to rename it manually or load the files one at a time. Not sure how to get around that.

files <- dir(pattern = "\\.dat$", ignore.case = TRUE)  # only the .dat files
file.list <- lapply(files, read.delim, header=FALSE)
names.list <- sub("\\.dat$", "", files, ignore.case = TRUE)  # strip the extension

for(i in seq_along(file.list)){
  temp.file <- file.list[[i]]
  temp.name <- paste0(names.list[i], ".RData")
  save(temp.file, file=temp.name)
}
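
One possible way around the "temp.file" name, sketched under the assumption that the file names are usable as R object names: assign each data frame into a scratch environment under the desired name and pass that name to save() through its list argument. The demo list, `out_dir`, `scratch` and `check` are made up for this example.

```r
# Hypothetical named list standing in for the imported files
file.list <- list(alpha = data.frame(x = 1:3), beta = data.frame(y = c("a", "b")))
out_dir <- file.path(tempdir(), "rdata_demo")
dir.create(out_dir, showWarnings = FALSE)

for (nm in names(file.list)) {
  scratch <- new.env()                           # holds exactly one object per file
  assign(nm, file.list[[nm]], envir = scratch)   # object is named after the file
  save(list = nm, envir = scratch, file = file.path(out_dir, paste0(nm, ".RData")))
}

# Loading a file now restores an object named after the file
check <- new.env()
load(file.path(out_dir, "alpha.RData"), envir = check)
ls(check)
```

If the .RData format is not a hard requirement, saveRDS() and readRDS() avoid the issue altogether, because the object name is chosen when the file is read back in.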
