
I have the following working code:

############################################
###Read in all the wac gzip files###########
###Add some calculated fields ###########
############################################
library(readr)
setwd("N:/Dropbox/_BonesFirst/65_GIS_Raw/LODES/")
directory<-("N:/Dropbox/_BonesFirst/65_GIS_Raw/LODES/")
to.readin <- as.list(list.files(pattern="2002.csv"))

LEHD2002<-lapply(to.readin, function(x) {
  read.table(gzfile(x), header = TRUE, sep = ",", colClasses = "numeric", stringsAsFactors = FALSE)
})

But I would like to load the things from lapply into the global environment, for debugging reasons.

This provides a way to do so.

# Load data sets
  lapply(filenames, load, .GlobalEnv)

But when I attempt to use it, I get the following error:

Error in FUN(X[[i]], ...) :
  bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘az_wac_S000_JT00_2004.csv.gz’ has magic number 'w_geo'
  Use of save versions prior to 2 is deprecated

Am I doing something wrong, or is 'load' deprecated or the like?

The gzfile(x) converts the .gz (zipped) file to a .csv so that shouldn't be an issue...

Mox
  • This is extremely bad practice. You shouldn't be doing this. – thc Mar 24 '18 at 00:22
  • @thc What's bad practice? He's not doing anything that's bad practice. – De Novo Mar 24 '18 at 00:27
  • You shouldn't be loading data into the global environment from an `lapply` loop. `lapply` provides its own environment so you can be sure you don't accidentally modify things outside its scope. Not only that, but he's reading in a text file and trying to apply `load` to it, which is used for reading in R formatted data. – thc Mar 24 '18 at 00:38
  • @thc You always load things into the global environment. It's what all the reading functions do by default. I think people are just scared when they see global environment and automatically think it's bad practice. If you're using `load`, and you want to be extra careful, you can follow the examples to load it into another environment and then attach instead, but the default for load is to put it in the global environment. There is nothing bad practice about that. It's what everyone does every day. – De Novo Mar 24 '18 at 00:47
  • @thc now assigning global variables inside `lapply` is not good practice. But if the purpose of a call is to load an object from outside R into R, and then do stuff with it throughout your R session, you're going to want that thing in your global environment. That's why all the functions that load things put them in your global environment by default. – De Novo Mar 24 '18 at 00:56
  • @Mox: why did you `load(readr)` then use `read.table(gzfile(x), ...`? – Tung Mar 24 '18 at 01:35
  • @DanHall Regardless of what "people do", one could argue that using `load` at all is bad practice, as it unnecessarily goes against the functional programming paradigm, and has all the problems associated with it. But the main point was, as you said, using `load` from within `lapply` is bad practice. Anyway, that's just my opinion. – thc Mar 24 '18 at 18:14
  • @Tung: I don't, the bit of code with the load function is a copy-paste example of loading into the global environment. – Mox Mar 29 '18 at 21:08

2 Answers


load loads files in a binary format (e.g., .rda files). You're loading in files in a textual format, .csv files. This is why you're using read.table. When you try to read textual format files using load, you will get that error.
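A minimal sketch of that difference, assuming a throwaway data frame written to temporary files (`toy`, the column names, and the temp paths are made up for illustration and are not from the question):

```r
# Illustrative only: `toy` and the temporary file names are invented.
toy <- data.frame(w_geocode = c(1, 2), C000 = c(10, 20))

rda_path <- tempfile(fileext = ".rda")
csv_path <- tempfile(fileext = ".csv.gz")

save(toy, file = rda_path)                 # binary R format, readable by load()
con <- gzfile(csv_path, "w")
write.csv(toy, con, row.names = FALSE)     # gzipped text, NOT readable by load()
close(con)

# load() understands the binary file and returns the names it restored.
loaded_names <- load(rda_path)

# On the gzipped CSV, load() fails: the content is still text after decompression.
csv_result <- tryCatch(
  suppressWarnings(load(csv_path)),
  error = function(e) conditionMessage(e)
)

# A text reader such as read.csv()/read.table() is the right tool here.
from_csv <- read.csv(gzfile(csv_path))
```

Here `csv_result` holds the error message rather than any data, while `from_csv` contains the table.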

The usage: lapply(filenames, load, .GlobalEnv), passes .GlobalEnv to load, not to lapply. This is just a different way of reading in a list of files that are in a different format than yours. load can put the objects in a different environment as a way to protect you from overwriting objects in your current environment with the same name as the objects you're loading. Binary objects created using save (which you can load in with load) carry their names with them. When you load them in, you do not assign them to a name. They are accessible in the environment you choose to load them into with their original name.
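As a rough illustration of how a saved object carries its name and can be restored into an environment of your choosing (`scores`, `sandbox`, and the temp file are hypothetical names for this sketch):

```r
# Illustrative only: `scores` and the temporary file are invented.
scores <- c(a = 1, b = 2)
rda_path <- tempfile(fileext = ".rda")
save(scores, file = rda_path)
rm(scores)                                  # remove it so we can see it come back

# Restore into a fresh environment instead of .GlobalEnv.
sandbox <- new.env()
restored <- load(rda_path, envir = sandbox)

# `restored` is the character vector of object names that load() brought back;
# the object itself lives in `sandbox` under its original name.
in_sandbox <- exists("scores", envir = sandbox, inherits = FALSE)
value_b <- sandbox$scores[["b"]]
```

Nothing is assigned by the caller here: the name `scores` comes from the file, which is why loading into `.GlobalEnv` can silently overwrite an existing object of the same name.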

Both methods load the objects into .GlobalEnv, so your code already works the way you want it to. You can confirm that your objects have not somehow been read into a different environment by trying to access them after you run the code: if you can access them using the name you assigned them to (LEHD2002 here), they are in .GlobalEnv.
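If the goal is one separate object per file for debugging, one possible approach (not something the question's code does) is to name the list elements and copy them into the global environment with `list2env`. The sketch below uses two tiny made-up gzipped CSVs in a temporary directory as stand-ins for the real files:

```r
# Illustrative only: the directory, file names, and data are invented.
tmp_dir <- tempfile("lodes_demo")
dir.create(tmp_dir)
for (nm in c("az_wac_2002", "ca_wac_2002")) {
  con <- gzfile(file.path(tmp_dir, paste0(nm, ".csv.gz")), "w")
  write.csv(data.frame(w_geocode = 1:2, C000 = c(5, 7)), con, row.names = FALSE)
  close(con)
}

# Read every matching file into a list, as in the question.
files <- list.files(tmp_dir, pattern = "2002\\.csv\\.gz$", full.names = TRUE)
tables <- lapply(files, function(f) {
  read.table(gzfile(f), header = TRUE, sep = ",", colClasses = "numeric")
})

# Name each element after its file, then copy the elements into .GlobalEnv.
names(tables) <- sub("\\.csv\\.gz$", "", basename(files))
invisible(list2env(tables, envir = globalenv()))
```

After this runs, each data frame is available both as `tables[["az_wac_2002"]]` and as a top-level object named `az_wac_2002`. Keeping the list as well is usually convenient, since it can still be iterated over.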

De Novo
  • List access methods are also key. I had not previously known that LEHD2002[[1]][59] would access the 59th column in the first dataframe in the list. – Mox Mar 29 '18 at 21:10

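A quick and dirty way is to assign the result into the global environment with <<- rather than <-, i.e. write the call from the question as:

LEHD2002 <<- lapply(to.readin, function(x) {
  read.table(gzfile(x), header = TRUE, sep = ",", colClasses = "numeric", stringsAsFactors = FALSE)
})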

attach() can also be used, but it is touchier, and attaching multiple files makes a mess (i.e., make sure you detach() anything you attach()).
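A small sketch of the attach()/detach() pairing, assuming a made-up data frame (`demo_df` and its `jobs` column are illustrative only):

```r
# Illustrative only: `demo_df` is an invented data frame.
demo_df <- data.frame(jobs = c(3, 4))

attach(demo_df)                  # columns become visible by bare name
total_jobs <- sum(jobs)          # `jobs` is found via the search path
detach(demo_df)                  # always pair attach() with detach()

# Once detached, the data frame no longer appears on the search path.
still_attached <- "demo_df" %in% search()
```

Forgetting the detach() leaves the columns on the search path, where they can mask or be masked by other objects with the same names.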

Mox