
In R, is there a function or other way to list all variables that have been created in the global environment after a certain point in a script? I am using an R notebook, so the code is in chunks, and the goal is to eventually delete all variables that were created in certain chunks. The first part of the script creates many variables (which take a long time to read in again) that I would like to keep, while deleting all the variables created in the second part of the script. I know I could just clear the environment, but for certain reasons I can't do that. I also have too many variables to type out the ones I want to rm() by hand. The variables are all different. A (pseudo) example of what I want to do:

x <- 1
y <- 2 
df <- data.frame()
rr <- raster()

## Function here to iteratively list all variables created after this line of code##
dd <- data.frame()
z <- c(1,2,3)

rm(list = listofvars) # listofvars contains "dd" and "z" only

Alternatively, is there a way to list all variables in the global environment in the order that they were created?

I hope this makes sense. Any help is appreciated.

slamballais
Rachel
  • There are likely ways you can use R's internal tools to parse the script, find object names, and keep track of where, cumulatively, they were created ... but that is certainly not trivial. What you can likely do instead is to have variable-name "checkpoints": start your script with `known_vars <- list()` (intentionally empty), and then periodically store new variables: `known_vars <- c(known_vars, list(setdiff(unlist(known_vars), ls())))`. This `known_vars` variable will be a list with `n` elements, each being the "new" variables found at your various checkpoints. It's a hack. – r2evans May 16 '21 at 01:47
  • You could put your code chunks in functions and call those. Then the function variables are never created in the global environment. – SteveM May 16 '21 at 01:52
  • FYI, the args in the untested-code in my comment are out of order ... answer pending ... – r2evans May 16 '21 at 02:38

2 Answers


I don't think it's a great idea to get into the business of parsing your script and determining variable definition order in that manner. Here's an alternative: set "known variables" checkpoints.

Imagine this is your notebook, first code block:

.known_vars <- list()

### first code block
# some code here
a <- 1
bb <- 2
# more code

.known_vars <- c(.known_vars, list(setdiff(ls(), unlist(.known_vars))))

End each of your code-blocks (or even more frequently, it's entirely up to you) with that last part, which appends a list of variables not known in the previous code block(s).

Next:

### second code block
# some code here
a <- 2 # over-write, not new
quux1 <- quux2 <- 9
# more code

.known_vars <- c(.known_vars, list(setdiff(ls(), unlist(.known_vars))))

Again, that last line is the same as before. Just use that same line of code.

When you want to do some cleanup, first look at what has been recorded:

.known_vars
# [[1]]
# [1] "a"  "bb"
# [[2]]
# [1] "quux1" "quux2"

In this case, if we want to remove all variables except those from the first code block, we'd do

unlist(.known_vars[-1])
# [1] "quux1" "quux2"
rm(list = unlist(.known_vars[-1]))

The reason I chose a dot-leading variable name is that by default it is not shown in ls() output: you'd need ls(all.names=TRUE) to see it as well. While not a problem, I just want to keep things a little cleaner. If you choose not to start the name with a dot, and for some reason delete the variables from the same code block in which known_vars is defined, then you might lose the checkpoints for the other blocks, too.
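The hiding behaviour is easy to check. This is a small sketch run in a scratch environment so it does not touch the global environment; the names `e`, `.hidden` and `visible` are made up for illustration.

```r
# Scratch environment (hypothetical name) so the demo leaves .GlobalEnv alone.
e <- new.env()
assign(".hidden", 1, envir = e)   # dot-leading name
assign("visible", 2, envir = e)   # ordinary name

ls(e)                     # lists only "visible"
ls(e, all.names = TRUE)   # lists ".hidden" as well
```

The same applies to ls() with no arguments in the global environment, which is why the checkpoint line never records `.known_vars` itself.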

If you want this a little more formal, then you can do

.vars <- local({
  .cache <- list()
  function(add = NULL, clear = FALSE) {
    if (clear) .cache <<- list()
    if (length(add)) .cache <<- c(.cache, list(setdiff(add, unlist(.cache))))
    if (is.null(add)) .cache else invisible(.cache)
  }
})

Calling it with no arguments returns its current state, and calling it with ls() adds a new entry. For example:

ls() # proof we're starting "empty"
# character(0)

.vars(clear = TRUE) # instantiate with an empty list of variables

### first code block
# some code here
a <- 1
bb <- 2
# more code
.vars(ls())

### second code block
# some code here
a <- 2 # over-write, not new
quux1 <- quux2 <- 9
# more code
.vars(ls())

.vars()
# [[1]]
# [1] "a"  "bb"
# [[2]]
# [1] "quux1" "quux2"

And removing unwanted variables is done in the same way.

Since this is still just an object in the global environment, the next best way to keep it protected (and perhaps to give it a name without a leading dot) would be to put it in its own environment (not .GlobalEnv) that is still on R's search path. This is likely easily done with a custom package, though that may be more work than you were expecting for this simple utility.
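Short of writing a package, one possible sketch of the "own environment on the search path" idea is base R's attach(NULL, ...), which creates an empty environment on the search path. The entry name "checkpoints" and the function name `known_vars` are assumptions chosen for this example, not anything R defines.

```r
# Assumption: "checkpoints" is an arbitrary search-path name for this sketch.
# The guard keeps a re-run of the chunk from attaching a second copy.
if (!"checkpoints" %in% search()) attach(NULL, name = "checkpoints")
tracker_env <- as.environment("checkpoints")

# Same closure as .vars above, stored outside .GlobalEnv under a
# hypothetical non-dot name.
assign("known_vars", local({
  cache <- list()
  function(add = NULL, clear = FALSE) {
    if (clear) cache <<- list()
    if (length(add)) cache <<- c(cache, list(setdiff(add, unlist(cache))))
    if (is.null(add)) cache else invisible(cache)
  }
}), envir = tracker_env)

known_vars(clear = TRUE)
known_vars(c("a", "bb"))   # in a notebook this would be known_vars(ls())

# The function is found via the search path but does not live in .GlobalEnv,
# so it is not listed by ls() there and clearing the workspace leaves it alone.
exists("known_vars", envir = globalenv(), inherits = FALSE)
```

When the tracker is no longer needed, detach("checkpoints") removes the environment from the search path.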

BTW: R does not store when an object is created, modified, or deleted, so you'd need to keep track of that, too. If you feel the need to add timestamps to .vars(), then you'll need to restructure things a bit ... again, perhaps more effort than needed here.
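If timestamps are wanted, one way to restructure the cache is to store a small record per checkpoint instead of a bare character vector. This is only a sketch: `.vars_ts` and the field names `time` and `names` are hypothetical, and the time recorded is when the checkpoint was taken, not when each variable was created.

```r
# Hypothetical variant of .vars(): each checkpoint is list(time, names).
.vars_ts <- local({
  cache <- list()
  function(add = NULL) {
    if (!is.null(add)) {
      known <- unlist(lapply(cache, function(entry) entry$names))
      cache[[length(cache) + 1L]] <<- list(
        time  = Sys.time(),            # when the checkpoint was taken
        names = setdiff(add, known)    # names not seen at earlier checkpoints
      )
    }
    if (is.null(add)) cache else invisible(cache)
  }
})

# Literal vectors keep the demo self-contained; in a notebook pass ls().
.vars_ts(c("a", "bb"))
.vars_ts(c("a", "bb", "quux1", "quux2"))

checkpoints <- .vars_ts()
checkpoints[[2]]$names   # names first seen at the second checkpoint
checkpoints[[2]]$time    # POSIXct time of that checkpoint
```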

BTW 2: this is prone to deleted-then-redefined variables: it does not know if vars have been deleted, just that they were defined at some time. If anything else removes variables, this won't know, and then rm(list=...) will complain about missing variables. Not horrible, but still good to know.
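To avoid the warning about missing variables, the recorded names can be intersected with what currently exists before calling rm(). A minimal sketch, where `rm_existing` is a hypothetical helper name and the demo uses a scratch environment rather than .GlobalEnv:

```r
# Hypothetical helper: remove only those recorded names that still exist.
rm_existing <- function(vars, envir = globalenv()) {
  present <- intersect(vars, ls(envir, all.names = TRUE))
  rm(list = present, envir = envir)
  invisible(present)   # report what was actually removed
}

# Demo in a scratch environment: quux2 was "already removed elsewhere".
e <- new.env()
assign("quux1", 9, envir = e)
removed <- rm_existing(c("quux1", "quux2"), envir = e)
removed   # only "quux1" was there to remove
```

In the notebook this would be called as rm_existing(unlist(.known_vars[-1])).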

r2evans

Using the script created in the Note at the end, read it in with readLines, then use grep to keep the lines that start with optional whitespace, a word, more optional whitespace and <- . Then remove the <- and everything after it and trim the whitespace, leaving the variable names v in the order encountered in the script. Next, as an example, form vv as the subvector of v that starts at "df" and contains all the variable names that follow.

L <- grep("^\\s*\\S*\\s*<-", readLines("myscript.R"), value = TRUE)
v <- trimws(sub("<-.*", "", L)); v
## [1] "x"  "y"  "df" "rr" "dd" "z" 

vv <- tail(v, -(match("df", v)-1)); vv  
## [1] "df" "rr" "dd" "z" 

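To remove the variables in vv from the global environment, use rm(list = vv, envir = .GlobalEnv) . The envir argument must be named; passed positionally, .GlobalEnv would be treated as the name of another object to remove.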
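The regular expression only sees lines of the form name <- value, so it misses = assignments and the later names in a chained assignment such as a <- b <- 1, and it would also pick up indented assignments inside function bodies. As an alternative sketch (not part of the approach above), R's own parser can list the names assigned at the top level of a script; `top_level_assigned` is a hypothetical helper name.

```r
# Hypothetical helper: names assigned at the top level of a script file,
# in the order they are first encountered.
top_level_assigned <- function(file) {
  exprs <- parse(file = file, keep.source = FALSE)
  out <- character(0)
  for (e in exprs) {
    # Walk down chained assignments such as a <- b <- 1.
    while (is.call(e) && as.character(e[[1]])[1] %in% c("<-", "=")) {
      if (is.name(e[[2]])) out <- c(out, as.character(e[[2]]))
      e <- e[[3]]
    }
  }
  unique(out)
}

# Demo on a throwaway script written to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- 1",
  "y = 2",
  "f <- function() { inner <- 1 }",   # inner is not a top-level variable
  "dd <- z <- c(1, 2, 3)"
), script)

found <- top_level_assigned(script)
found
# expected: "x" "y" "f" "dd" "z"
```

Like the regex version, this only reports what the script text assigns; it does not know what actually exists in the session.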

Note

Lines <- "
x <- 1
y <- 2 
df <- data.frame()
rr <- raster()

## Function here to iteratively list all variables created after this line of code##
dd <- data.frame()
z <- c(1,2,3)
"
cat(Lines, file = "myscript.R")
G. Grothendieck