I don't think it's a great idea to get into the business of parsing your script and determining variable definition order in that manner. Here's an alternative: set "known variables" checkpoints.
Imagine this is your notebook, first code block:
.known_vars <- list()
### first code block
# some code here
a <- 1
bb <- 2
# more code
.known_vars <- c(.known_vars, list(setdiff(ls(), unlist(.known_vars))))
End each of your code-blocks (or even more frequently, it's entirely up to you) with that last part, which appends a list of variables not known in the previous code block(s).
Next:
### second code block
# some code here
a <- 2 # over-write, not new
quux1 <- quux2 <- 9
# more code
.known_vars <- c(.known_vars, list(setdiff(ls(), unlist(.known_vars))))
Again, that last line is the same as before. Just use that same line of code.
When you want to do some cleanup, first look at what each checkpoint recorded:
.known_vars
# [[1]]
# [1] "a" "bb"
# [[2]]
# [1] "quux1" "quux2"
In this case, if we want to remove all variables except those in the first code block, then we'd do
unlist(.known_vars[-1])
# [1] "quux1" "quux2"
rm(list = unlist(.known_vars[-1]))
The reason I chose a dot-leading variable name is that by default it is not shown in ls() output: you'd need ls(all.names=TRUE) to see it as well. While not a problem, I just want to keep things a little cleaner. If you choose to not start with a dot, then known_vars itself gets recorded in the checkpoint of the block where it is defined, and if for some reason you delete the variables from that block, you might lose the checkpoints for other blocks, too.
If you want this a little more formal, then you can do
.vars <- local({
  .cache <- list()
  function(add = NULL, clear = FALSE) {
    if (clear) .cache <<- list()
    if (length(add)) .cache <<- c(.cache, list(setdiff(add, unlist(.cache))))
    if (is.null(add)) .cache else invisible(.cache)
  }
})
Calling it with no arguments returns its current state, and calling it with ls() adds a new entry. For example:
ls() # proof we're starting "empty"
# character(0)
.vars(clear = TRUE) # instantiate with an empty list of variables
### first code block
# some code here
a <- 1
bb <- 2
# more code
.vars(ls())
### second code block
# some code here
a <- 2 # over-write, not new
quux1 <- quux2 <- 9
# more code
.vars(ls())
.vars()
# [[1]]
# [1] "a" "bb"
# [[2]]
# [1] "quux1" "quux2"
And removing unwanted variables is done in the same way.
Since this is still just an object in the global environment, the next best way to keep it protected (and perhaps give it a name without a leading dot) would be to put it in its own environment (not .GlobalEnv) that is still on R's search path. This is likely easily done with a custom package, though that may be more work than you were expecting for this simple utility.
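Short of writing a package, here is a rough sketch of the "own environment on the search path" idea using attach(NULL, ...). The names var_tracker and known_vars are made up for illustration; the function body is the same one used for .vars above.

```r
# Attach a new, empty environment to the search path.
# "var_tracker" is an arbitrary name chosen for this sketch.
attach(NULL, name = "var_tracker")

# Store the tracker function in that environment instead of .GlobalEnv.
assign("known_vars", local({
  .cache <- list()
  function(add = NULL, clear = FALSE) {
    if (clear) .cache <<- list()
    if (length(add)) .cache <<- c(.cache, list(setdiff(add, unlist(.cache))))
    if (is.null(add)) .cache else invisible(.cache)
  }
}), envir = as.environment("var_tracker"))

# The function is found through the search path, so this call works,
# yet it is not listed by ls() in the global environment.
known_vars(clear = TRUE)
```

Because known_vars does not live in .GlobalEnv, it is never picked up by ls() and so never ends up in its own checkpoints; detach("var_tracker") removes it again.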
BTW: R does not store when an object is created, modified, or deleted, so you'd need to keep track of that, too. If you feel the need to add timestamps to .vars(), then you'll need to restructure things a bit ... again, perhaps more effort than needed here.
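If you do want timestamps, one possible restructuring (only a sketch, and .vars_ts is a name I'm making up here) is to store each checkpoint as a small list holding the time of the call and the newly seen names. Note that this records when the checkpoint was taken, not when each variable was actually created.

```r
.vars_ts <- local({
  .cache <- list()
  function(add = NULL, clear = FALSE) {
    if (clear) .cache <<- list()
    if (length(add)) {
      # Names already recorded in any earlier checkpoint
      known <- unlist(lapply(.cache, `[[`, "vars"))
      # Record the checkpoint time along with the newly seen names
      entry <- list(time = Sys.time(), vars = setdiff(add, known))
      .cache <<- c(.cache, list(entry))
    }
    if (is.null(add)) .cache else invisible(.cache)
  }
})
```

Use it exactly like .vars(): call .vars_ts(ls()) at the end of each block, then pull the names out of a given checkpoint with something like .vars_ts()[[2]]$vars.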
BTW 2: this does not handle deleted-then-redefined variables: it does not know whether variables have been deleted, just that they were defined at some time. If anything else removes variables, this won't know, and then rm(list=...) will complain about missing variables. Not horrible, but still good to know.
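If those warnings bother you, one simple guard is to intersect the tracked names with what still exists before removing. This is just a sketch; rm_known is a hypothetical helper name, and the demonstration uses a scratch environment so it doesn't touch your workspace.

```r
# Hypothetical helper: remove only the tracked names that still exist.
rm_known <- function(vars, envir = globalenv()) {
  present <- intersect(vars, ls(envir = envir, all.names = TRUE))
  rm(list = present, envir = envir)
  invisible(present)  # the names that were actually removed
}

# Demonstration in a scratch environment.
e <- new.env()
assign("quux1", 9, envir = e)
# "quux2" was tracked at some point but no longer exists; it is skipped quietly.
removed <- rm_known(c("quux1", "quux2"), envir = e)
```

In the notebook itself that would be rm_known(unlist(.known_vars[-1])) in place of the plain rm(list = ...) call.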