Understanding scoping of nested functions

Question

I'm refactoring a script by splitting it into multiple functions: a main function and several "help functions". In doing so I ran into a problem that can be reduced to the following example:

g <- function(a,b){       # help function
    a^2 + b^2 - c
}

f <- function(a,b,c,d){   # main function
    g(a,b)
}

The problem is that f cannot be evaluated because g does not know what c is. Why is that the case?

I've read here that if R cannot find a variable/argument inside a function, it searches for it in the outer function/environment.

Why, then, does g not know the value of c, which is an argument of f?

The example and your question are both clear, but I argue that even starting down this road is indicative of bad programming practice: even if `g()` could find `c`, that would still make `g()` a non-reproducible function. That is, its output is not strictly a function of the two parameters passed to it. If you are writing these functions, then write them correctly: ***never*** assume a variable exists outside of the function's definition, or you will at times have very-difficult-to-troubleshoot bugs where the same arguments give different results in different environments. — r2evans, Jul 21 '21 at 17:24

G. Grothendieck · Accepted Answer · 2022-11-14T16:25:51.670

R uses lexical scoping, which means that if a function needs to reference an object not defined in that function, it looks in the environment in which the function was defined, not in the environment of the caller. In the question, g is defined in the global environment, so that is where g looks for c.
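A minimal sketch of lexical scoping at work, assuming a fresh R session with no object named c in the global environment. It uses the question's f and g unchanged; res and with_global_c are just illustrative names. Note that the lookup for c does not stop at the global environment: it continues into the attached packages, where it finds base R's c() function, which is why the subtraction fails rather than reporting a missing object.

```r
# Help function and main function exactly as in the question
g <- function(a, b) {
  a^2 + b^2 - c
}

f <- function(a, b, c, d) {
  g(a, b)
}

# g was created at top level, so its enclosing environment is the global one
identical(environment(g), globalenv())   # TRUE

# With no global `c`, the search continues past the global environment into
# the attached packages and finds base::c, a function, so arithmetic fails
res <- tryCatch(f(1, 2, 3, 4), error = function(e) conditionMessage(e))
res   # "non-numeric argument to binary operator"

# A global `c` is picked up instead of f's argument c = 3: 1 + 4 - 100
c <- 100
with_global_c <- f(1, 2, 3, 4)
with_global_c   # -95
rm(c)   # remove the global `c` again so base::c is no longer masked
```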

Also note that in R we would not call the functions in the question nested functions. Rather, it is the calls that are nested, not the functions. In (3) below we show nested functions.
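To see the difference between where a function was defined and where it was called from, here is a small sketch with two hypothetical functions, report_parents() and call_it(). Inside report_parents(), parent.env(environment()) gives the enclosing (lexical) environment, which is what variable lookup uses, while parent.frame() gives the frame of the caller, which is what the nested calls produce.

```r
# Hypothetical helper that reports two different "parents" of its own frame
report_parents <- function() {
  enclosing <- parent.env(environment())  # where the function was defined
  caller    <- parent.frame()             # frame of the function that called it
  list(enclosing = enclosing, caller = caller)
}

# Hypothetical caller: the calls are nested, the definitions are not
call_it <- function() {
  parents <- report_parents()
  list(parents = parents, my_frame = environment())
}

out <- call_it()
identical(out$parents$enclosing, globalenv())   # TRUE: defined at top level
identical(out$parents$caller, out$my_frame)     # TRUE: called from call_it()
```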

1) We can reset a function's environment, in which case it behaves as if it had been defined in that environment.

g <- function(a,b){       # help function
    a^2 + b^2 - c
}

f <- function(a,b,c,d){   # main function
    environment(g) <- environment()  # local copy of g whose environment is f's frame
    g(a,b)
}

f(1, 2, 3, 4)
## [1] 2
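Because environment(g) <- environment() is a replacement assignment, it creates a modified copy of g in f's local frame; the global g keeps the global environment. A quick self-contained check of that, repeating the definitions from (1):

```r
g <- function(a, b) {          # help function, defined at top level
  a^2 + b^2 - c
}

f <- function(a, b, c, d) {    # main function
  # `environment<-` returns a modified copy that is assigned to a local g;
  # the global g is not touched
  environment(g) <- environment()
  g(a, b)
}

f(1, 2, 3, 4)                            # 2
identical(environment(g), globalenv())   # TRUE: global g still unchanged
```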

2) Another possibility is to tell g explicitly which environment to search, using envir$c (where envir is the desired environment), get("c", envir) or with(envir, c). envir$c looks only in envir itself; the other two look there first and, if c is not found, continue into envir's ancestor environments. (Every environment except emptyenv() has a parent environment; this chain of parents is distinct from the call stack.) Below, envir defaults to parent.frame(), the environment of the caller, which here is f.

g <- function(a, b, envir = parent.frame()){       # help function
    a^2 + b^2 - envir$c
}

f <- function(a,b,c,d){   # main function
    g(a,b)
}

f(1, 2, 3, 4)
## [1] 2
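The difference between $ and get()/with() can be seen with a throwaway environment (e is a hypothetical name, created at top level so its parent is the global environment). The pitfall with the inheriting lookups is that a c missing from e silently resolves to base R's c() function further up the chain:

```r
# Hypothetical environment `e`, created at top level, so its parent is the
# global environment
e <- new.env()
assign("x", 1, envir = e)

# `$` looks in e only
e$x   # 1
e$c   # NULL: `$` does not search parents

# get() and with() start in e and then walk up its parent chain
exists("c", envir = e, inherits = FALSE)   # FALSE: not in e itself
is.function(get("c", envir = e))           # TRUE: found base::c further up
with(e, x + 1)                             # 2: x found in e
with(e, is.function(c))                    # TRUE: c again resolves to base::c
```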

3) We can nest the functions so that g is defined inside f; g then finds c in f's environment.

f <- function(a,b,c,d){   # main function
    g <- function(a,b){       # help function
        a^2 + b^2 - c
    }
    g(a,b)
}

f(1, 2, 3, 4)
## [1] 2
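A quick check, using the nested definition from (3), that g's enclosing environment is the frame of the current call to f, so each call to f gives g its own c:

```r
f <- function(a, b, c, d) {   # main function
  g <- function(a, b) {       # help function, created anew on each call to f
    a^2 + b^2 - c             # c is looked up in f's frame
  }
  # g's enclosing environment is the frame of this particular call to f
  stopifnot(identical(environment(g), environment()))
  g(a, b)
}

f(1, 2, 3, 4)    # 2
f(1, 2, 10, 4)   # 1 + 4 - 10 = -5
```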

4) Of course, you could simply pass c as an argument and avoid all these problems.

g <- function(a, b, c) { 
    a^2 + b^2 - c
}

f <- function(a, b, c, d) {   # main function
    g(a, b, c)
}

f(1, 2, 3, 4)
## [1] 2
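In the spirit of the comment below about pure functions, a sketch of why (4) is easier to test: g can be checked in isolation with stopifnot(), and a global object named c has no effect on it because the argument c shadows it inside g. The test values are just illustrative.

```r
g <- function(a, b, c) {
  a^2 + b^2 - c
}

# g's result depends only on its arguments, so it can be tested on its own
stopifnot(
  g(1, 2, 3) == 2,
  g(3, 4, 25) == 0,
  g(0, 0, -1) == 1
)

# A global `c` does not affect it: the argument c shadows it inside g
c <- 100
stopifnot(g(1, 2, 3) == 2)
rm(c)
```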

The benefits of #4 are huge. It's much easier to test and reason about your code when you write pure functions. — Bill O'Brien, Jul 21 '21 at 17:50
A helpful post, but could you give an example of #4, explicitly passing the object? — Mark R, Nov 14 '22 at 16:16