
I have a piece of code that uses the nnet package. I want to fit a number of different neural network models and then save all of them to disk (with save()).

The issue I am running into is that the "terms" element of the neural network object has an attribute ".Environment" that ends up being hundreds of megabytes, whereas the rest of the model is only a few kilobytes (once the fitted values and residuals are deleted).

Further, deleting the ".Environment" attribute doesn't appear to cause a problem in terms of using the model with 'predict'.

Does anyone have any idea what either R or nnet is doing with this attribute? Has anyone seen anything like this?

chuck taylor
  • I have the same issue with model objects from pscl::hurdle. Can confirm deleting the .Environment attribute does not affect the hurdle predict method. – Jeff Keller Dec 29 '15 at 20:55

1 Answer


tl;dr: this is OK, except for some very special cases

Background

The .Environment attribute in R holds a reference to the environment in which a formula was defined (functions carry an equivalent reference to their defining environment). An R environment is a store holding values of variables, similar to a list. This allows the formula to refer to these variables, for example:

> f = function(g) return(y ~ g(x))
> form = f(exp)
> lm(form, list(y=1:10, x=log(1:10)))
...
Coefficients:
(Intercept)     g(x)
3.37e-15        1.00e+00

In this example, the formula form is effectively y ~ exp(x), because g is given the value exp. In order to find the value of g (which is an argument to function f), the formula needs to hold a reference to the environment created by the call to f.

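You can see the environment attached to a formula by using the attributes() or environment() functions. The output below is what you get for a formula defined at the top level; for a formula created inside a function call, such as form above, R prints the hexadecimal address of that call's environment instead of R_GlobalEnv: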

> attributes(form)
$class
[1] "formula"

$.Environment
<environment: R_GlobalEnv>

> environment(form)
<environment: R_GlobalEnv>
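To make the link between a formula and its environment concrete, here is a small sketch that reuses f and form from the example above and looks inside the captured environment. The helper variable names (env, captured, and so on) are mine, purely for illustration.

```r
# Same definitions as in the example above
f <- function(g) return(y ~ g(x))
form <- f(exp)

# The environment attached to the formula is the frame created by the call f(exp)
env <- environment(form)

captured <- ls(env)                                   # names stored in that frame
is_global <- identical(env, globalenv())              # is it the global environment?
g_is_exp <- identical(get("g", envir = env), exp)     # what does g refer to?

print(captured)    # "g"
print(is_global)   # FALSE
print(g_is_exp)    # TRUE
```

Everything that lives in that frame stays reachable for as long as the formula does, which is the root of the size problem in the question.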

Your question

I believe you are using the nnet() function variant with a formula (rather than matrices), i.e.

> nnet(y ~ x1 + x2, ...)

Unfortunately, R keeps the entire environment (including all the variables defined where your formula is defined) alive, even if your formula does not refer to any of it. There is no easy way for the language to tell what you may or may not be using from the environment. When you save() the model, that environment is written out along with it.
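A rough way to see the effect is to compare the serialized size of a formula before and after dropping its environment. This is only a sketch: make_formula and big are hypothetical names, and serialize() is used here as a stand-in for what save() has to write to disk.

```r
# Hypothetical helper: builds a formula next to a large, unrelated object
make_formula <- function() {
  big <- numeric(1e6)   # roughly 8 MB of doubles; the formula never mentions it
  y ~ x1 + x2
}

form <- make_formula()

# The formula drags the whole function frame (including big) along
size_with_env <- length(serialize(form, NULL))

# Drop the reference to the environment, as suggested in this answer
environment(form) <- NULL
size_without_env <- length(serialize(form, NULL))

print(size_with_env)
print(size_without_env)
```

The first number should be in the megabytes and the second well under a kilobyte or so, even though the formula itself never changed.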

One solution is to explicitly retain only the required parts of the environment. In particular, if your formula does not refer to anything in the environment (which is the most common case), it is safe to remove it.

I would suggest removing the environment from your formula before you call nnet, something like this:

    form = y~x + z
    environment(form) = NULL
    ...
    result = nnet(form, ...)
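
If the model has already been fitted, the attribute can also be dropped afterwards, which is what the question describes. The sketch below uses lm() from base R as a stand-in so that it runs without nnet installed; I am assuming the same pattern applies to the "terms" element of an nnet fit, as the question reports. The names fit_model, scratch and d are made up for the example.

```r
set.seed(1)
d <- data.frame(x = rnorm(50), z = rnorm(50))
d$y <- 1 + 2 * d$x - d$z + rnorm(50, sd = 0.1)

# Hypothetical fitting function with a large temporary in scope
fit_model <- function(data) {
  scratch <- numeric(1e6)        # unrelated to the formula, but captured with it
  lm(y ~ x + z, data = data)     # lm() stands in for nnet() here
}

fit <- fit_model(d)
newdata <- data.frame(x = c(0, 1), z = c(1, 0))

before <- predict(fit, newdata)

# Drop the environment reference from the stored terms object
attr(fit$terms, ".Environment") <- NULL

after <- predict(fit, newdata)
print(all.equal(before, after))
```

Note that an lm object keeps a second copy of the terms on its model frame (attr(fit$model, "terms")), so that one would need the same treatment before saving; other model classes may store things differently.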
Jerzy
  • I was in fact passing a formula into nnet and not a matrix. I didn't realize that this was going to deep copy the entire calling environment. Way to zombie a thread btw :) – chuck taylor Mar 29 '16 at 18:43