4

In R, you can fit GAMs with the mgcv package using a formula that contains transformations such as log or sqrt. By default, the fitted object stores only the model.frame, that is, the variables named in the formula with the transformations already applied.

Is there any way I can recover the untransformed data.frame?

Example:

reg <- mgcv::gam(log(mpg) ~ disp + I(hp^2), data=mtcars)

returns

> head(reg$model, 3)
              log(mpg) disp I(hp^2)
Mazda RX4     3.044522  160   12100
Mazda RX4 Wag 3.044522  160   12100
Datsun 710    3.126761  108    8649

But I want to recover this untransformed dataset from the model's model.frame:

               mpg disp  hp
Mazda RX4     21.0  160 110
Mazda RX4 Wag 21.0  160 110
Datsun 710    22.8  108  93

Some background: the newdata argument of most models' predict() methods requires untransformed data, so I cannot feed the model.frame back into predict(). I am already aware that omitting the newdata argument returns the fitted values. My requirement is that the model object gives me back the raw data.

Steven M. Mortimer
  • Is there an easy way to programmatically reverse every transform? I don't want to write custom code to parse the column names and then apply certain functions. – Steven M. Mortimer Mar 18 '17 at 18:48
  • Transformations aren't necessarily invertible. In your example you use `hp^2`, which isn't invertible because it loses the sign of `hp`. The smooth transformations used in `mgcv` are certainly not invertible - they make it very possible to map two different input values to the same output. The only practical way to do this is to keep the data, as in Zheyuan's answer. – Gregor Thomas Mar 18 '17 at 19:41
  • Did you ever find a solution to this issue? I could use a solution to this problem myself – duckmayr Jul 22 '20 at 22:40
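To illustrate the non-invertibility point made in the comments, here is a minimal sketch. The two `hp` values are hypothetical (chosen only to differ in sign) and are not taken from mtcars.

```r
# Hypothetical inputs that differ only in sign
hp <- c(-110, 110)

# Squaring maps both inputs to the same value
squared <- hp^2

# Taking the square root cannot tell the two inputs apart:
# the sign of the first value is lost
recovered <- sqrt(squared)

squared
recovered
```

Because both inputs collapse to the same transformed value, no function of the transformed column alone can return the original column in general.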

3 Answers

5

Here is one way: use glm instead of lm, even for Gaussian data. A glm fit stores more components than an lm fit, including the raw data frame (as the data component).
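A minimal sketch of this, reusing the formula from the question with mtcars. The names `fit_glm` and `raw` are just illustrative. Note that `$data` holds the data frame as it was passed in, so rows later dropped by NA handling would still be present.

```r
# Gaussian glm with the same formula as in the question
fit_glm <- glm(log(mpg) ~ disp + I(hp^2), data = mtcars, family = gaussian())

# glm keeps the `data` argument on the fitted object;
# all.vars() gives the untransformed variable names used in the formula
raw <- fit_glm$data[all.vars(formula(fit_glm))]
head(raw, 3)
```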


Well, if you are asking mgcv questions, you'd better provide a mgcv example.

mgcv follows the same convention as glm. Read ?gamObject for the full list of components a gam fit can contain. You will see that it can include data, provided you set keepData through the control argument of gam. When you call gam, add the following:

control = gam.control(keepData = TRUE)

Here is a simple, reproducible example:

library(mgcv)
set.seed(1)  # fix the random draws so the example is reproducible
dat <- data.frame(x = runif(50), y = rnorm(50))
fit <- gam(y ~ s(x, bs = 'cr', k = 5), data = dat, control = gam.control(keepData = TRUE))
head(fit$model)  # model frame
head(fit$data)  # original data
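Applied to the example from the question, the same idea might look like the sketch below. The object names (`fit_q`, `raw`, `p_new`, `p_fit`) are illustrative. The check at the end only confirms that the stored raw data can be passed back to predict() as newdata.

```r
library(mgcv)

# Same formula as in the question, but asking gam() to keep the original data
fit_q <- gam(log(mpg) ~ disp + I(hp^2), data = mtcars,
             control = gam.control(keepData = TRUE))

# Untransformed columns referenced by the formula, taken from the model object
raw <- fit_q$data[all.vars(formula(fit_q))]
head(raw, 3)

# The raw data can be fed straight back into predict()
p_new <- as.numeric(predict(fit_q, newdata = raw))
p_fit <- as.numeric(fitted(fit_q))
all.equal(p_new, p_fit)
```

The trade-off is object size: the fitted model now carries a full copy of the data frame, including columns the formula never uses.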
Zheyuan Li
3

We can extract the variable names from the model's 'terms' and use them to subset the original dataset:

head(mtcars[all.vars(reg$terms)], 3)
#               mpg disp  hp
#Mazda RX4     21.0  160 110
#Mazda RX4 Wag 21.0  160 110
#Datsun 710    22.8  108  93

Or with call

v1 <- all.vars(reg$call)
head(get(tail(v1, 1))[head(v1, -1)], 3)
#               mpg disp  hp
#Mazda RX4     21.0  160 110
#Mazda RX4 Wag 21.0  160 110
#Datsun 710    22.8  108  93
akrun
  • You can see in my post that my requirement is that the model object gives me back the raw data. You cannot assume that the dataset has been loaded or even exists in the global environment. – Steven M. Mortimer Mar 18 '17 at 18:51
  • @StevenMortimer I am extracting the terms only from the model. Otherwise, you have to do the transformation based on the model object as the original dataset is not present – akrun Mar 18 '17 at 18:51
  • I understand that the terms come from the model object in your example, but I need the raw, untransformed data from the model object. I assume from the `model.frame` since that is the only thing resembling the raw data. – Steven M. Mortimer Mar 18 '17 at 18:59
  • @StevenMortimer If the original object is gone, the model object doesn't contain the original dataset (at least for the model shown in the example). If the original object is available, the second option (updated) works too – akrun Mar 18 '17 at 19:02
  • @StevenMortimer Perhaps `eval(getCall(reg)$data, environment(formula(reg)))[all.vars(reg$terms)]` if working in the same env. Otherwise, if the model object doesn't have any info to get the original data, it may not be easy – akrun Mar 18 '17 at 19:17
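The one-liner from the last comment can be wrapped in a small helper, sketched below. `recover_raw` is a hypothetical name, not a function from any package, and the approach still assumes the original data object can be found from the formula's environment. It will fail if that object has been removed.

```r
# Hypothetical helper: re-evaluate the `data` argument of the stored call
# in the environment attached to the model formula
recover_raw <- function(fit) {
  data_expr <- getCall(fit)$data
  if (is.null(data_expr)) stop("the model call has no `data` argument")
  dat <- eval(data_expr, environment(formula(fit)))
  # keep only the untransformed variables named in the formula
  dat[all.vars(formula(fit))]
}

reg_lm <- lm(log(mpg) ~ disp + I(hp^2), data = mtcars)
raw <- recover_raw(reg_lm)
head(raw, 3)
```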
1

EDIT: Based on comments by @李哲源 Zheyuan Li

The following approach depends on the original data being present in the current workspace or on the search path. If we remove the original data before updating the model, it produces an error:

Error in is.data.frame(data) : object 'dat' not found

dat <- mtcars
reg <- lm(log(mpg) ~ disp + I(hp^2), data=dat)
head(reg$model,3)
#               log(mpg) disp I(hp^2)
# Mazda RX4     3.044522  160   12100
# Mazda RX4 Wag 3.044522  160   12100
# Datsun 710    3.126761  108    8649

# rm(dat)  ## uncomment this line to see the error appear after update()
reg <- update(reg, mpg ~ disp + hp, method = 'model.frame' )
head(reg)
#                    mpg disp  hp
# Mazda RX4         21.0  160 110
# Mazda RX4 Wag     21.0  160 110
# Datsun 710        22.8  108  93
# Hornet 4 Drive    21.4  258 110
# Hornet Sportabout 18.7  360 175
# Valiant           18.1  225 105
Sathish
  • This answer does not address the question. The question shows that you can retrieve the `model.frame` from the object, but you cannot get back the untransformed data in an easy way. – Steven M. Mortimer Mar 18 '17 at 19:00