R predict() function returning wrong/too many values

Question

I am trying to convert Absorbance (Abs) values to Concentration (ng/mL), based on an established linear model & standard curve. I planned to do this by using the predict() function. I am having trouble getting predict() to return the desired results. Here is a sample of my code:

Standards<-data.frame(ng_mL=c(0,0.4,1,4),
                      Abs550nm=c(1.7535,1.5896,1.4285,0.9362))
LM.2<-lm(log(Standards[['Abs550nm']])~Standards[['ng_mL']])
Abs<-c(1.7812,1.7309,1.3537,1.6757,1.7409,1.7875,1.7533,1.8169,1.753,1.6721,1.7036,1.6707,
       0.3903,0.3362,0.2886,0.281,0.3596,0.4122,0.218,0.2331,1.3292,1.2734)


predict(object=LM.2,
        newdata=data.frame(Concentration=Abs[1]))#using Abs[1] as an example, but I eventually want predictions for all values in Abs

Running that last lines gives this output:

> predict(object=LM.2,
+         newdata=data.frame(Concentration=Abs[1]))
         1          2          3          4 
 0.5338437  0.4731341  0.3820697 -0.0732525 
Warning message:
'newdata' had 1 row but variables found have 4 rows

This does not seem to be the output I want. I am trying to get a single predicted Concentration value for each Absorbance (Abs) entry. It would be nice to be able to predict all of the entries at once and add them to an existing data frame, but I can't even get it to give me a single value correctly. I've read many threads on here, webpages found on Google, and all of the help files, and for the life of me I cannot understand what is going on with this function. Any help would be appreciated, thanks.

score 17 · Accepted Answer · answered Sep 30 '14 at 21:29

You must have a variable in newdata that has the same name as that used in the model formula used to fit the model initially.

You have two errors:

You don't use a variable in newdata with the same name as the covariate used to fit the model, and
You make the problem much more difficult to resolve because you abuse the formula interface.

Don't fit your model like this:

mod <- lm(log(Standards[['Abs550nm']])~Standards[['ng_mL']])

fit your model like this

mod <- lm(log(Abs550nm) ~ ng_mL, data = standards)

Isn't that some much more readable?

To predict you would need a data frame with a variable ng_mL:

predict(mod, newdata = data.frame(ng_mL = c(0.5, 1.2)))

Now you may have a third error. You appear to be trying to predict with new values of Absorbance, but the way you fitted the model, Absorbance is the response variable. You would need to supply new values for ng_mL.

The behaviour you are seeing is what happens when R can't find a correctly-named variable in newdata; it returns the fitted values from the model or the predictions at the observed data.

This makes me think you have the formula back to front. Did you mean:

mod2 <- lm(ng_mL ~ log(Abs550nm), data = standards)

?? In which case, you'd need

predict(mod2, newdata = data.frame(Abs550nm = c(1.7812,1.7309)))

say. Note you don't need to include the log() bit in the name. R recognises that as a function and applies to the variable Abs550nm for you.

If the model really is log(Abs550nm) ~ ng_mL and you want to find values of ng_mL for new values of Abs550nm you'll need to invert the fitted model in some way.

Do I have the model backwards? I am not sure now... When I generate the standard curve, I use known Concentrations (ng/mL), and measure the Absorbance (Abs). So, that would make the Concentrations the _terms_ factor, and the Abs would be the _response_ factor, right? So, Abs ~ Conc. When I run the experiment, I measure the Absorbance of samples in order to find the Concentration. Does this mean that I have to switch the formula? In the past, I've just plugged the Abs into the forumula y=mx+b based on the linear model. This time, I wanted to try predict() — user3658874, Sep 30 '14 at 21:56
Well, as far as R is concerned, the formula is `response ~ predictor` and hence `predict()` will give you new values of `response` for stated values of `predictor` given the model. In your case of y=mx+b, here y is `log(Abs550nm)`, `x` is `ng_mL` given the formula you used. If that really is the model then like I said, you need to invert it to get the equation in terms of `x`, not `y`, and R doesn't do the inversion as that is non standard. Note you get m and b from your equation via `coef(mod)`. — Gavin Simpson, Oct 01 '14 at 15:14

R predict() function returning wrong/too many values

1 Answers1

Linked

Related