Hi, I have a huge data frame (df) whose column names are different tenors; each column holds the values for that tenor. For the last two tenors some rows are missing, and I want to fill them in based on the data in the non-missing rows. My data frame looks like this:
1095 1825 2555 3650 5475 7300 10950
0.00116034 0.00170552 0.00274189 0.00472176 0.00697495 NA NA
0.00112157 0.00188056 0.00295159 0.0050669 0.00728063 0.00816778 0.00842034
0.00138009 0.00225073 0.00339548 0.00549386 0.00780401 0.00871812 0.00897222
I am stuck using predict() and lm(). I want to obtain those missing values. Sorry for this basic question, but I am in a hurry, and I have been stuck for over an hour.
Thanks in advance.
EDIT: I want to create a linear model with a data frame, let's say df2:
df2 <- df[rowSums(is.na(df)) > 0, ]
and use predict() to find the missing values for 7300 and 10950.
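The idea above can be sketched as follows. This is only an illustration under some assumptions: it uses the three sample rows shown earlier, and it regresses each incomplete tenor on the 5475 tenor alone, because the sample has only two complete rows (with the full data, more predictors could be used). The names `target`, `missing`, `fm` and `fit` are made up for this sketch. Note that the numeric column names are not syntactic R names, so they are wrapped in backticks inside the formula.

```r
# Sample rows from the question; check.names = FALSE keeps the numeric column names
df <- data.frame(
  `1095`  = c(0.00116034, 0.00112157, 0.00138009),
  `1825`  = c(0.00170552, 0.00188056, 0.00225073),
  `2555`  = c(0.00274189, 0.00295159, 0.00339548),
  `3650`  = c(0.00472176, 0.0050669, 0.00549386),
  `5475`  = c(0.00697495, 0.00728063, 0.00780401),
  `7300`  = c(NA, 0.00816778, 0.00871812),
  `10950` = c(NA, 0.00842034, 0.00897222),
  check.names = FALSE
)

for (target in c("7300", "10950")) {
  missing <- is.na(df[[target]])
  # Assumption: a single predictor (5475), chosen only so the tiny sample can be fitted
  fm <- as.formula(sprintf("`%s` ~ `5475`", target))
  # Fit on the rows where the target tenor is observed
  fit <- lm(fm, data = df[!missing, , drop = FALSE])
  # Predict for the rows where it is missing and write the values back
  df[[target]][missing] <- predict(fit, newdata = df[missing, , drop = FALSE])
}
```

The key point is that the model is fitted on the rows where the target is observed, and `newdata` is given the rows where it is missing.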
EDIT2:
Thanks to @Zheyuan Li I've made some progress, but I can't get my predicted data. I have tried two options:
b <- setNames(stack(df2), c("value", "Tenor"))
data.lm <- lm(value ~ Tenor, data = b, na.action = na.exclude)
pred <- predict(data.lm)
If I execute this code, pred has the same values as b.
On the other hand, if I use the following code, I obtain the same values for all predicted values.
aov <- aov(data.lm,data=b)
pred<-predict(aov)
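For reference, here is a small sketch of what predict() does in the first option. It assumes only two of the tenor columns from the sample (5475 and 7300) to keep it short. Without `newdata`, predict() returns the fitted values for the rows the model was trained on, and because `Tenor` produced by stack() is a factor, each fitted value is just the mean of that tenor. The names `fitted_vals`, `tenor_means` and `new_pred` are illustrative.

```r
# Two columns of the sample data (an assumption made to keep the sketch short)
df2 <- data.frame(
  `5475` = c(0.00697495, 0.00728063, 0.00780401),
  `7300` = c(NA, 0.00816778, 0.00871812),
  check.names = FALSE
)

# Long format: one row per (value, Tenor) pair; Tenor is a factor
b <- setNames(stack(df2), c("value", "Tenor"))
data.lm <- lm(value ~ Tenor, data = b, na.action = na.exclude)

# No newdata: fitted values for the training rows, NA where value was NA
fitted_vals <- predict(data.lm)

# With a factor predictor, the fit is the per-tenor mean
tenor_means <- tapply(b$value, b$Tenor, mean, na.rm = TRUE)

# Supplying newdata returns a prediction for the requested tenor
new_pred <- predict(
  data.lm,
  newdata = data.frame(Tenor = factor("7300", levels = levels(b$Tenor)))
)
```

So a model of this shape can only return one number per tenor, which is not a row-specific estimate of the missing value.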
EDIT3:
I have adapted my code and removed the last column to make things easier. Now I have the following data:
1095 1825 2555 3650 5475 7300
0.00116034 0.00170552 0.00274189 0.00472176 0.00697495 NA
0.00112157 0.00188056 0.00295159 0.0050669 0.00728063 0.00816778
0.00138009 0.00225073 0.00339548 0.00549386 0.00780401 0.00871812
My new code looks like this:
library(data.table)  # setDT() and := come from data.table
setDT(df)
variables <- setdiff(names(df), c("7300", "DATE"))
y_var <- "7300"
Line <- function(train_dat, test_dat, variables, y_var, family = "gaussian")
{
  # Build the formula from the column names, fit on the complete rows, predict for the rest
  fm <- as.formula(paste(y_var, " ~", paste(variables, collapse = "+")))
  glm1 <- glm(fm, data = train_dat, family = family)
  pred <- predict(glm1, newdata = test_dat)
  return(pred)
}
df[is.na(`7300`), `7300` :=
     Line(train_dat = df[!is.na(`7300`), ],
          test_dat = df[is.na(`7300`)],
          variables,
          y_var)]
Now I get the following error:
Error in terms.formula(formula, data = data) :
invalid term in model formula
Do you know how to solve it?
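One possible cause, offered as a guess rather than a confirmed diagnosis: the column names are numbers, so the pasted string reads as `7300 ~ 1095+1825+...`, which R parses as numeric constants rather than variable names. A minimal sketch of building the formula with backtick-quoted names is below; `make_formula` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: wrap every name in backticks so that non-syntactic
# column names such as "7300" are treated as variables, not numbers
make_formula <- function(y_var, variables) {
  rhs <- paste(sprintf("`%s`", variables), collapse = " + ")
  as.formula(sprintf("`%s` ~ %s", y_var, rhs))
}

fm <- make_formula("7300", c("1095", "1825", "2555", "3650", "5475"))
vars_in_formula <- all.vars(fm)  # names the formula refers to
```

If this is indeed the cause, using such a formula inside the function in place of the plain paste() call might avoid the error, provided the data keep their original column names.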