Here is my problem (using made-up data so that it is reproducible):
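set.seed(42)
# Training data (df) and the new data to predict on (df2)
df <- data.frame(x = rnorm(1000), y = rnorm(1000), z = rnorm(1000))
df2 <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
# Bin both datasets on x using the same breaks
breaks <- c(-1000, -0.68, -0.01315, 0.664, 1000)
divider <- cut(df$x, breaks)
divider2 <- cut(df2$x, breaks)
# Split each dataset into one data frame per bin
subDF <- by(df, INDICES = divider, data.frame)
subDF2 <- by(df2, INDICES = divider2, data.frame)
# Fit one regression per bin, then predict on the matching bin of df2
reg <- lapply(subDF, lm, formula = x ~ .)
pre <- lapply(1:4, function(i) predict(reg[[i]], subDF2[[i]]))
# R-squared of each fit
lapply(1:4, function(i) summary(reg[[i]])$r.squared)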
The above code works fine. What I am doing is the following: according to the values of x, I split df into 4 data frames and run a regression on each of them, in order to predict values for another dataset. Splitting the data frame allows a better prediction, as the range of x has a great impact on the actual data.
What I am trying to do is add a weights argument to the regression, to give greater importance to the most recent data. My weights argument is weights<-0.999^seq(250,1,by=-1) if there are 250 observations. With a seed of 42 and the breaks above, each of the 4 subsets has 250 rows.
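To make the weighting scheme concrete, here is a small sketch of the same formula written for an arbitrary number of rows. The helper name decay_weights and its rate parameter are hypothetical and only there for illustration. The sketch assumes the rows are ordered from oldest to most recent, so the last row gets the largest weight.

```r
# Hypothetical helper (not part of the original code): exponentially decaying
# weights for n observations, assuming rows run from oldest to most recent.
# rev(seq_len(n)) equals seq(n, 1, by = -1) for n >= 1 and is also safe for n = 0.
decay_weights <- function(n, rate = 0.999) rate^rev(seq_len(n))

w <- decay_weights(250)
length(w)    # one weight per observation
head(w, 1)   # weight of the oldest row: 0.999^250
tail(w, 1)   # weight of the most recent row: 0.999^1
```

Written this way, the weight vector can be rebuilt for whatever number of rows a subset happens to have.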
When I try reg<-lapply(subDF,lm,formula=x~.,weights=0.999^seq(250,1,by=-1)), I get this error:
Error in eval(expr, envir, enclos) :
..2 used in an incorrect context, no ... to look in
This is quite strange, as lapply has a ... argument, which is used here for the formula, but it doesn't accept the weights.
So I really don't know how to add those weights. What should I correct in my code, or should I change it (almost) entirely to be able to use the weights?
For the example, and to make it (perhaps) easier, I chose the breaks so that the 4 subsets have the same size, but ideally the answer would also work when the 4 subsets differ in size (with breaks<-c(-1000,-0.75,0,0.75,1000), for instance).
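For reference, here is a minimal sketch of one direction that might cover the unequal-size case. It assumes the error comes from lm evaluating a weights expression that was forwarded through lapply's ... argument. The function fit_weighted is a hypothetical name, and split() is used in place of by(..., data.frame) only for brevity. The idea is to call lm inside a function of its own, so the weights are built from each subset's row count rather than passed through lapply.

```r
set.seed(42)
df <- data.frame(x = rnorm(1000), y = rnorm(1000), z = rnorm(1000))

# Breaks that give subsets of different sizes
breaks <- c(-1000, -0.75, 0, 0.75, 1000)
subDF <- split(df, cut(df$x, breaks))

# Hypothetical wrapper: build the weights from the subset's own row count,
# assuming rows are ordered from oldest to most recent, then fit the model.
fit_weighted <- function(d) {
  w <- 0.999^rev(seq_len(nrow(d)))
  lm(x ~ ., data = d, weights = w)
}

reg_w <- lapply(subDF, fit_weighted)

# Each fit carries one weight per row of its subset
sapply(reg_w, function(m) length(weights(m)))
```

Because w is created in the same function as the formula, lm can find it when it builds the model frame, and nothing but the data frame has to travel through lapply.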
This post on CrossValidated describes much the same problem, but has no working solution, so it didn't help me.