0

So basically, I generated 2 random variables X and Y 1000 times and created a data frame Data=data.frame(x,y) in order to perform a smoothing by spline function. Now I want to recreate exactly that but for B= 1000 times and plot the smoothing functions (B=1,...,1000) to compare its variability

simulation= function(d){


  X=runif(1000,0,10)
  Y=rpois(1000,lambda=2*X+0.2*X*sin(X))
  Data=matrix(data=c(X,Y),ncol=2)
  smoothing_sim=lm(Y~ns(x=X,df=d),data=Data)
  new_x2=seq(min(X),max(X),length.out=100) 

  adjusted_sim=predict(object=smoothing_sim,newdata=data.frame(X=new_x2))
  return(data.frame(new_x2,smoothing_sim))

} 
simulation2=replicate(n=1000,simulation)  

I'm not sure wether my method is good or not. And I'm also not sure how to plot the functions following the simulation. Anyone care to comment? Thanks !

Jng
  • 51
  • 1
  • 6

1 Answers1

0

If you use ggplot, you can make the smooths right in geom_smooth. As ggplot demands long form, using list columns and tidyr::unnest is a useful substitute for replicate, though there are lots of ways to accomplish the data generation step.

library(tidyverse)
set.seed(47)
# A nice theme with a white background to help make low-opacity objects visible
theme_set(hrbrthemes::theme_ipsum_tw())

df <- tibble(replication = seq(100),    # scaled down a little
             x = map(replication, ~runif(1000, 0, 10)), 
             y = map(x, ~rpois(1000, lambda = 2*.x + 0.2*.x*sin(.x)))) %>% 
    unnest() 

# base plot with aesthetics and points
point_plot <- ggplot(df, aes(x, y, group = replication)) + 
    geom_point(alpha = 0.01, stroke = 0) 

point_plot + 
    geom_smooth(method = lm, formula = y ~ splines::ns(x), size = .1, se = FALSE) 

Controlling the line's alpha can be really helpful for this sort of plot, but the alpha parameter in geom_smooth controls the opacity of the standard error ribbon. To set the alpha of the line, use geom_line with stat_smooth:

point_plot + 
    stat_smooth(geom = 'line', method = lm, formula = y ~ splines::ns(x), 
                color = 'blue', alpha = 0.03) 

Currently, the smooth isn't doing much more than OLS here. To make it more flexible, set the degrees of freedom:

point_plot + 
    stat_smooth(geom = 'line', method = lm, formula = y ~ splines::ns(x, df = 5), 
                color = 'blue', alpha = 0.03) 

Given the response is Poisson, it may be worth scaling up to Poisson regression with glm. The largest impact here is that when x is small, y doesn't dip all the way to 0:

point_plot + 
    stat_smooth(geom = 'line', method = glm, method.args = list(family = 'poisson'), 
                formula = y ~ splines::ns(x, df = 5), color = 'blue', alpha = 0.03) 

Adjust further as you like.

alistaire
  • 42,459
  • 4
  • 77
  • 117