I would like to export the summary of a plsr model (pls package) to a nice table (preferably HTML). I am aware of nice methods for lm models, but I am curious whether anyone knows a quick way to extract the information from a plsr model and format it as a table. When I use str() on the model, I struggle to find the same information that summary(my.plsr.model) displays.

Here is an example of the summary output:

Data:   X dimension: 405 239 
    Y dimension: 405 1
Fit method: kernelpls
Number of components considered: 20

VALIDATION: RMSEP
Cross-validated using 405 leave-one-out segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps  9 comps  10 comps
CV           1.587    1.465    1.394    1.372    1.336    1.296    1.282    1.225    1.211    1.193     1.173
adjCV        1.587    1.465    1.394    1.372    1.336    1.296    1.282    1.225    1.211    1.193     1.173
       11 comps  12 comps  13 comps  14 comps  15 comps  16 comps  17 comps  18 comps  19 comps  20 comps
CV        1.175     1.159     1.174     1.184     1.187     1.173     1.158     1.108     1.115     1.063
adjCV     1.175     1.160     1.175     1.184     1.186     1.173     1.157     1.107     1.114     1.061

TRAINING: % variance explained
      1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps  9 comps  10 comps  11 comps
X       62.23    67.88    83.52    87.71    89.28    92.02    92.71    93.67    94.66     95.36     95.82
Yvar    15.33    26.44    29.10    34.29    40.35    42.50    49.62    52.69    54.16     55.06     56.10
      12 comps  13 comps  14 comps  15 comps  16 comps  17 comps  18 comps  19 comps  20 comps
X        96.68     97.30     97.63     98.02     98.24     98.36     98.49      98.6     98.73
Yvar     56.94     58.51     61.31     63.07     64.64     66.31     67.71      69.1     70.08
andemexoax
  • `summary` sometimes calculates things itself, you might want to capture the output of `summary` and use `str` on that –  Apr 07 '18 at 22:59
  • Using `str(summary(plsr_output))` only gives the same output as in my original post (`summary(plsr_output)`), followed by one extra line that says `NULL` – andemexoax Apr 08 '18 at 01:54
  • Hmm. Then look at `print.summary.mvr`; and in general, inspect the code to find out where it is storing the info you need. –  Apr 10 '18 at 13:29

2 Answers

Following @dash2's suggestion, I looked into this further and also contacted the pls package developer, who said: "The summary function in the pls package does not return anything, it simply prints out the summary. (I know, this is bad design; it is customary for summary functions in R to return an object, and have a separate print function show them. Perhaps I ought to change that some day. :))

Your best bet is to look at what the summary function actually does, and how it gets its information, and then replicate that yourself. To see the summary function, run pls:::summary.mvr."
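
As a minimal sketch of what that advice amounts to, the snippet below fits a small model on the `yarn` data set that ships with pls (an assumption for illustration; it is not the data from the question). It shows that `summary()` returns `NULL`, and that the printed numbers can be obtained as objects from `explvar()`, `R2()` and `RMSEP()`, which are the functions `pls:::summary.mvr` calls internally.

```r
library(pls)

# Illustrative model on the yarn data shipped with pls (not the question's data)
data(yarn)
fit <- plsr(density ~ NIR, ncomp = 4, data = yarn, validation = "LOO")

# summary() only prints; capture the text and keep the (NULL) return value
printed <- capture.output(res <- summary(fit))
is.null(res)

# The same quantities as plain numeric vectors, one value per component
x_var    <- cumsum(explvar(fit))   # cumulative % X variance explained
y_var    <- 100 * drop(R2(fit, estimate = "train", intercept = FALSE)$val)
rmsep_cv <- tail(c(RMSEP(fit, estimate = "CV")$val), -1)  # drop intercept-only model
```

This also explains why `str(summary(...))` shows nothing useful: there is no returned object to inspect, only printed text.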

I adapted the summary function so that it returns the data, which the package's original function only prints.

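library(pls)     # respnames(), explvar(), R2(), RMSEP()
library(tibble)  # rownames_to_column()

# Extract per-component variance explained and cross-validated RMSEP
# as a data frame (adapted from pls:::summary.mvr; assumes a single response)
r2_rmsep_data_func <- function(object, ...) {
  yvarnames <- respnames(object)
  xve <- explvar(object)  # % X variance explained by each component
  yve <- 100 * drop(R2(object, estimate = "train",
                       intercept = FALSE)$val)
  # Cross-validated RMSEP; tail(..., -1) drops the intercept-only model
  rmseps <- tail(c(RMSEP(object, "CV")$val), -1)
  tbl <- cbind(cumsum(xve), yve, rmseps)  # columns instead of rows
  tbl <- as.data.frame(tbl)
  rownames(tbl) <- gsub("Comp ", "", rownames(tbl), fixed = TRUE)
  tbl <- rownames_to_column(tbl, var = "Components")
  tbl$Components <- as.numeric(tbl$Components)
  colnames(tbl) <- c("Components", "Spectra", yvarnames, "RMSEP")
  return(tbl)
}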

r2_plus_error_data <- as.data.frame(r2_rmsep_data_func(Trait_plsr))
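
For the HTML table asked about in the question, a data frame like this can be handed to any table package. The sketch below is one hedged possibility: it uses `knitr::kable()` (my choice here, not something the original post used) and the `yarn` example data from pls instead of the question's model, and builds the same three columns directly so that it runs on its own.

```r
library(pls)

# Illustrative model on the yarn data shipped with pls (not the question's data)
data(yarn)
fit <- plsr(density ~ NIR, ncomp = 4, data = yarn, validation = "LOO")

# Same quantities as in the function above, one row per component
tbl <- data.frame(
  Components = seq_len(fit$ncomp),
  X          = cumsum(explvar(fit)),
  Y          = 100 * drop(R2(fit, estimate = "train", intercept = FALSE)$val),
  RMSEP      = tail(c(RMSEP(fit, estimate = "CV")$val), -1)
)

# Render as an HTML table and write it to a temporary file
html <- knitr::kable(tbl, format = "html", digits = 2, row.names = FALSE)
out_file <- file.path(tempdir(), "plsr_summary.html")
writeLines(as.character(html), out_file)
```

The resulting file contains a bare `<table>` element, so it can be pasted into a larger HTML document or styled separately.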

With the data in a data frame, a table is easy to make with any of the packages suggested in the other answer. However, I found that a plot shows the data even better. With some extra elbow grease, we can combine everything into one plot with two y-axes using plotly.

# Double y-axis plot: RMSEP on the right, two variance-explained lines (X and Y) on the left

library(plotly)

# Settings for the second (right-hand) y-axis
ay <- list(
  tickfont = list(color = 'rgb(80,80,80)'),
  overlaying = "y",
  side = "right",
  title = "RMSEP"
)
#vertical line function
vline <- function(x = 0, color = 'rgb(220,220,220)') {
  list(
    type = "line",
    y0 = 0, 
    y1 = 1, 
    yref = "paper",
    x0 = x, 
    x1 = x, 
    line = list(color = color, dash = "dashdot")
  )
}
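# Actual plot. M1_lb is the response column in this data set (its name comes
# from respnames()); Trait (the trace label) and ncomp_permut (the x position
# of the vertical line) are not defined in this snippet.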
p <- plot_ly(type = 'scatter', mode = 'lines') %>%
  add_trace(x = ~r2_plus_error_data$Components, y = ~r2_plus_error_data$Spectra, name = "Spectra", line=list(color = 'rgb(22, 96, 167)')) %>%
  add_trace(x= ~r2_plus_error_data$Components, y= ~r2_plus_error_data$M1_lb, name = Trait, line=list(color = 'rgb(205, 12, 24)')) %>% 
  add_trace(x = ~r2_plus_error_data$Components, y = ~r2_plus_error_data$RMSEP, name = "RMSEP", yaxis = "y2", line=list(color = 'rgb(128,128,128)', dash = 'dot')) %>%
  layout(
    title = "Multiple R^2 with RMSEP by Component", yaxis2 = ay,
    xaxis = list(title="Components"), 
    yaxis = list(title="Variance Explained"), 
    legend = list(orientation = 'v', 
                  x = 1.1, y = 1.06), 
    shapes = list(vline(ncomp_permut)), 
    hoverlabel = list(font=list(color="white"))
  )

p

Which returns this RMSEP and R^2 plot

andemexoax

Possible options include the broom, texreg, stargazer and (my own) huxtable packages. It looks as though neither broom nor texreg has methods for plsr models, so the best thing may be to turn the output into a data frame and use huxtable:

library(huxtable)

# plsr_output must be a data frame or matrix, not the fitted model itself
output <- as_hux(plsr_output)
# you can now edit the output as you desire, e.g. make the first line bold:
bold(output)[1, ] <- TRUE

What plsr_output should be depends on what you want (e.g. coef or scores or loadings - I'm not familiar with the package or the statistical theory).
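
As a hedged sketch of that idea, the example below pulls the scores out of a fitted model and converts them to a plain data frame before calling `as_hux()`. It uses the `yarn` data shipped with pls (an assumption for illustration) and picks scores arbitrarily; `coef()` or `loadings()` could be treated in a similar way. The `unclass()` step is there because the extracted object carries its own class and is not a plain matrix.

```r
library(pls)
library(huxtable)

# Illustrative model on the yarn data shipped with pls (not the question's data)
data(yarn)
fit <- plsr(density ~ NIR, ncomp = 3, data = yarn)

# scores() returns a classed matrix; strip the class, then convert
sc <- as.data.frame(unclass(scores(fit)))

# First six rows as a huxtable, with the column names as a bold header row
output <- as_hux(head(sc), add_colnames = TRUE)
bold(output)[1, ] <- TRUE

# HTML string for the table
html <- to_html(output)
```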

  • Gives me an error: `Error in as.data.frame.default(x, stringsAsFactors = FALSE) : cannot coerce class "mvr" to a data.frame` ... I have tried using `texreg` and saving the output to a data frame, but plsr outputs are mvr objects, whose documentation is hard to find – andemexoax Apr 08 '18 at 01:50