Transform two matching data sets to long format and join them with row variables

Question

I have an experimental predict method for the nestedLogit package that generates predicted probabilities and their standard errors for a polytomous response. Both the fitted probabilities and the standard errors are returned as matrices. For plotting, I need to transform each to long format and join the result to the newdata data frame used for the predictions.

I can sort of do this, but it is clumsy, and my goal is to write a general function for it.

Basic example, with a three-level response, partic, and two predictors, hincome and children:

# install nestedLogit if needed, then load it
if (!require(nestedLogit)) {
  install.packages("nestedLogit")
  library(nestedLogit)
}
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
data(Womenlf, package="carData")

comparisons <- logits(work=dichotomy("not.work", c("parttime", "fulltime")),
                      full=dichotomy("parttime", "fulltime"))

wlf.nested <- nestedLogit(partic ~ hincome + children,
                          dichotomies = comparisons,
                          data=Womenlf)
# get predicted values for a grid
new <- expand.grid(hincome=seq(0, 45, length=4),
                   children=c("absent", "present"))

pred.nested <- predict(wlf.nested, newdata = new)
names(pred.nested)

> names(pred.nested)
[1] "p"        "logit"    "se.p"     "se.logit"

(The predict method also returns the logits and their standard errors, but here I'm only interested in the probabilities, p, and their standard errors, se.p.)

Here's what I've done with this example:

# matrices don't work well with tidy processing; these should be data.frames
pred.nested.df <- lapply(pred.nested, as.data.frame)

p.fitted <- pred.nested.df$p |> bind_cols(new)
p.se     <- pred.nested.df$se.p |> bind_cols(new)
p_long <- p.fitted |>
  tidyr::pivot_longer(cols = not.work:fulltime,
                      names_to = "response",
                      values_to = "prob")
se_long <- p.se |>
  tidyr::pivot_longer(cols = not.work:fulltime,
                      names_to = "response",
                      values_to = "se.prob")

# put them together; p_long and se_long have their rows in the same order
plotprob <- cbind(p_long, se.prob = se_long$se.prob)

I get the layout I want at the end, in plotprob. (In the console output below, se.prob just repeats prob: that run built se_long from p.fitted instead of from the standard errors.)

> head(p_long)
# A tibble: 6 x 4
  hincome children response   prob
    <dbl> <fct>    <chr>     <dbl>
1       0 absent   not.work 0.208 
2       0 absent   parttime 0.0237
3       0 absent   fulltime 0.768 
4      15 absent   not.work 0.332 
5      15 absent   parttime 0.0894
6      15 absent   fulltime 0.579 
> head(se_long)
# A tibble: 6 x 4
  hincome children response se.prob
    <dbl> <fct>    <chr>      <dbl>
1       0 absent   not.work  0.208 
2       0 absent   parttime  0.0237
3       0 absent   fulltime  0.768 
4      15 absent   not.work  0.332 
5      15 absent   parttime  0.0894
6      15 absent   fulltime  0.579 
> plotprob <- cbind(p_long, se.prob = se_long$se.prob)
> head(plotprob)
  hincome children response       prob    se.prob
1       0   absent not.work 0.20819668 0.20819668
2       0   absent parttime 0.02371554 0.02371554
3       0   absent fulltime 0.76808778 0.76808778
4      15   absent not.work 0.33154549 0.33154549
5      15   absent parttime 0.08936286 0.08936286
6      15   absent fulltime 0.57909165 0.57909165

Perhaps there is a simpler way? In any case, I can't see how to turn this into a general function, say:

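make_plotdata <- function(fit, se, newdata) {
  # fit, se: matrices (or data frames) of fitted probabilities and their
  #          standard errors, one column per response category
  # newdata: the data frame used for the predictions, one row per row of fit
}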

# call as
plotdata <- make_plotdata(pred.nested.df$p, pred.nested.df$se.p, new)

Did you forget to include the `predict` method? I tried running your code but the `predict` result is a matrix, not a list. Perhaps this question could be simplified a lot by just including examples of the matrices you want to process? — Mikko Marttila, May 20 '23 at 14:07
Sorry; the new predict method is still in an experimental branch — user101089, May 20 '23 at 16:30

Mikko Marttila · Answer 1 · 2023-05-20T14:35:09.550

Suppose you have a data frame and two matrices whose rows correspond to the same units of observation:

set.seed(42)

tbl <- data.frame(id = seq_len(5))
(A <- cbind(x = rnorm(5), y = rnorm(5)))
#>               x           y
#> [1,]  1.3709584 -0.10612452
#> [2,] -0.5646982  1.51152200
#> [3,]  0.3631284 -0.09465904
#> [4,]  0.6328626  2.01842371
#> [5,]  0.4042683 -0.06271410
(B <- cbind(x = runif(5), y = runif(5)))
#>               x         y
#> [1,] 0.90403139 0.5142118
#> [2,] 0.13871017 0.3902035
#> [3,] 0.98889173 0.9057381
#> [4,] 0.94666823 0.4469696
#> [5,] 0.08243756 0.8360043

Because as.vector() reads a matrix column by column, you can combine them into one long-format data frame by repeating each column name nrow(A) times:

cbind(
  tbl,
  col = rep(colnames(A), each = nrow(A)),
  A = as.vector(A),
  B = as.vector(B)
)
#>    id col           A          B
#> 1   1   x  1.37095845 0.90403139
#> 2   2   x -0.56469817 0.13871017
#> 3   3   x  0.36312841 0.98889173
#> 4   4   x  0.63286260 0.94666823
#> 5   5   x  0.40426832 0.08243756
#> 6   1   y -0.10612452 0.51421178
#> 7   2   y  1.51152200 0.39020347
#> 8   3   y -0.09465904 0.90573813
#> 9   4   y  2.01842371 0.44696963
#> 10  5   y -0.06271410 0.83600426
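Wrapping the same idea into the make_plotdata(fit, se, newdata) function from the question could look like the sketch below. It assumes that fit and se are matrices or data frames with identical dimensions and column names, and that their rows correspond to the rows of newdata; the output column names prob and se.prob are my own choice, copied from the question's plotprob.

```r
# Sketch of make_plotdata(), built on the as.vector() approach above.
# Assumes `fit` and `se` share dimensions and column names, and that their
# rows line up with the rows of `newdata`.
make_plotdata <- function(fit, se, newdata) {
  fit <- as.matrix(fit)
  se  <- as.matrix(se)
  stopifnot(
    !is.null(colnames(fit)),
    identical(dim(fit), dim(se)),
    identical(colnames(fit), colnames(se)),
    nrow(fit) == nrow(newdata)
  )
  # as.vector() reads a matrix column by column, so each response label is
  # repeated nrow(fit) times in a row (each =), while the newdata rows are
  # repeated as a whole block once per response (times =)
  idx <- rep(seq_len(nrow(newdata)), times = ncol(fit))
  result <- newdata[idx, , drop = FALSE]
  result$response <- rep(colnames(fit), each = nrow(fit))
  result$prob     <- as.vector(fit)
  result$se.prob  <- as.vector(se)
  rownames(result) <- NULL
  result
}
```

With the question's objects this would be called as make_plotdata(pred.nested.df$p, pred.nested.df$se.p, new). The rows come out grouped by response category; sort afterwards if you prefer them grouped by the rows of newdata.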

score 0 · Answer 2 · answered May 20 '23 at 19:40

With @mikko-marttila's help, my solution was to write an as.data.frame method for the predictNestedLogit objects returned by the new predict method.

#' Convert predictNestedLogit objects to a data.frame
#'
#' @param x         a predictNestedLogit object
#' @param row.names row.names for result (not currently used)
#' @param newdata   the \code{newdata} data.frame used to generate predicted values
#' @param ...       other arguments (unused)
#'
#' @return A data frame containing the newdata values of predictors along with the columns
#'         \code{response}, \code{p}, \code{se.p}, \code{logit}, \code{se.logit}
#' @export
#'
as.data.frame.predictNestedLogit <- function(x, row.names = NULL, newdata, ...){
  if(missing(newdata)) stop("`newdata` is required.")
  resp.names <- colnames(x$p)

  idx <- rep(seq_len(nrow(newdata)), each = length(resp.names))
  # drop = FALSE keeps a data frame even when newdata has a single column
  result <- newdata[idx, , drop = FALSE]
  result <- cbind(
    result,
    response = rep(resp.names, nrow(x$p)),
    p        = as.vector(t(x$p)),
    se.p     = as.vector(t(x$se.p)),
    logit    = as.vector(t(x$logit)),
    se.logit = as.vector(t(x$se.logit))
  )
  rownames(result) <- NULL
  result
}
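The method above hard-codes the four components p, se.p, logit and se.logit. A possible generalization is sketched below as a hypothetical helper, stack_matrices() (my name, not part of nestedLogit): it takes any named list of matrices that share their column names and stacks them in the same row-major order, one output column per list element.

```r
# Hypothetical helper (not part of nestedLogit): stack a named list of
# matrices with identical column names into one long data frame, one row
# per (newdata row, response) pair, in row-major order like the method above.
stack_matrices <- function(mats, newdata, names_to = "response") {
  stopifnot(!is.null(names(mats)))
  mats <- lapply(mats, as.matrix)
  resp <- colnames(mats[[1]])
  stopifnot(!is.null(resp))
  n <- nrow(newdata)
  for (m in mats) {
    stopifnot(nrow(m) == n, identical(colnames(m), resp))
  }
  # each newdata row is repeated once per response category
  idx <- rep(seq_len(n), each = length(resp))
  result <- newdata[idx, , drop = FALSE]
  result[[names_to]] <- rep(resp, times = n)
  # t() makes as.vector() read each matrix row by row
  for (nm in names(mats)) {
    result[[nm]] <- as.vector(t(mats[[nm]]))
  }
  rownames(result) <- NULL
  result
}
```

Inside the method, the body could then reduce to something like stack_matrices(x[c("p", "se.p", "logit", "se.logit")], newdata), assuming x is a list holding those four matrices as in the experimental predict output.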
