Plot_cap response curve for counterfactual data

Question

The following code was made up to replicate my problem with a bigger, more complex data set.

library(marginaleffects)
library(truncnorm)

yield_kgha<-rtruncnorm(n=100, mean=2000, sd=150)
n_kgha<-rtruncnorm(n=100, a=40, b=298, mean=150, sd=40)

i<-lm(yield_kgha~n_kgha+I(n_kgha^2))
summary(i)

I have used the predictions function from the marginaleffects package to see the predicted yields(yield_kgha) for a range of nitrogen rates (n_kgha) from my regression model (i). My original data only has n_kgha rates ranging from approximately 40-250, so the below code allows me to see predicted yields at n_kgha rates not actually in my data set.

p <- predictions(
  i,
  newdata = datagrid(model = i, n_kgha = seq(0, 300, by = 25), grid_type = "counterfactual"))
summary(p, by = "n_kgha")

I would like to plot the response of yield conditioned on n_kgha ranging from 0-300, which includes n_kgha values not in my original data set. I have tried to do this using the plot_cap function.

plot_cap(i, condition = "n_kgha")

However, since my original data only has n_kgha rates ranging from 40-250 I am not getting my desired result of seeing the response curve over the n_kgha (0:300) range. When I plot using the plot_cap function I get the following response curve with n_kgha ranging from 40-250 (the max and min of the original data set).

Is there a way to run the plot_cap function based on the counterfactual range of n_kgha as used in the predictions function? Or should I use another method to plot the predicted values based on counterfactual values?

score 1 · Accepted Answer · answered Jun 23 '22 at 21:03

The plot_cap() function only plots over the observed range of values. However, since predictions() returns a “tidy” data frame, it is trivial to use the the output of this function with ggplot2 to get the plot you need.

Note that we do not need to specify grid.type="counterfactual". This is option will do something very weird and specific: duplicate the whole dataset many times for each value of the user-supplied values. It is only useful in very specific corner-cases, and not when you just want to make predictions over unobserved values of the predictors. See the documentation with ?datagrid.

Here’s a simple example of predictions() with ggplot2 to achieve what you want:

library(marginaleffects)
library(truncnorm)
library(ggplot2)

yield_kgha <- rtruncnorm(n = 100, mean = 2000, sd = 150)
n_kgha <- rtruncnorm(n = 100, a = 40, b = 298, mean = 150, sd = 40)

i <- lm(yield_kgha ~ n_kgha + I(n_kgha^2))

p <- predictions(i, newdata = datagrid(n_kgha = seq(0, 300, 10)))

ggplot(p, aes(n_kgha, predicted)) +
    geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .1) +
    geom_line() +
    theme_minimal()

Plot_cap response curve for counterfactual data

1 Answers1