If my observed dataset has weights (for example, to track multiplicity), is it possible to pass them to either PyStan or PyMC3, as with the `weights` argument in the function signature (http://mc-stan.org/rstanarm/reference/stan_glm.html) of `stan_glm` in the rstanarm package?

stan_glm(formula, family = gaussian(), data, weights, subset,
  na.action = NULL, offset = NULL, model = TRUE, x = FALSE, y = TRUE,
  contrasts = NULL, ..., prior = normal(), prior_intercept = normal(),
  prior_aux = exponential(), prior_PD = FALSE, algorithm = c("sampling",
  "optimizing", "meanfield", "fullrank"), adapt_delta = NULL, QR = FALSE,
  sparse = FALSE)
Ben Goodrich
maxymoo

1 Answer

With Stan (in any of its interfaces, including PyStan), you can introduce weights within the model itself. For example, in a linear regression, instead of `y[i] ~ normal(mu[i], sigma)` you would write `target += weight[i] * normal_lpdf(y[i] | mu[i], sigma)`.

This gives you a well-specified density as long as the weights are positive. That said, we tend to prefer generative approaches to weighting.
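To make the weighting concrete, here is a minimal sketch, not a definitive implementation. The Stan program in `STAN_MODEL` is a hypothetical weighted linear regression: the names `N`, `x`, `y`, `weight`, `alpha`, `beta` are illustrative assumptions, and no priors are declared. It is only stored as a string here, so compile it with whichever Stan interface you use. The plain-Python part re-implements the normal log density by hand to check the idea: integer weights should give the same log-likelihood as writing each row out that many times.

```python
import math

# Hypothetical Stan program (names are assumptions, no priors shown).
# Each observation's log density is scaled by its weight.
STAN_MODEL = """
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
  vector<lower=0>[N] weight;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  for (i in 1:N)
    target += weight[i] * normal_lpdf(y[i] | alpha + beta * x[i], sigma);
}
"""


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def weighted_loglik(ys, mus, sigma, weights):
    """Sum over i of weights[i] * normal_lpdf(ys[i] | mus[i], sigma)."""
    return sum(w * normal_lpdf(y, m, sigma) for y, m, w in zip(ys, mus, weights))


# Three distinct observations with multiplicities 2, 1 and 3 (made-up values).
ys, mus, counts = [1.0, 2.5, 4.0], [1.2, 2.0, 3.9], [2, 1, 3]
sigma = 0.7
compact = weighted_loglik(ys, mus, sigma, counts)

# The same data written out row by row, each row with weight 1.
ys_long = [y for y, c in zip(ys, counts) for _ in range(c)]
mus_long = [m for m, c in zip(mus, counts) for _ in range(c)]
expanded = weighted_loglik(ys_long, mus_long, sigma, [1] * len(ys_long))

# The two totals should agree up to floating-point rounding.
print(math.isclose(compact, expanded))
```

Keeping one row per distinct observation plus a weight column means the model evaluates fewer terms than the expanded dataset would, while targeting the same log density.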

Bob Carpenter
  • Hi Bob, thanks so much for taking the time to help. Do you know if I can do this for a hierarchical logistic regression? When I try your answer with `bernoulli_logit_lpdf` I'm getting the error `No matches for::bernoulli_logit_lpdf(vector, vector) Function bernoulli_logit_lpdf not found.` – maxymoo Nov 28 '17 at 05:23
  • The error message is correct, and it should also have told you which signatures are available. `bernoulli_logit_lpdf` requires an integer or an array of integers as its first argument, not a vector. And yes, you can do this hierarchically. The weighting (multiplication on the log scale) does just that: it's like seeing that many fractional observations. If the weights are all non-negative integers, it reduces to the ordinary definition for that many observations. – Bob Carpenter Nov 28 '17 at 17:50