
I'm fitting a hierarchical model to some data, and the fit appears to converge acceptably.

import pymc3 as pm

# n: number of groups, h: group index for each observation, data: observed values
with pm.Model() as model:
    # hyperpriors on the group-level location
    mu_a = pm.Normal('mu_a', 0., sd=0.2)
    sigma_a = pm.HalfNormal('sigma_a', 0.1)

    # per-group location and scale
    mean = pm.Normal('mean', mu_a, sigma_a, shape=n)
    std = pm.HalfNormal('std', 0.01, shape=n)

    # map each observation to its group's parameters
    means = mean[h]
    stds = std[h]

    y = pm.Laplace('y', mu=means, b=stds, observed=data)
    hierarchical_trace = pm.sample(2000, n_init=30000)

When checking the posterior predictive, the tails seem reasonable: the min and max of the data (black line) both fall inside the min/max of the generated samples (this is not the case with a StudentT).

ppc_trace = pm.sample_ppc(model=model, trace=hierarchical_trace)

[Figure: PPC with min/max/mean of the original data]

However, the mean (rightmost chart) is way off. I think this is because my data is negatively skewed, so the mass of the data pulls the mean too far to the right.
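For reference, the same summary comparison could be reproduced numerically along these lines (a minimal sketch, assuming ppc_trace['y'] has shape (n_ppc_samples, len(data))):

import numpy as np

ppc_y = ppc_trace['y']  # posterior predictive draws of y
for stat in (np.min, np.max, np.mean):
    # statistic of the observed data vs. the same statistic per PPC draw, averaged over draws
    print(stat.__name__, stat(data), stat(ppc_y, axis=1).mean())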

sp.stats.skew(data)

-0.1699020117521286

What is the recommended approach in PyMC3 to model this sort of data? Although the Laplace is a symmetric distribution, it seems like a good fit for my data; a Gaussian doesn't put enough weight in the tails (which rules out the skew normal?). How can I model this moderately skewed data?

My objective is to obtain an accurate MAP estimate with credible intervals for the different portions of my data (based on the hierarchical specification).

Darren Brien
1 Answer


Rubber-ducked this one... but here's an answer for anyone stumbling along later.

I found an asymmetric Laplace worked well to address the lack of fit.

import numpy as np
import theano.tensor as tt

def asym_laplace_log_p(x, m, lam, k):
    # log pdf of the asymmetric Laplace: log(lam / (k + 1/k)) - (x - m) * lam * s * k**s
    diff = x - m
    s = tt.sgn(diff)
    return tt.log(lam / (k + 1 / k)) - diff * lam * s * tt.pow(k, s)

def asym_laplace_cdf(x, m, lam, k):
    diff = x - m
    k_2 = k ** 2
    if x <= m:
        return (k_2 / (1 + k_2)) * np.exp((lam / k) * diff)
    return 1 - (1 / (1 + k_2)) * np.exp(-lam * k * diff)

def inverse_cdf(u, m, lam, k):
    # invert the two branches of the CDF above
    k_2 = k ** 2
    if u <= k_2 / (1 + k_2):
        return m + (k / lam) * np.log(u * (1 + k_2) / k_2)
    return m - (1 / (lam * k)) * np.log((1 - u) * (1 + k_2))

def asym_laplace_mean(m, lam, k):
    return m + (1 - k ** 2) / (lam * k)

Then, inside the model:

y = pm.DensityDist('y', lambda x: asym_laplace_log_p(x, means, stds, k), testval=0, observed=data)
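For context, a fuller version of that model might look roughly like this (a sketch only; the priors on lam and k are illustrative assumptions, not necessarily the exact ones I used):

with pm.Model() as asym_model:
    mu_a = pm.Normal('mu_a', 0., sd=0.2)
    sigma_a = pm.HalfNormal('sigma_a', 0.1)

    mean = pm.Normal('mean', mu_a, sigma_a, shape=n)
    lam = pm.HalfCauchy('lam', beta=1., shape=n)   # rate parameter (see the note below)
    k = pm.HalfNormal('k', 1.)                     # skew parameter; k = 1 is symmetric

    means = mean[h]
    stds = lam[h]

    y = pm.DensityDist('y', lambda x: asym_laplace_log_p(x, means, stds, k),
                       testval=0, observed=data)
    hierarchical_trace = pm.sample(2000, n_init=30000)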

The CDF, inverse CDF and mean are there just for debugging purposes. It's worth noting that this implementation uses lambda as a rate rather than 1/lambda as a scale, so I found a HalfCauchy prior on it worked better than the HalfNormal in the original question.
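As a quick sanity check (a sketch, with arbitrary parameter values), the inverse CDF can be used for inverse-transform sampling, and the empirical mean of those draws should land near asym_laplace_mean:

m, lam, k = 0.0, 5.0, 1.5                         # arbitrary test values
u = np.random.uniform(size=100000)
samples = np.array([inverse_cdf(ui, m, lam, k) for ui in u])
print(samples.mean(), asym_laplace_mean(m, lam, k))   # should be close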

Would be glad to hear feedback on this implementation.

At the time of writing, DensityDist doesn't work with sample_ppc ("AttributeError: 'DensityDist' object has no attribute 'random'"), so I may end up generating my own samples via the above, using the sampled location, shape and skew values.
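Something like the following is what I have in mind (a rough sketch; it assumes the trace contains 'mean', 'lam' and 'k' as in the model sketch above, and draws one y per observation per posterior sample):

def manual_ppc(trace, h, n_draws=500):
    # posterior predictive samples via inverse-transform sampling with the inverse CDF
    ppc = []
    for i in np.random.randint(0, len(trace['mean']), n_draws):
        m = trace['mean'][i][h]
        lam = trace['lam'][i][h]
        k = trace['k'][i]
        u = np.random.uniform(size=len(h))
        ppc.append([inverse_cdf(ui, mi, li, k) for ui, mi, li in zip(u, m, lam)])
    return np.array(ppc)

ppc_samples = manual_ppc(hierarchical_trace, h)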

I don't think this is entirely kosher, so I'd be glad of some direction on this (or directions on how to fix this and use sample_ppc directly).

Darren Brien