How do I fit a pymc3 model when each person has multiple data points?

Question

I'm trying to practice using pymc3 on the kinds of data I come across in my research, but I'm having trouble thinking through how to fit the model when each person gives me multiple data points, and each person comes from a different group (so trying a hierarchical model).

Here's the practice scenario I'm using: Suppose we have 2 groups of people, N = 30 in each group. All 60 people go through a 10 question survey, where each person can response ("1") or not respond ("0") to each question. So, for each person, I have an array of length 10 with 1's and 0's.

To model these data, I assume each person has some latent trait "theta", and each item has a "discrimination" a and a "difficulty" b (this is just a basic item response model), and the probability of responding ("1") is given by: (1 + exp(-a(theta - b)))^(-1). (Logistic applied to a(theta - b) .)

Here is how I tried to fit it using pymc3:

traces = {}

for grp in range(2):

group = prac_data["Group ID"] == grp
data = prac_data[group]["Response"]

with pm.Model() as irt:
    # Priors
    a_tmp = pm.Normal('a_tmp',mu=0, sd = 1, shape = 10)
    a = pm.Deterministic('a', np.exp(a_tmp))
    # We do this transformation since we must have a >= 0 
    
    b = pm.Normal('b', mu = 0, sd = 1, shape = 10)
    
    # Now for the hyperpriors on the groups:
    theta_mu = pm.Normal('theta_mu', mu = 0, sd = 1)
    theta_sigma = pm.Uniform('theta_sigma', upper = 2, lower = 0)
    
   
    
    theta = pm.Normal('theta', mu = theta_mu,
                             sd = theta_sigma, shape = N)
    
    p = getProbs(Disc, Diff, theta, N)
    
    y = pm.Bernoulli('y', p = p, observed = data)
        
    traces[grp] = pm.sample(1000)

The function "getProbs" is supposed to give me an array of probabilities for the Bernoulli random variable, as the probability of responding 1 changes across trials/survey questions for each person. But this method gives me an error because it says to "specify one of p or logit_p", but I thought I did with the function?

Here's the code for "getProbs" in case it's helpful:

def getProbs(Disc, Diff, THETA, Nprt):
# Get a large array of probabilities for the bernoulli random variable
n = len(Disc)
m = Nprt
probs = np.array([])
for th in range(m):
    for t in range(n):
        p = item(Disc[t], Diff[t], THETA[th])
        probs = np.append(probs, p)
return probs

I added the Nprt parameter because if I tried to get the length of THETA, it would give me an error since it is a FreeRV object. I know I can try and vectorize the "item" function, which is just the logistic function I put above, instead of doing it this way, but that also got me an error when I tried to run it.

I think I can do something with pm.Data to fix this, but the documentation isn't exactly clear to me.

Basically, I'm used to building models in JAGS, where you loop through each data point, but pymc3 doesn't seem to work like that. I'm confused about how to build/index my random variables in the model to make sure that the probabilities change how I'd like them to from trial-to-trial, and to make sure that the parameters I'm estimating correspond to the right person in the right group.

Thanks in advance for any help. I'm pretty new to pymc3 and trying to get the hang of it, and wanted to try something different from JAGS.

EDIT: I was able to solve this by first building the array I needed by looping through the trials, then transforming the array using:

p = theano.tensor.stack(p, axis = 0)

I then put this new variable in the "p" argument of the Bernoulli instance and it worked! Here's the updated full model: (below, I imported theano.tensor as T)

group = group.astype('int') 
data = prac_data["Response"]


with pm.Model() as irt:
# Priors

# Item parameters:
a = pm.Gamma('a', alpha = 1, beta = 1, shape = 10) # Discrimination

b = pm.Normal('b', mu = 0, sd = 1, shape = 10) # Difficulty

# Now for the hyperpriors on the groups: shape = 2 as there are 2 groups
theta_mu = pm.Normal('theta_mu', mu = 0, sd = 1, shape = 2)
theta_sigma = pm.Uniform('theta_sigma', upper = 2, lower = 0, shape = 2)


# Individual-level person parameters:
# group is a 2*N array that lets the model know which
# theta_mu to use for each theta to estimate
theta = pm.Normal('theta', mu = theta_mu[group],
                         sd = theta_sigma[group], shape = 2*N)


# Here, we're building an array of the probabilities we need for 
# each trial:
p = np.array([])
for n in range(2*N):
    for t in range(10):
        x = -a[t]*(theta[n] - b[t])
        p = np.append(p, x)
  
# Here, we turn p into a tensor object to put as an argument to the
# Bernoulli random variable
p = T.stack(p, axis = 0)

y = pm.Bernoulli('y', logit_p = p, observed = data)


# On my computer, this took about 5 minutes to run. 
traces = pm.sample(1000, cores = 1)
    
print(az.summary(traces)) # Summary of parameter distributions

How do I fit a pymc3 model when each person has multiple data points?

0 Answers0