
I'd like to preface this by making it clear that I am not interested in someone doing all of this work for me - I'm only interested in guidance.

I am reading a paper: https://doi.org/10.1371/journal.pcbi.1007481 in which the authors write C code that uses a Metropolis-Hastings method to produce a binary matrix of neuronal states over time. They then analyze that data with home-brewed Bayesian inference, Metropolis-Hastings, and a Dirichlet process prior in order to uncover how many distinct neuronal assemblies exist in the system, as well as a handful of other qualities.
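To make sure I understand the sampler itself before touching the paper's model, I wrote a minimal Metropolis-Hastings sketch in plain NumPy. This is my own toy target (inferring a single Bernoulli firing rate from synthetic binary data under a flat prior), not the paper's model, but it is the same accept/reject mechanics:

```python
import numpy as np

# Toy Metropolis-Hastings sketch (my own example, not the paper's model):
# infer a Bernoulli rate p from observed binary data under a flat prior.
# The unnormalized log-posterior is k*log(p) + (n-k)*log(1-p).
rng = np.random.default_rng(1)
data = rng.random(200) < 0.3              # synthetic spikes with true rate 0.3
k, n = data.sum(), data.size

def log_post(p):
    if p <= 0.0 or p >= 1.0:
        return -np.inf                     # outside the support -> always reject
    return k * np.log(p) + (n - k) * np.log(1.0 - p)

samples = []
p_cur = 0.5
for _ in range(20000):
    p_prop = p_cur + rng.normal(0.0, 0.05)          # random-walk proposal
    # accept with probability min(1, posterior ratio)
    if np.log(rng.random()) < log_post(p_prop) - log_post(p_cur):
        p_cur = p_prop
    samples.append(p_cur)

posterior = np.array(samples[5000:])                # drop burn-in
print(posterior.mean())                             # should be near k/n
```

If I can scale this kind of thing up (vectors of rates instead of one scalar), that is essentially the generation side I already have working.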

I would like to take this code, all written in C, and rewrite it in Python so that alterations can be made more easily than in C. I have the data generation written, but I am struggling to write the analysis, because every model parameter depends on between one and three hyper-parameters - among which is the number of neuronal assemblies.

I have a physics and math bachelor's, and have pored over this for a few months now with little progress. I've had to learn a tremendous amount about Bayesian inference, Python, C, and now PyMC3.

PyMC3 has a host of capabilities, including Mixture distributions and even a Dirichlet Process which could solve my hyper-parameter conundrum. However, this portion of PyMC3's documentation is written for a much higher level of understanding than I have.
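From what I can tell, PyMC3's Dirichlet process examples are built on the truncated stick-breaking construction, so I tried to sketch just that construction in plain NumPy to check my understanding. The truncation level K and concentration alpha below are my own arbitrary choices:

```python
import numpy as np

# Stick-breaking construction of (truncated) Dirichlet-process mixture weights.
# PyMC3's Dirichlet-process examples do the same thing with pm.Beta draws and a
# cumulative product; here it is in NumPy with my own K and alpha.
rng = np.random.default_rng(0)
K = 20          # truncation level: the maximum number of assemblies considered
alpha = 2.0     # concentration: larger alpha spreads weight over more sticks

beta = rng.beta(1.0, alpha, size=K)         # fraction broken off the remaining stick
remaining = np.concatenate(([1.0], np.cumprod(1.0 - beta)[:-1]))
w = beta * remaining                         # mixture weight of each component

# The weights are non-negative and sum to 1 up to the truncation error
# prod(1 - beta); the number of components with non-negligible weight is
# effectively the number of assemblies the model "uses".
print(w.sum(), (w > 0.01).sum())
```

If that reading is right, then "how many assemblies" stops being a discrete parameter and becomes "how many sticks get appreciable weight", which is what I hope sidesteps my hyper-parameter conundrum.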

Could someone help me break down how these are used, or point me to some useful references for understanding and implementing this within the next decade?

Here is a sample of what I have attempted - perhaps it will illuminate precisely what I fail to grasp.

import arviz as az
import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt

"""
NOTATION
- N neurons in A assemblies over M time frames.
- Each neuron belongs to one assembly. The vector t is the assembly for each neuron.
- w is an M x A matrix giving the state of all assemblies over time [w_ma is either on (1) or off (0)].
- mu_vec is the set of neurons assigned to a specific assembly mu.
- Vectors of parameters are denoted by dropping the index (x[1] is an element of x).
- Neuronal memberships and the assembly activity matrix are unobserved variables we want to estimate from observed
neuronal data.

GENERATION
- With a probability unique to each assembly, choose each neuron's membership t[i] from a categorical distribution.
- At time k, from a Bernoulli distribution, choose whether assembly mu is on or off (w_mu,k) with probability p[mu] -
the probability that assembly mu is turned on at a specific time k.
- The conditional probability lambda is defined uniquely for each assembly, and is the probability that the ith member
of assembly mu is on at time k, given that the entire assembly is either on (one column of the matrix) or off (the
other column of the matrix) at that same time. Draw the activity of all neurons at all times from this conditional
probability lambda. lambda[mu][1] is the probability that a neuron in assembly mu will fire when the assembly is on -
this is the level of synchrony in the system.

LIKELIHOOD
- We can calculate the joint probability of neuronal membership t, assembly activity matrix w, and neuronal activity
matrix s conditional on model parameters theta. So this is the likelihood of having neuronal membership t AND having
specific activity matrices w and s, with a given formula. That formula looks like a product of prior terms, giving a
net prior of sorts, so I'll work under that assumption.
- Okay, now they give the distributions used as priors on the different model parameters:

assembly activation probability, p[mu]:             Beta(alpha_mu^(p), beta_mu^(p))
(a)synchronous firing probability, lambda[mu][z]:   Beta(alpha_(z,mu)^(lambda), beta_(z,mu)^(lambda))
    So alpha/beta depend on whether assembly mu is on or off (z)?
Size of each assembly, n = {n1, n2, ..., nA}:       Dirichlet(alpha_1^(n), ..., alpha_A^(n))

- The alphas and betas above are hyper-parameters describing prior knowledge of the model parameters, like the
expected temporal sparsity of synchronous activation of neuronal populations.
"""
# Values for data generation
NCELLS = 400
TIMES = 1000
SEED = 1
outfile = 'out/membership_orig.dat'
lastIsFree = False
# Of the values used for data generation, these are to be estimated by PyMC3
NUM_ASSEMBLIES = 4
ACTIVITY = 0.3
LAMBDA0 = 0.04
LAMBDA1 = 0.7


with pm.Model() as unknown_model:
    # TODO: read through the paper and try to make as many assumptions as possible. Like only one time step, only one
    #  assembly, only one cell, etc. Then slowly incorporate one more quality at a time.
    # PRIOR: Number of assemblies
    """
    Not sure what the distribution should be for the number of assemblies, so I'm going to say there could be anything 
    between no assemblies and every cell being an assembly. No reason to prefer one answer over another yet.
    """
    num_assemblies_prior = pm.DiscreteUniform('num_assemblies_prior', lower=0, upper=NCELLS)

    # PRIOR: alpha/beta
    """
    Not sure what distributions are for alpha and beta, so I'll guess half Normal. I think it should be different for 
    each parameter and assembly, so I'll randomly choose a standard deviation from a half normal distribution.
    """
    # TODO: Currently I assume that each alpha and beta come from a half normal distribution whose standard deviation
    #  is the same for each assembly corresponding to that parameter.
    # Drawing these with np.random would freeze them at model-build time, so PyMC3 could never infer them;
    # making them HalfNormal random variables keeps them inside the model.
    alpha_hyper = pm.HalfNormal('alpha_hyper', sigma=1.0, shape=3)
    beta_hyper = pm.HalfNormal('beta_hyper', sigma=1.0, shape=3)
    alpha_p, alpha_lambda, alpha_n = alpha_hyper[0], alpha_hyper[1], alpha_hyper[2]
    beta_p, beta_lambda, beta_n = beta_hyper[0], beta_hyper[1], beta_hyper[2]

    # PRIOR: p
    """
    So there are supposed to be mu different priors for p, but we don't know mu, so how do we integrate that into 
    a single prior? Also, alpha and beta themselves should be distributions unique to each assembly. To deal with a 
    case like this with a variable number of assemblies, the Dirichlet process was implemented. So try to see if we 
    can make the number of parameters more dynamic with PyMC3. Maybe fix the number of assemblies first instead of 
    trying to make it dynamic. Also, consider switching to (Num)Pyro instead of PyMC3, since Theano development was 
    halted. https://docs.pymc.io/notebooks/data_container.html is a doc maybe about PyMC3 dynamics. Nesting priors 
    might also work. Taking baby steps is totally fine - it's better to get somewhere than nowhere!
    p_prior = pm.Beta('p_prior', alpha_p, beta_p)

    x = [num_assemblies_prior, p_prior]
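To at least check my reading of the GENERATION notes, here is the forward model with the number of assemblies fixed at A, in plain NumPy. For simplicity I use one shared activation level for all assemblies (the paper gives each assembly its own p[mu]); inference over A is the part I still need PyMC3's Dirichlet process for:

```python
import numpy as np

# Generative sketch with a FIXED number of assemblies A (NumPy only).
# One shared activation probability is used for all assemblies here for
# simplicity; the paper's model has a separate p[mu] per assembly.
rng = np.random.default_rng(1)
N, M, A = 400, 1000, 4          # neurons, time frames, assemblies
activity = 0.3                  # P(assembly on at a given time)
lam = np.array([0.04, 0.7])     # lambda[mu][z]: P(neuron fires | assembly off/on)

t = rng.integers(0, A, size=N)          # membership: which assembly each neuron is in
w = rng.random((M, A)) < activity       # assembly on/off states over time (M x A)

# Each neuron fires with probability lam[0] when its assembly is off
# and lam[1] when it is on.
fire_prob = lam[w[:, t].astype(int)]    # M x N matrix of firing probabilities
s = rng.random((M, N)) < fire_prob      # observed binary activity matrix

print(s.shape, s.mean())
```

What I cannot figure out is how to invert this: given only s, recover t, w, and A with the Beta/Dirichlet priors above.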
Comment from merv (Sep 30 '20): I'm voting to close as too broad. I think it is interesting, just that it needs to be pared down into a concrete problem to solve. Try asking only a single question about a specific point you need help with - and feel free to create multiple questions, but separately.
