Hi everyone. I have a question about how to do panel data analysis with a Bayesian model in PyMC. The data look like this:

..........................................................
User    Time     x1          x2         x3           Y
1        1        1          1           3           2      
1        2        2          1           4           1
1        3        2          2           2           1
1        4        1          3           1           3
1        5        1          1           2           3
2        1        1          3           1           3  
2        2        1          1           2           2
2        3        2          3           1           0
2        4        1          2           2           3
2        5        1          1           1           2    
3        1        4          3           1           3  
3        2        3          1           3           2
3        3        2          3           2           2
3        4        2          1           2           3
3        5        1          1           1           2
4        1        1          1           3           2      
4        2        2          2           4           3
4        3        2          2           2           1
4        1        1          3           1           3
4        1        4          5           2           3  
.............   
..........................................................

I have samples for N users over T time periods (N≫T), with independent variables (x1, x2, x3) and a dependent variable (Y).

I want to analyze the impact of the X variables on Y at the collective level. Taking the simplest linear regression as an example and following the book "Introduction to Bayesian Econometrics" (p. 145), the general model is often written as:

$$ y_{it} = x_{it}{\beta}+ w_{it}{b_i}+ {u_{it}}, i = 1,...,n;\;\;t = 1,...,T $$

Here $i$ indexes the user and $t$ the time; $\beta$ does not differ across $i$ and is called the fixed effect; $b_i$ differs across $i$ and is called the random effect.

From a Bayesian point of view, both $\beta$ and $b_i$ are regarded as random variables. So let $\beta \sim N(\beta_0, \beta_1)$ and $b_i \sim N(\lambda_0, \lambda_1)$.
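
As a sketch of the simplest special case, assume a single random intercept per user ($w_{it} = 1$) and normally distributed errors with variance $\sigma^2$. Both are assumptions made here for illustration and do not come from the book. The full specification would then read:

$$ y_{it} \mid \beta, b_i, \sigma^2 \sim N(x_{it}\beta + b_i,\; \sigma^2), \qquad \beta \sim N(\beta_0, \beta_1), \qquad b_i \sim N(\lambda_0, \lambda_1) $$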

That is the general idea in theory, but I have no idea how to specify and fit this model in PyMC.

Thanks to anyone who can give me some inspiration or example code.

Runner

1 Answer

The following blog post contains a good example of fitting a linear regression using PyMC3. It also shows a shortcut using the glm module, which is particularly useful for those familiar with R formula syntax.

http://twiecki.github.io/blog/2013/09/12/bayesian-glms-1/

For your model, which has several predictors, you will want an x_coeff for each variable. The easiest way to do this is to pass `size=4` when calling `Normal()` (in PyMC3 the corresponding keyword is `shape`). This creates 4 stochastic variables, one for each variable in your data, and returns them as an array.
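
To make the vector-valued coefficient idea concrete, here is a minimal sketch of a random-intercept version of the model in the question. It rests on several assumptions that are not in the original post: a recent PyMC release imported as `pymc` (under PyMC3 the import would be `pymc3`), a single random intercept per user (w_it = 1), a normal likelihood, and arbitrarily chosen weak priors. The names `build_model`, `lam0`, `lam1` and `sigma` are hypothetical. Only the first three users from the question's table are used.

```python
# Rows copied from the question's table: (user, time, x1, x2, x3, y)
rows = [
    (1, 1, 1, 1, 3, 2), (1, 2, 2, 1, 4, 1), (1, 3, 2, 2, 2, 1),
    (1, 4, 1, 3, 1, 3), (1, 5, 1, 1, 2, 3),
    (2, 1, 1, 3, 1, 3), (2, 2, 1, 1, 2, 2), (2, 3, 2, 3, 1, 0),
    (2, 4, 1, 2, 2, 3), (2, 5, 1, 1, 1, 2),
    (3, 1, 4, 3, 1, 3), (3, 2, 3, 1, 3, 2), (3, 3, 2, 3, 2, 2),
    (3, 4, 2, 1, 2, 3), (3, 5, 1, 1, 1, 2),
]

# Map each user ID to a 0-based index so that b[user_idx] selects
# the random effect belonging to the user of each observation.
users = sorted({r[0] for r in rows})
user_to_idx = {u: k for k, u in enumerate(users)}
user_idx = [user_to_idx[r[0]] for r in rows]
X = [list(r[2:5]) for r in rows]  # predictors x1, x2, x3
y = [r[5] for r in rows]          # response Y


def build_model(X, y, user_idx, n_users):
    """Random-intercept panel regression: y_it = x_it . beta + b_i + u_it."""
    # Imported here so the data preparation above runs without PyMC installed.
    import numpy as np
    import pymc as pm  # assumption: PyMC >= 4; PyMC3 would use `import pymc3 as pm`

    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    user_idx = np.asarray(user_idx, dtype=int)

    with pm.Model() as model:
        # Fixed effects: one coefficient per predictor, shared by all users.
        beta = pm.Normal("beta", mu=0.0, sigma=10.0, shape=X.shape[1])
        # Random effects: one intercept per user, drawn from a common prior.
        # Note: lam1 is a standard deviation here, not a variance.
        lam0 = pm.Normal("lam0", mu=0.0, sigma=10.0)
        lam1 = pm.HalfNormal("lam1", sigma=5.0)
        b = pm.Normal("b", mu=lam0, sigma=lam1, shape=n_users)
        # Standard deviation of the observation noise u_it.
        sigma = pm.HalfNormal("sigma", sigma=5.0)
        mu = pm.math.dot(X, beta) + b[user_idx]
        pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    return model
```

Calling `build_model(X, y, user_idx, len(users))` returns the model; running `pm.sample()` inside a `with model:` block should then draw posterior samples for `beta` (collective-level effects) and `b` (per-user effects). With N much larger than T, the shared prior on `b` is what lets users with few observations borrow strength from the rest.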