
I need to generate a series of N random binary variables with a given correlation function. Let x = {x_i} be a series of binary variables (taking the value 0 or 1, with i running from 1 to N). The marginal probability is given by Pr(x_i = 1) = p, and the variables should be correlated in the following way:

Corr[x_i, x_j] = const × |i − j|^(−α)   (for i ≠ j)

where α is a positive number.

If it is easier, consider the correlation function:

Corr[x_i, x_j] = (|i − j| + 1)^(−α)

The essential part is that I want to investigate the behavior when the correlation function goes as a power law (not as α^|i−j|).

Is it possible to generate a series like this, preferably in Python?

jonalm
    @Paul: No, definitely not: *“MathOverflow's primary goal is for users to ask and answer research level math questions, the sorts of questions you come across when you're writing or reading articles or graduate level books.”* Pretty much all SO math-related questions *don't* fall under this. – Joey Mar 14 '10 at 10:44
  • @johannes: OK - thanks for the clarification - it seems like more of a maths question than a programming question in that the programming will be trivial once the maths is nailed down, but it sounds like there is nowhere more suitable to ask this kind of question. – Paul R Mar 14 '10 at 10:50
  • Is it correct to interpret the correlation as roughly "the bigger the difference of the indices of two variables is, the smaller is the probability of both being 1"? Intuitively, this prefers clusters. – Svante Mar 14 '10 at 11:20
  • @Svante: said in another way: the bigger the difference of the indices of the variables, the more they behave as independent variables. Your intuition is right, this prefers clusters. – jonalm Mar 14 '10 at 11:56
  • I don't think this is possible as written. When N is very large, the expected product of widely separated x_i must be very close to zero, and we can find a set of 1/p + 1 variables that must essentially be mutually exclusive despite all having probability p. Perhaps you meant E[x_i x_j] − p^2 = ... ? – user287792 Mar 14 '10 at 15:34
  • Also, what range of N are you thinking about, and would you accept a different decay function (say exponential, since it would be a lot easier)? – user287792 Mar 14 '10 at 15:46
  • @algorithmist: you're right, Corr[x_i x_j] = E[x_i x_j] − p^2. I hope to have N on the order of 1000–10000. – jonalm Mar 15 '10 at 08:28
  • This setup seems to have a lot in common with the binary Ising model, although it would take effort to work out exactly what your assumed correlation structure implies. Because every item is connected, I doubt there is a one-shot simulation strategy. You most likely want to implement a Gibbs sampler. Work out the conditional distribution of any particular element, holding the others fixed, and then iteratively simulate from that, starting from a random position. – Tristan Mar 15 '10 at 12:57
  • jonalm: are you sure it's |i-j|^-alpha rather than alpha^|i-j| ? (just checking for obvious typos) – Jason S Mar 15 '10 at 15:23
  • @jonalm: Could you edit your question with the correct correlation formula ? It's easier to look up in the function. – Matthieu M. Mar 15 '10 at 16:40
  • jonalm, do you need to implement this specific correlation structure, or just some correlation structure the strength of which decreases with distance (and is modifiable by one parameter)? – Aniko Mar 15 '10 at 17:37
  • @Jason S, @Aniko and @Matthieu M.: the essential feature of the correlation function is the power law. I have edited the question; hope it is clear. @Tristan, that is right, this is essentially the Ising model at the critical point. Thanks for the input. – jonalm Mar 16 '10 at 04:29

6 Answers


Thanks for all your inputs. I found an answer to my question in the cute little article by Chul Gyu Park et al., so in case anyone runs into the same problem, look up:

"A Simple Method for Generating Correlated Binary Variates" (jstor.org/stable/2684925)

for a simple algorithm. The algorithm works if all the elements in the correlation matrix are positive, and for a general marginal distribution Pr(x_i = 1) = p_i.
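For readers who want something runnable before tracking down the paper: this is not the Park et al. algorithm, but a sketch of the closely related "dichotomized Gaussian" approach — threshold a latent Gaussian process whose correlations decay as a power law. The resulting binary correlations also decay roughly as a power law, though the constant differs; the function name and the eigenvalue-clipping safeguard are my own assumptions.

```python
import numpy as np
from scipy.stats import norm

def dichotomized_gaussian(N, p, alpha, seed=0):
    """Threshold a latent Gaussian process with power-law correlations
    (|i-j|+1)**-alpha so that Pr(x_i = 1) = p.  Not the Park et al.
    algorithm -- a related sketch with power-law-decaying correlations."""
    rng = np.random.default_rng(seed)
    idx = np.arange(N)
    # latent correlation matrix, diagonal = 1
    R = (np.abs(idx[:, None] - idx[None, :]) + 1.0) ** -alpha
    # safeguard: clip any tiny negative eigenvalues so Cholesky succeeds
    w, V = np.linalg.eigh(R)
    R = (V * np.clip(w, 0.0, None)) @ V.T
    L = np.linalg.cholesky(R + 1e-10 * np.eye(N))
    z = L @ rng.standard_normal(N)
    return (z < norm.ppf(p)).astype(int)

x = dichotomized_gaussian(1000, 0.5, 0.8)
```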

j

jonalm
  • interesting... see also http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.imsc/1233152938&page=record – Jason S Mar 19 '10 at 17:52
  • Did you implement this in Python? Do you still have the script? Any chance you could include it please? – R. Cox Apr 04 '19 at 12:29
  • @JasonS That link is dead. – Ceph Jun 01 '22 at 13:29
  • I don't even remember what that was, but it's available in archive.org https://web.archive.org/web/20160922011407/http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.imsc/1233152938&page=record – Jason S Jun 02 '22 at 15:54

You're describing a random process, and it looks like a tough one to me... if you eliminated the binary (0,1) requirement and instead specified the expected value and variance, it would be possible to describe this as a white noise generator feeding through a 1-pole low-pass filter, which I think would give you the α^|i−j| characteristic.
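For the relaxed (non-binary) version, the filter idea can be sketched as follows; the function name is illustrative, and note the decay is exponential (α^|i−j|), not the power law the question asks for:

```python
import numpy as np

def ar1_series(N, a, seed=0):
    """White noise through a 1-pole low-pass filter:
    y[i] = a * y[i-1] + e[i], with e ~ N(0, 1).
    The autocorrelation of y decays as a**|i-j| (exponential)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(N)
    y = np.empty(N)
    y[0] = e[0]
    for i in range(1, N):
        y[i] = a * y[i - 1] + e[i]
    return y

y = ar1_series(5000, 0.9)
r1 = np.corrcoef(y[:-1], y[1:])[0, 1]  # sample lag-1 autocorrelation, near a = 0.9
```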

This actually might meet the bar for mathoverflow.net, depending on how it is phrased. Let me try asking....


update: I did ask on mathoverflow.net for the α^|i−j| case. But perhaps there are some ideas there that can be adapted to your case.

Jason S
  • It's actually |i-j|^-\alpha; the solution for \alpha^|i-j| is in the literature. – user287792 Mar 15 '10 at 14:28
  • hmmm... |i-j|^-alpha has no solution for i=j. Are we sure the OP did not mis-state? – Jason S Mar 15 '10 at 15:21
  • It can be Corr[x_i x_j] = const × |i − j|^(−α) for i ≠ j, or Corr[x_i x_j] = (|i − j| + 1)^(−α), whichever is easiest. Either way, I'm not claiming that they are equal; I'm only interested in the tail behavior (|i − j| ≫ 1), so it should not matter. – jonalm Mar 16 '10 at 04:09
  • Thank you so much Jason. Although it was a different correlation, the solutions were really interesting. – jonalm Mar 19 '10 at 16:30

A quick search at RSeek reveals that R has packages to do this.

Dirk Eddelbuettel

Express the distribution of x_i as a linear combination of some independent basis distributions f_j: x_i = a_i1 f_1 + a_i2 f_2 + ... . Let us constrain f_j to be independent variables uniformly distributed in 0..1 or in {0,1} (discrete). Let us now express everything we know in matrix form:

Let X be the vector (x1, x2, .., xn)
Let A be the matrix (a_ij) of dimension (n,k) (n rows, k columns)
Let F be the vector (f1, f2, .., fk) 
Let P be the vector (p1, p2, .., pn)
Let R be the matrix (E[x_i,x_j]) for i,j=1..n
Definition of the X distribution: X = A * F
Constraint on the mean of individual X variables: P = A * (1 ..k times.. 1)
Correlation constraint: AT*A = 3R or 2R in the discrete case (because E[x_i x_j] = 
  E[(a_i1*f_1 + a_i2*f_2 + ...)*(a_j1*f_1 + a_j2*f_2 + ...)] =
  E[sum over p,q: a_ip*f_p*a_jq*f_q] = (since for p/=q holds E[f_p*f_q]=0)
  E[sum over p: a_ip*a_jp*f_p^2] =
  sum over p: a_ip*a_jp*E[f_p^2] = (since E[f_p^2] = 1/3 or 1/2 for the discrete case)
  sum over p: 1/3 or 1/2*a_ip*a_jp
And the vector consisting of those sums over p: a_ip*a_jp is precisely AT*A.

Now you need to solve the two equations:

AT*A      = 3R (or 2R in the discrete case)
A*(1...1) = P

Solution of the first equation corresponds to finding the square root of the matrix 3R or 2R. See for example http://en.wikipedia.org/wiki/Cholesky_factorization and generally http://en.wikipedia.org/wiki/Square_root_of_a_matrix . Something also should be done about the second one :)

I invite mathematicians to correct me, because I may very well have mixed up AT*A with A*AT or done something even more wrong.

To generate a value of xi as a linear mixture of the basis distributions, use a two-step process: 1) use a uniform random variable to choose one of the basis distributions, weighted with corresponding probability, 2) generate a result using the chosen basis distribution.
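The square-root step can be sketched with NumPy (illustrative only: this factors the (|i−j|+1)^(−α) matrix from the question and ignores the 3R/2R scaling and the mean constraint discussed above):

```python
import numpy as np

# Illustrative sketch: factor a power-law correlation matrix M as
# A @ A.T = M via Cholesky (requires M to be positive definite).
N, alpha = 8, 0.8
idx = np.arange(N)
M = (np.abs(idx[:, None] - idx[None, :]) + 1.0) ** -alpha
A = np.linalg.cholesky(M)  # lower-triangular square root: A @ A.T == M
```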

jkff
  • Unfortunately, the continuous -> discrete transition is often the hardest part. For example, the problem of finding a Hadamard matrix gets a _lot_ easier if complex entries are allowed. I don't see any way to discretize your solution within the given framework. – user287792 Mar 14 '10 at 16:40
  • Why should it be hard? The solution just depends on the resulting distribution being a linear mixture of basis distributions but I don't see how it depends on their continuity. Is it that discrete distributions can't easily be linearly mixed? – jkff Mar 14 '10 at 18:50
  • In this case it's the fact that your continuous distributions aren't convex combinations of Bernoulli trials. – user287792 Mar 14 '10 at 23:35
  • I'm sorry, I don't quite understand. I know what is a convex combination and what are Bernoulli trials, but still: I've edited my post; does not the process described in the last paragraph give a correct result? If so, could you point me to some sources expanding your point? (anyways, probably I should just implement the stuff I described and see if it works) – jkff Mar 15 '10 at 08:44
  • The problem is that A may have entries that are not between 0 and 1. – user287792 Mar 15 '10 at 14:09

The brute force solution is to express the constraints of the problem as a linear program with 2^N variables pr(w) where w ranges over all binary strings of length N. First the constraint that pr be a probability distribution:

for all w: 0 <= pr(w) <= 1
sum_w pr(w) = 1

Second, the constraint that the expectation of each variable be p:

for all i: sum_{w such that w[i] = 1} pr(w) = p

Third, the covariance constraints:

for all i < j: sum_{w such that w[i] = w[j] = 1} pr(w) = const * |j - i|^(-alpha) + p^2

This is very slow, but a cursory literature search turned up nothing better. If you decide to implement it, here are some LP solvers with Python bindings: http://wiki.python.org/moin/NumericAndScientific/Libraries
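For a very small N, this brute-force LP can be sketched with SciPy (a pure feasibility problem, so the objective is zero; the values of const, p and alpha below are arbitrary assumptions, and the program may be infeasible for other choices):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

N, p, alpha, const = 4, 0.5, 1.0, 0.1
words = list(itertools.product([0, 1], repeat=N))  # all 2**N configurations

A_eq, b_eq = [[1.0] * len(words)], [1.0]           # pr sums to 1
for i in range(N):                                  # marginals: Pr(x_i = 1) = p
    A_eq.append([float(w[i]) for w in words])
    b_eq.append(p)
for i in range(N):                                  # pairwise moments
    for j in range(i + 1, N):
        A_eq.append([float(w[i] * w[j]) for w in words])
        b_eq.append(const * (j - i) ** -alpha + p ** 2)  # E[x_i x_j] = Corr + p^2

res = linprog(c=np.zeros(len(words)), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, 1.0)] * len(words))
```

If the problem is feasible, `res.x` contains the probability of each of the 2^N configurations, which can then be sampled directly.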

user287792
  • I don't know linear programming, but I do not see how this will work. Any configuration of the binary series will have a non-zero probability. Is it possible to calculate the probability of any configuration? – jonalm Mar 15 '10 at 08:31
  • Yes, if the problem is solvable, then the LP routine will give you the probability of each of the 2^N configurations. – user287792 Mar 15 '10 at 14:03

Here's an intuitive / experimental approach that seems to work.

If b is a binary r.v., m is the mean of the binary r.v., c is the correlation you want, rand() generates a U(0,1) r.v., and d is the correlated binary r.v. you want:

d = if(rand() < c, b, if(rand() < m, 1, 0))

That is, if a uniform r.v. is less than the desired correlation, d = b. Otherwise d = another random binary number with mean m.

I ran this 1000 times for a column of 2000 binary r.v.s with m = .5 and c = .4 and c = .5. The correlation mean was exactly as specified, and the distribution appeared to be normal. For a correlation of 0.4, the standard deviation of the correlation was 0.02.

Sorry - I can't prove that this works all the time, but you have to admit, it sure is easy.
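One series interpretation of the rule above (each element copies its left neighbour with probability c, otherwise draws a fresh Bernoulli(m)) can be sketched in Python; note this gives geometric (c^k) decay with lag, not the power law the question asks for:

```python
import numpy as np

def correlated_binary_chain(N, m, c, seed=0):
    """x[i] = x[i-1] with probability c, else a fresh Bernoulli(m) draw.
    Lag-k correlation decays as c**k (geometric, not power-law)."""
    rng = np.random.default_rng(seed)
    x = np.empty(N, dtype=int)
    x[0] = int(rng.random() < m)
    for i in range(1, N):
        if rng.random() < c:
            x[i] = x[i - 1]                 # copy the neighbour
        else:
            x[i] = int(rng.random() < m)    # fresh draw with mean m
    return x

x = correlated_binary_chain(20000, 0.5, 0.4)
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]  # lag-1 correlation, near c = 0.4
```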

Grembo