
N.B.: This is not a duplicate of my previous question, because I have added constraints.

I want to generate a 40x10000 matrix A of random integers between 1 and 100, drawn with given probabilities:

p1=Prob(1)     (chance of 1)
p2=Prob(2)     (chance of 2)
 ... 
p100=Prob(100)  (chance of 100)

and constraints: V1,V2,...,V20 are vectors containing 4 elements between 1 and 100. Each column vector of the matrix A should contain at least one element of each of these 20 vectors. V1, ..., V20 are predefined vectors with known elements.

For example, how can I modify the following program to enforce this last constraint?

h = 40; w = 10000;
A = reshape( randsample( numel(Prob), h*w, true, Prob ), [h w] );

More details:

  • Each A(:,i), i=1,...,10000, should contain Vk(1) or Vk(2) or Vk(3) or Vk(4) for all k=1,...,20. In other words, A(:,i) must contain at least one value from every Vk, while still respecting the probabilities and without generating duplicate values. If some values of Vi and Vj are equal, a column of A may use a single element to satisfy both the Vi and Vj constraints.

  • For example: if V1=[6 87 1 56], A(:,i) should contain 6 or 87 or 1 or 56, but A(:,i) may also contain (6 and 1) or (6 and 1 and 87) or ...
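
The constraint described above can be checked mechanically. The following is a minimal sketch, not part of the original question: it assumes the constraint vectors are stored as the columns of a hypothetical matrix `V` (so `V(:,k)` stands for Vk), and it uses a tiny made-up `A` and only three vectors so the result can be verified by hand.

```matlab
% Hypothetical small example: 3 constraint vectors instead of 20, stored as
% the columns of V, so that V(:,k) plays the role of Vk.
V = [ 6  2  6;
     87  9 40;
      1 15 41;
     56 30 42];

% Two made-up candidate columns of A (6 rows here instead of 40).
A = [ 6  1;
      9 87;
     77 77;
     50 50;
     50  2;
     50 50];

% satisfied(k,i) is true when column i of A contains at least one entry of Vk.
nV    = size(V, 2);
nCols = size(A, 2);
satisfied = false(nV, nCols);
for i = 1:nCols
    % ismember(V, A(:,i)) marks which entries of V occur in column i;
    % any(..., 1) collapses each Vk to a single true/false value.
    satisfied(:, i) = any(ismember(V, A(:, i)), 1).';
end

% A column is valid only if every Vk is represented in it.
validColumn = all(satisfied, 1);
disp(validColumn)   % first column is valid, second is not
```

In the first column the single value 6 satisfies both V1 and V3, which is the shared-element case mentioned above. The second column contains no element of V3 and is therefore invalid.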

Sumurai8
bzak
  • could you show us a small example, I'm not sure I understand the constraints.. – Amro Jul 12 '14 at 11:26
  • @Amro: each A(:,i) (i=1,..,10000) should contain Vk(1) or Vk(2) or Vk(3) or Vk(4) for all k=1,..,20 – bzak Jul 12 '14 at 11:35
  • @Amro: if V1=[6 87 1 56], A(:,i) should contain 6 or 87 or 1 or 56 but A (:, i) may contain (6 and 1) or (6 and 1 and 87) or ... – bzak Jul 12 '14 at 11:49
  • and how are the vectors `V1,V2,..` generated exactly? Also must a column `A(:,i)` use at least one value from *every* `Vk`? It would be better to edit the question and add all these details.. – Amro Jul 12 '14 at 11:50
  • @Amro: V1, ..., V20 are predefined vectors with known elements – bzak Jul 12 '14 at 11:55
  • ok. how about you generate `A` just like before, and then simply overwrite the first value of each column by choosing one value at random from a `Vk` to ensure at least one element is picked from those 20 vectors. That would satisfy the constraints, right? – Amro Jul 12 '14 at 12:02
  • @Amro: yes, A(:,i) must use at least one value from every Vk. – bzak Jul 12 '14 at 12:03
  • @bzak the solution by Amro is really good and it takes very little time. I am removing my solution. – Naveen Jul 12 '14 at 12:13
  • @Amro: but will that respect the probabilities and avoid generating duplicate values? – bzak Jul 12 '14 at 12:13
  • @Naveen: time is not a problem for me, I can wait 4 min. The important thing is to have the right Result! – bzak Jul 12 '14 at 12:16
  • I'm afraid I'm still not getting it... Perhaps others will be able to provide you help. – Amro Jul 12 '14 at 12:22
  • @bzak then try my solution and see if you are able to get the expected results. – Naveen Jul 12 '14 at 12:25
  • @user1735003: Yes, this is what I want. – bzak Jul 12 '14 at 15:46
  • [This](http://stackoverflow.com/questions/13914066/generate-random-number-with-given-probability-matlab) may help. – Autonomous Jul 12 '14 at 19:41

1 Answer

Here is one solution:

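% prob: probabilities of the values 1..100
% vec:  matrix whose columns are the constraint vectors (here 4 x 20)
h = 40;
w = 10000;
output = zeros(h, w);
i = 1;
while i <= w
    % draw one candidate column according to prob
    temp = randsample(numel(prob), h, true, prob);
    % accept it only if every column of vec has at least one value in temp
    check = all(any(ismember(vec, temp)));
    if check
        output(:, i) = temp;
        i = i + 1;
    end
end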

Unfortunately, this takes approximately 4 minutes to generate the matrix with the specified constraints. Any other solution that takes less time is welcome.
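
One way to cut the run time while keeping the same accept/reject rule might be to test many candidate columns per pass instead of one. The following is only a sketch under assumptions: `batch`, `present`, and the example values of `prob` and `vec` are illustrative and not taken from the question, and `w` is reduced so that it finishes quickly. Like the loop above, it never terminates if the constraints cannot be met.

```matlab
% Assumed inputs (illustrative values, not from the question):
%   prob : 1x100 vector of probabilities for the values 1..100
%   vec  : 4x20 matrix whose columns are the constraint vectors V1..V20
rng(0);                               % fixed seed, only for reproducibility
prob = ones(1, 100) / 100;
vec  = reshape(randperm(100, 80), 4, 20);

h = 40; w = 200;                      % small w so the sketch runs quickly
batch  = 5000;                        % candidate columns drawn per pass
output = zeros(h, w);
filled = 0;                           % number of accepted columns so far

while filled < w
    % Draw a whole batch of candidate columns at once.
    cand = reshape(randsample(numel(prob), h * batch, true, prob), h, batch);

    % present(v, j) is true when value v occurs in candidate column j.
    present = false(numel(prob), batch);
    colIdx  = repmat(1:batch, h, 1);
    present(sub2ind(size(present), cand(:), colIdx(:))) = true;

    % A candidate is accepted when every Vk has at least one value present.
    ok = true(1, batch);
    for k = 1:size(vec, 2)
        ok = ok & any(present(vec(:, k), :), 1);
    end

    % Keep as many accepted columns as are still needed.
    good = cand(:, ok);
    take = min(size(good, 2), w - filled);
    output(:, filled + (1:take)) = good(:, 1:take);
    filled = filled + take;
end
```

Because columns are accepted and rejected by the same rule as in the loop above, the statistical properties of that approach carry over unchanged. Only the bookkeeping is vectorized, so the gain depends on how often candidates are rejected.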

Naveen
  • What is vec in your answer? – bzak Jul 12 '14 at 12:39
  • @bzak `vec` is the matrix containing the constraint vectors. I had taken it as a `4 X 20` matrix. – Naveen Jul 12 '14 at 13:54
  • This solution gives exactly what I want, but the program execution time rises exponentially when I increase the size of vec. For example: if vec is an 8 x 100 matrix and h=50, the program execution time is 30 hours! – bzak Jul 15 '14 at 19:38
  • This algorithm is inaccurate. It does not generate solutions which reflect the probabilities passed in `prob` because the likelihood of rejecting a solution is lower if it contains many rare (low `prob`) numbers. – Daniel Jul 15 '14 at 20:46
  • I agree that the algorithm is not accurate. I also think that it is nontrivial to improve it, and in many applications this is a reasonable approach ... the user should be careful to understand the solution and decide for him or herself whether or not this "sample and reject if it doesn't meet constraints" idea works for their application. – wandering star Jul 15 '14 at 21:11