I need to simulate an information source with alphabet "a,b,c,d" with the respective probabilities of 0.1, 0.5, 0.2, 0.2. I do not know how to do it using MATLAB. Help is most appreciated.
- The [`rand`](http://www.mathworks.com/help/matlab/ref/rand.html) function might be useful. Your probabilities sum to one. But you're going to have to show us what you have tried so far. – horchler Nov 21 '13 at 18:10
- I should have an array in which b has probability 0.5, a 0.1, c 0.2, and d 0.2. Assume an array length of 100: I need a set containing 50 b, 20 c, 20 d, and 10 a, with the order of the letters mixed randomly. – user3018734 Nov 21 '13 at 18:21
- @user3018734: Now you are asking for something different; it is unlikely that you will match the expected value exactly. – Daniel Nov 21 '13 at 18:30
- See https://stackoverflow.com/questions/13914066/generate-random-number-with-given-probability-matlab and https://stackoverflow.com/questions/58607156/draw-random-numbers-from-pre-specified-probability-mass-function-in-matlab – Cris Luengo Aug 10 '22 at 14:14
2 Answers
You could do something as simple as the following. Create a large random vector using `rand`, which returns values uniformly distributed between 0 and 1. To give a letter a 10 percent chance of occurring, assign it a range of width 0.1, typically 0 to 0.1. Then assign the remaining letters adjacent ranges whose widths match their probabilities.
vals = rand(1,10000);                             % uniform draws in (0,1)
letters = cell(size(vals));
[letters{vals <= 0.1}] = deal('a');               % range of width 0.1
[letters{vals > 0.1 & vals <= 0.6}] = deal('b');  % range of width 0.5
[letters{vals > 0.6 & vals <= 0.8}] = deal('c');  % range of width 0.2
[letters{vals > 0.8 & vals <= 1}] = deal('d');    % range of width 0.2
The above code returns a 1-by-10000 cell array of letters in approximately the described proportions.
Or you can do this dynamically as follows:
vals = rand(1,10000);
output = cell(size(vals));
letters2use = {'a','b','c','d'};
percentages = [0.1,0.5,0.2,0.2];
% Each letter owns the interval (lowerBounds(i), upperBounds(i)] of [0,1]
lowerBounds = [0,cumsum(percentages(1:end-1))];
upperBounds = cumsum(percentages);
for i = 1:numel(percentages)
    [output{vals > lowerBounds(i) & vals <= upperBounds(i)}] = deal(letters2use{i});
end
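The interval lookup above can also be written without a loop. The following is a minimal sketch, assuming MATLAB R2015a or later for `discretize`; the names `symbols`, `p`, `edges`, and `idx` are illustrative and not part of the code above. It maps each uniform draw to the interval of the cumulative distribution that contains it and returns a char array rather than a cell array.

```matlab
symbols = 'abcd';                 % alphabet as a char array (illustrative name)
p       = [0.1 0.5 0.2 0.2];      % probabilities, assumed to sum to 1
n       = 10000;                  % number of symbols to draw

edges      = [0 cumsum(p)];       % interval boundaries: 0, 0.1, 0.6, 0.8, 1
edges(end) = 1;                   % guard against round-off in the last edge

vals   = rand(1, n);              % uniform draws in (0,1)
idx    = discretize(vals, edges); % index of the interval containing each draw
output = symbols(idx);            % 1-by-n char array of drawn symbols
```

If the Statistics and Machine Learning Toolbox is available, `randsample('abcd', n, true, p)` should give an equivalent weighted draw in a single call.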
UPDATE
The above code does not guarantee a particular number of occurrences of each letter; the following does. Since your comment suggests you need an exact count of each letter, this code computes the counts and then assigns the letters to randomly permuted positions.
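numElements = 10000;
letters2use = {'a','b','c','d'};
percentages = [0.1,0.5,0.2,0.2];
% Target count for each letter
numEach = round(percentages*numElements);
% If rounding left too few elements, add to the letter with the largest fractional part
while sum(numEach) < numElements
    [~,idx] = max(mod(percentages*numElements,1));
    numEach(idx) = numEach(idx) + 1;
end
% If rounding left too many elements, remove from the letter with the smallest fractional part
while sum(numEach) > numElements
    [~,idx] = min(mod(percentages*numElements,1));
    numEach(idx) = numEach(idx) - 1;
end
% Give each letter its share of a random permutation of the positions
indices = randperm(numElements);
output = cell(size(indices));
% firstIdx/lastIdx avoid shadowing the built-in functions lower and upper
firstIdx = [0,cumsum(numEach(1:end-1))]+1;
lastIdx = cumsum(numEach);
for i = 1:numel(firstIdx)
    [output{indices(firstIdx(i):lastIdx(i))}] = deal(letters2use{i});
end
output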
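When the counts are already whole numbers, as with 100 elements here, the same idea can be sketched more compactly. This assumes MATLAB R2015a or later for `repelem`; `symbols`, `counts`, and `pool` are illustrative names. It builds the exact multiset of letters and then shuffles it with `randperm`.

```matlab
symbols = 'abcd';
p       = [0.1 0.5 0.2 0.2];
n       = 100;

counts = round(p * n);                   % [10 50 20 20]; assumed to sum to n
assert(sum(counts) == n, 'counts must sum to n; adjust the rounding first');

pool   = repelem(symbols, counts);       % letters in order: 10 a, 50 b, 20 c, 20 d
output = pool(randperm(n));              % same letters, random order
```

The result is a 1-by-100 char array, so a letter can be counted with `sum(output == 'b')`.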

You could first create an array in which each character appears a number of times proportional to its probability, then sample from that array uniformly.
First set the size of the weighted sample space; each letter gets a share of it proportional to its probability. It does not have to equal the number of random draws made later:
maxSamplesEach = 100;
Define the data for the problem:
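strings = ['a' 'b' 'c' 'd'];
probability = [0.1 0.5 0.2 0.2];
Construct a sample space weighted by relative probabilities:
count = 0;
totalSampleSpace = '';  % start empty so stale workspace values cannot leak in
for k = 1:size(strings,2)
    % round guards against floating-point error in the product
    for i = 1:round(probability(k)*maxSamplesEach)
        count = count+1;
        totalSampleSpace(count) = strings(k);
    end
end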
Now generate 100 random indices, each drawn uniformly from 1 to count:
N = 100;
randomSelections = randi(count, 1, N);  % every index in 1..count is equally likely
Now here are your random samples taken from the distribution:
randomSamples = totalSampleSpace(randomSelections);
Next just count them up:
for k = 1:size(strings,2)
    numFound = nnz(randomSamples == strings(k));  % occurrences of this letter
    disp(['Count samples for ', strings(k),' = ', num2str(numFound)]);
end
Keep in mind that these results are statistical in nature, so it's highly unlikely that you will get exactly the same counts on each run.
Example output:
Count samples for a = 11
Count samples for b = 49
Count samples for c = 19
Count samples for d = 21
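To see how the counts settle toward the target probabilities as the sample grows, one can compare empirical frequencies for several sample sizes. The sketch below is illustrative only: it draws indices with `randi` from a weighted pool like the one above, assumes MATLAB R2015a or later for `repelem`, and uses the hypothetical names `pool`, `sampleSizes`, and `freq`.

```matlab
symbols = 'abcd';
p       = [0.1 0.5 0.2 0.2];

% Weighted pool of 100 entries: 10 a, 50 b, 20 c, 20 d
pool = repelem(symbols, round(p * 100));

sampleSizes = [100 10000 1000000];
for n = sampleSizes
    samples = pool(randi(numel(pool), 1, n));          % uniform draws from the pool
    freq    = arrayfun(@(c) mean(samples == c), symbols);
    fprintf('N = %7d: %s\n', n, mat2str(freq, 3));     % empirical frequencies
end
```

The printed frequencies vary from run to run, but for the larger sample sizes they should sit close to 0.1, 0.5, 0.2, and 0.2.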
