
I need to simulate an information source with alphabet "a,b,c,d" with the respective probabilities of 0.1, 0.5, 0.2, 0.2. I do not know how to do it using MATLAB. Help is most appreciated.

MZimmerman6
  • The [`rand`](http://www.mathworks.com/help/matlab/ref/rand.html) function might be useful. Your probabilities sum to one. But you're going to have to show us what you have tried so far. – horchler Nov 21 '13 at 18:10
  • I should have an array with 0.5 probability b, 0.1.. a, 0.2.. c, and 0.2 probability d. Assume we have an array length of 100. I need to have a set containing 50 b, 20 c, 20 d, 10 a. But orders of the letters mixed randomly. – user3018734 Nov 21 '13 at 18:21
  • @user3018734: Now you are asking for something different, it is unlikely that you match exactly the expected value. – Daniel Nov 21 '13 at 18:30
  • See https://stackoverflow.com/questions/13914066/generate-random-number-with-given-probability-matlab and https://stackoverflow.com/questions/58607156/draw-random-numbers-from-pre-specified-probability-mass-function-in-matlab – Cris Luengo Aug 10 '22 at 14:14

2 Answers


You could do something as simple as the following. Create a large random vector using `rand`, which returns values uniformly distributed between 0 and 1. If you want a letter to have a 10 percent chance of occurring, give it a range of width 0.1, typically 0 to 0.1. You can then assign the remaining letters to adjacent ranges whose widths match their probabilities.

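vals = rand(1,10000);                 % uniform samples in (0,1)
letters = cell(size(vals));
% each letter owns an interval whose width equals its probability;
% <= on the upper edge so a value exactly on a boundary is still assigned
[letters{vals <= 0.1}] = deal('a');
[letters{vals > 0.1 & vals <= 0.6}] = deal('b');
[letters{vals > 0.6 & vals <= 0.8}] = deal('c');
[letters{vals > 0.8 & vals <= 1}] = deal('d');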

The above code returns a 1-by-10000 cell array of letters with approximately the described percentages.

Or you can do this dynamically as follows:

vals = rand(1,10000);                 % uniform samples in (0,1)
output = cell(size(vals));
letters2use = {'a','b','c','d'};
percentages = [0.1,0.5,0.2,0.2];
% interval edges derived from the cumulative probabilities
lowerBounds = [0,cumsum(percentages(1:end-1))];
upperBounds = cumsum(percentages);
for i = 1:numel(percentages)
    % every sample that falls in interval i gets letter i
    [output{vals > lowerBounds(i) & vals <= upperBounds(i)}] = deal(letters2use{i});
end
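The loop above is an inverse-CDF lookup, so it can also be written without a loop. Here is a minimal sketch, assuming a MATLAB release that has `discretize` (R2015a or later); the variable names `symbols`, `probs`, `edges`, `sampled` and `freq` are my own and not part of the answer's code:

```matlab
% Sketch: vectorized inverse-CDF sampling (variable names are illustrative).
rng(0);                              % fixed seed so the run is repeatable
symbols = 'abcd';
probs   = [0.1 0.5 0.2 0.2];
edges   = [0, cumsum(probs)];        % interval edges: 0, 0.1, 0.6, 0.8, 1
edges(end) = 1;                      % guard against round-off in the last edge
u       = rand(1, 10000);            % uniform samples in (0,1)
idx     = discretize(u, edges);      % interval number (1..4) of each sample
sampled = symbols(idx);              % 1-by-10000 char vector of letters
% empirical frequency of each letter, for comparison with probs
freq = accumarray(idx(:), 1, [numel(probs) 1]).' / numel(idx);
disp(freq)
```

The result here is a char vector rather than a cell array; wrap it with `num2cell(sampled)` if you need one cell per letter.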

UPDATE

The code above does not guarantee a specific number of occurrences of each letter, but the following does. Since your comment suggests you need exactly a certain number of each letter, the following code computes those counts and then assigns the letters to randomly permuted positions.

numElements = 10000;
letters2use = {'a','b','c','d'};
percentages = [0.1,0.5,0.2,0.2];
numEach = round(percentages*numElements);
while sum(numEach) < numElements
   [~,idx] = max(mod(percentages*numElements,1));
   numEach(idx) = numEach(idx) + 1;
end
while sum(numEach) > numElements
   [~,idx] = min(mod(percentages*numElements,1));
   numEach(idx) = numEach(idx) - 1;
end
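indices = randperm(numElements);   % random ordering of all positions
output = cell(size(indices));
% first/last slot of each letter's block within the permutation;
% named firstIdx/lastIdx so the built-ins lower and upper are not shadowed
firstIdx = [0,cumsum(numEach(1:end-1))]+1;
lastIdx = cumsum(numEach);
for i = 1:numel(firstIdx)
    [output{indices(firstIdx(i):lastIdx(i))}] = deal(letters2use{i});
end
output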
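For the case in the comment (100 elements, where the counts come out as whole numbers), the same idea can be sketched more compactly: build the letters in order, then shuffle them. This assumes `repelem` is available (R2015a or later) and that the rounded counts already sum to the requested length, which the loops above handle in general; `symbols`, `probs`, `counts`, `pool` and `shuffled` are illustrative names:

```matlab
% Sketch: exact counts via repelem + randperm (names are illustrative).
numElements = 100;
symbols = 'abcd';
probs   = [0.1 0.5 0.2 0.2];
counts  = round(probs * numElements);    % assumes sum(counts) == numElements
pool    = repelem(symbols, counts);      % 10 a's, 50 b's, 20 c's, 20 d's in order
shuffled = pool(randperm(numel(pool)));  % same counts, random order
```

Every run gives exactly 10 a, 50 b, 20 c and 20 d; only the order changes.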
MZimmerman6

You could first build an array in which each character appears a number of times proportional to its probability.

First set the scale factor for the sample space: each letter gets its probability times this many entries. It doesn't have to equal the number of random samples drawn later:

maxSamplesEach = 100; 

Define the data for the problem:

strings = ['a' 'b' 'c' 'd'];
probability = [0.1 0.5 0.2 0.2];

Construct a sample space weighted by relative probabilities:

count = 0;
for k = 1:numel(strings)
    % round() guards against floating-point error in the product
    for i = 1:round(probability(k)*maxSamplesEach)
        count = count+1;
        totalSampleSpace(count) = strings(k);
    end
end

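Now define the range of valid indices into the sample space:

% avoid the names min and max, which would shadow the built-in functions
firstIndex = 1;
lastIndex = count;

Now draw 100 random indices uniformly from that range. `randi` gives every index the same probability, whereas rounding a scaled `rand` value would pick the two endpoints only half as often as the others:

N = 100;
randomSelections = randi([firstIndex, lastIndex], 1, N);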

Now here are your random samples taken from the distribution:

randomSamples = totalSampleSpace(randomSelections);
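If you have the Statistics and Machine Learning Toolbox, the weighted sample space and index draw above are roughly what `randsample` does for you in one call. A minimal sketch, assuming the toolbox is installed; `toolboxSamples` is an illustrative name that is not used elsewhere in this answer:

```matlab
% Sketch: weighted sampling with replacement.
% Requires the Statistics and Machine Learning Toolbox.
weights = [0.1 0.5 0.2 0.2];
toolboxSamples = randsample('abcd', 100, true, weights);  % 1-by-100 char vector
```

The third argument `true` requests sampling with replacement, which is required when weights are given.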

Next just count them up:

for k = 1:numel(strings)
    % number of drawn samples equal to the k-th letter
    n = nnz(randomSamples == strings(k));
    disp(['Count samples for ', strings(k), ' = ', num2str(n)]);
end

Keep in mind that these results are statistical in nature, so it's highly unlikely that you will get the same relative contributions each time.

Example output:

Count samples for a = 11
Count samples for b = 49
Count samples for c = 19
Count samples for d = 21
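To put a rough number on that run-to-run variation: with independent draws, each letter's count is binomially distributed, so its expected value is N*p and its standard deviation is sqrt(N*p*(1-p)). A small sketch of that arithmetic (the variable names are illustrative):

```matlab
% Sketch: expected count and spread for each letter under independent draws.
N = 100;
symbols = 'abcd';
p = [0.1 0.5 0.2 0.2];
expectedCounts = N .* p;                  % mean of a binomial count
stdCounts = sqrt(N .* p .* (1 - p));      % standard deviation of that count
for k = 1:numel(p)
    fprintf('%c: expected %g, std dev %.2f\n', ...
        symbols(k), expectedCounts(k), stdCounts(k));
end
```

For N = 100 the standard deviations come out to about 3, 5, 4 and 4, so counts such as 11, 49, 19 and 21 are well within the normal spread.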
Bruce Dean