11

Well not really RANDBETWEEN(). I'm trying to create a UDF to return the index of a number within an array, where the larger the number the more likely it is to be chosen.

I know how to assign probabilities to random numbers in a worksheet (i.e. using MATCH() on the sum of the probabilities, there's plenty of stuff on SO explaining that), but I want a UDF because I'm passing a special input array into the function - not just a selected range.

My issue is, the weighting is off, numbers later on in the array are more likely to be returned than those earlier in the array and I can't see where in my code I've gone wrong. Here's the UDF so far:

Public Function PROBABLE(ParamArray inputArray() As Variant) As Long
'Takes a set of relative or absolute probabilities and ranks a random number within them
Application.Volatile (True)
Dim outputArray() As Variant
Dim scalar As Single
Dim rankNum As Single
Dim runningTot As Single

'''''
'Here I take inputArray() and convert to outputArray(), 
'which is fed into the probability code below
'''''

scalar = 1 / WorksheetFunction.Sum(outputArray)
rankNum = Rnd()
runningTot = 0

For i = 0 To UBound(outputArray)
    runningTot = runningTot + outputArray(i)
    If runningTot * scalar >= rankNum Then
        PROBABLE = i + 1
        Exit Function
    End If
Next i

End Function

The function should look at the relative sizes of the numbers in outputArray() and pick randomly but weighted towards the larger numbers. E.g. outputArray() of {1,0,0,1} should assign probabilities respectively of {50%,0%,0%,50%} However when I tested that outputArray(), for 1000 samples and 100 iterations, and graphed how frequently item 1 or item 4 in the array was returned, I got this result:Graph

Approximately 20%:80% distribution. Plotting {1,1,1,1} (all should have equal chance) gave a 10%:20%:30%:40% distribution

I know I'm missing something obvious but I can't tell what, any help?

UPDATE

Some people were asking for the complete code, here it is.

Public Function PROBABLE(ParamArray inputArray() As Variant) As Long
'Takes a set of relative or absolute probabilities and ranks a random number within them
Application.Volatile (True) 'added some dimensions up here
Dim outputArray() As Variant
Dim inElement As Variant
Dim subcell As Variant
Dim scalar As Single
Dim rankNum As Single
Dim runningTot As Single
'convert ranges to values
'creating a new array from the mixture of ranges and values in the input array
''''
'This is where I create outputArray() from inputArray()
''''
ReDim outputArray(0)
For Each inElement In inputArray
'Normal values get copied from the input UDF to an output array, ranges get split up then appended
    If TypeName(inElement) = "Range" Or TypeName(inElement) = "Variant()" Then
        For Each subcell In inElement
            outputArray(UBound(outputArray)) = subcell
            ReDim Preserve outputArray(UBound(outputArray) + 1)
        Next subcell
    'Stick the element on the end of an output array
    Else
        outputArray(UBound(outputArray)) = inElement
        ReDim Preserve outputArray(UBound(outputArray) + 1)
    End If
Next inElement
ReDim Preserve outputArray(UBound(outputArray) - 1)
''''
'End of new code, the rest is as before
''''
scalar = 1 / WorksheetFunction.Sum(outputArray)
rankNum = Rnd()
runningTot = 0

For i = 0 To UBound(outputArray)
    runningTot = runningTot + outputArray(i)
    If runningTot * scalar >= rankNum Then
        PROBABLE = i + 1
        Exit Function
    End If
Next i

End Function

The start inputArray() outputArray() section is used to standardise different input methods. I.e. the user can enter a mixture of values, cell references/ranges and arrays, and the function can cope. e.g. {=PROBABLE(A1,5,B1:C15,IF(ISTEXT(D1:D3),LEN(D1:D3),0))} (you get the picture) should work just as well as =PROBABLE(A1:A3). I cycle through the sub-elements of the inputArray() and put them in my outputArray(). I'm fairly certain there's nothing wrong with this portion of code.

Then to get my results, I copied the UDF into A1:A1000, used a COUNTIF(A1:A1000,1)or instead of count 1, I did count 2, 3, 4 etc for each of the possible UDF outputs and made a short macro to recalculate the sheet 100 times, each time copying the result of the countif into a table to graph. I can't say precisely how I did that because I left this all at work, but I'll update on Monday.

Greedo
  • 4,967
  • 2
  • 30
  • 78
  • You should put Randomize before the Rnd() line. – Doug Coats Feb 10 '17 at 13:30
  • Funny, seems to work for me. How (and why) are you converting input into output? – SJR Feb 10 '17 at 13:33
  • 2
    Show the code converting `inputArray()` to `outputArray()`, and also describe the way you tested the frequency of the returned items. That may cause such distribution. IMO the present code should work without issues. – omegastripes Feb 10 '17 at 13:43
  • 1
    Just ran you experiment, adding only a single statement: `outputArray = inputArray`. Results are: **(50118, 0, 0, 49882)**. I think you should check your experiment and also the way you copy the arrays. It's a single statement. – A.S.H Feb 10 '17 at 14:19
  • Results for (1,1,1,1): **(25024, 25138, 24898, 24940)** – A.S.H Feb 10 '17 at 14:30
  • @DougCoats Can I ask why? What does it do? I am relatively new to all this! – Greedo Feb 10 '17 at 22:10
  • @Greedo the `rnd` function in VBA is not truly random, and it's usually recommended to use `Randomize` before calls to the `rnd` function. https://msdn.microsoft.com/en-us/library/8zedbtdt(v=vs.90).aspx – David Zemens Feb 10 '17 at 22:13
  • @David Zemens -- In this environment, no random number generated by a computer is going to be "truly random", it's just a point about terminology but they are "pseudo-random". `Randomize` only changes the seed of the random numbers (I.e. creates a different sequence of random numbers). This wouldn't have any effect on the statistical random properties of the generator. That said, I don't disagree that the VBA `Rnd` function is a pretty weak PRNG. The solution to that is to get a different PRNG - some good VBA functions out there that do the trick well. – CallumDA Feb 10 '17 at 22:47
  • @CallumDA yes, noted of course there is nothing really random about `Rnd` in VBA, even with `Randomize`. I didn't make that very clear in my comment above , only it was to answer the question "why" people use `Randomize`, and that is because it's somewhat of an improvement over the miserable `Rnd` function. Cheers – David Zemens Feb 10 '17 at 23:07

3 Answers3

4

Try this:

Function Probable(v As Variant) As Long
    Application.Volatile 'remove this if you don't want a volatile function

    Dim v2 As Variant
    ReDim v2(LBound(v) To UBound(v) + 1)

    v2(LBound(v2)) = 0
    Dim i As Integer
    For i = LBound(v) To UBound(v)
        v2(i + 1) = v2(i) + v(i) / Application.Sum(v)
    Next i

    Probable = Application.WorksheetFunction.Match(Rnd(), v2, 1)
End Function

The array v is essentially your outputArray.

The code takes an array like {1,0,0,1} and converts it to {0,0.5,0.5,1} (note the 0 at the beginning) at which point you can do a MATCH as you suggested to get to get either a 1 or 4 with equal probability.

Similarly, if you were to start with {1,1,1,1} it would be converted to {0,0.25,0.5,0.75,1} and return any of 1, 2, 3 or 4 with equal probability.

Also note: you could probably make it a bit quicker if you save the value of Application.Sum(v) in a variable rather than performing the calculation for every value in array v.

Update
The function now takesv as a parameter -- like your code. I also tweaked it a bit so that it can deal with v having any base, which means you can run it from the worksheet too: =Probable({1,0,0,1}) for example

CallumDA
  • 12,025
  • 6
  • 30
  • 52
2

This is something I have built, following your logic. It works quite ok, providing different results.

Option Explicit
Public Function TryMyRandom() As String

    Dim lngTotalChances         As Long
    Dim i                       As Long
    Dim previousValue           As Long
    Dim rnd                     As Long
    Dim result                  As Variant

    Dim varLngInputArray        As Variant
    Dim varLngInputChances      As Variant
    Dim varLngChancesReedit     As Variant

    varLngInputChances = Array(1, 2, 3, 4, 5)
    varLngInputArray = Array("a", "b", "c", "d", "e")
    lngTotalChances = Application.WorksheetFunction.Sum(varLngInputChances)
    rnd = Application.WorksheetFunction.RandBetween(1, lngTotalChances)

    ReDim varLngChancesReedit(UBound(varLngInputChances))

    For i = LBound(varLngInputChances) To UBound(varLngInputChances)
        varLngChancesReedit(i) = varLngInputChances(i) + previousValue
        previousValue = varLngChancesReedit(i)

        If rnd <= varLngChancesReedit(i) Then
            result = varLngInputArray(i)
            Exit For
        End If
    Next i

    TryMyRandom = result

End Function

Public Sub TestMe()

    Dim lng     As Long
    Dim i       As Long
    Dim dict    As Object
    Dim key     As Variant
    Dim res     As String

    Set dict = CreateObject("Scripting.Dictionary")

    For lng = 1 To 1000

        res = TryMyRandom
        If dict.Exists(res) Then
            dict(res) = dict(res) + 1
        Else
            dict(res) = 1
        End If


    Next lng

    For Each key In dict.Keys
        Debug.Print key & " ===> " & dict(key)
    Next


End Sub

Concerning your case, make sure that the array is sorted. E.g., in my case speaking about varLngInputChances. I have not taken a look at the corner cases, there can be an error there, possibly.

Run the TestMe sub. It will generate even a summary of the results. If you change the variations to varLngInputChances = Array(1, 1, 0, 0, 1), it gives:

a ===> 329 b ===> 351 e ===> 320

which is quite good random :) You can change the number of the sample here: For lng = 1 To 1000, it works quite fast. I have just tried it with 100,000 tests.

Vityata
  • 42,633
  • 8
  • 55
  • 100
2

It appears I have made a tragic mistake. My code was fine, my counting wasn't so good. I was using SUMIF() instead of COUNTIF() in my graphing, resulting in later objects in the array (with a higher Index - the output of the UDF which I was supposed to be counting but was instead summing) getting a weighting proportional to their position.

In retrospect, I think someone far more clever than I could probably have deduced that from the information given. I said {1,1,1,1} has a {10%:20%:30%:40%},that's a {1:2:3:4} ratio, which is precisely the same ratio as the indices of the outputs, deduction: the outputs were summed not counted.

Similarly, the graph of {1,0,0,1} with a {20%:0%:0%:80%} output, well divide each percentage by it's index (20%/1, 80%/4) and Hey Presto {20%:0%:0%:20%}, or the 1:1 ratio I had expected.

Something annoying but satisfying in that - knowing the answer was there all along. I suppose there's probably a moral in all this. At least the post can serve as a warning to budding VBAers to check their arithmetic.

Greedo
  • 4,967
  • 2
  • 30
  • 78