(Note: My apologies in advance for the lengthy answer. My actual proposed solution is not all that long, but there are a number of problems with the proposed solutions so far and I want to try to address those thoroughly, to provide context for my own proposed solution).
In my opinion, while you have in fact accepted one answer and might be tempted to use either one, neither of the answers provided so far is correct or useful.
Commenter Ben Voigt has already pointed out two major flaws with your specifications as stated, both related to the fact that you are encoding the enum value's weight in the value itself:
- You are tying the enum's underlying type to the code that then must interpret that type.
- Two enum values that have the same weight are indistinguishable from each other.
Both of these issues can be addressed. Indeed, while the answer you accepted (why?) fails to address the first issue, the one provided by Dweeberly does address this through the use of `Convert.ToInt32()` (which can convert from `long` to `int` just fine, as long as the values are small enough).
But the second issue is much harder to address. The answer from Asad attempts to address this by starting with the enum names and parsing them to their values. And this does indeed result in the final, indexed array containing the corresponding entries for each name separately. But the code actually using the enum has no way to distinguish the two; it's really as if those two names were a single enum value, and that single enum value's probability weight were the sum of the weights given for the two different names.
I.e. in your example, while the enum entries for e.g. `BNeg` and `ABNeg` will be selected separately, the code that receives these randomly selected values has no way to know whether it was `BNeg` or `ABNeg` that was selected. As far as it knows, those are just two different names for the same value.
Now, even this problem can be addressed (but not in the way that Asad attempts to…his answer is still broken). If you were, for example, to encode the probabilities in the value while still ensuring unique values for each name, you could decode those probabilities while doing the random selection and that would work. For example:
enum BloodType
{
    // value = probability weight * 100 + unique index
    ONeg = 4 * 100 + 0,
    OPos = 36 * 100 + 1,
    ANeg = 3 * 100 + 2,
    APos = 28 * 100 + 3,
    BNeg = 1 * 100 + 4,
    BPos = 20 * 100 + 5,
    ABNeg = 1 * 100 + 6,
    ABPos = 5 * 100 + 7,
};
Having declared your enum values that way, your selection code can divide the enum value by 100 to recover its probability weight, which can then be used as seen in the various examples. At the same time, each enum name has a unique value.
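A minimal sketch of that decoding step (the `WeightOf` helper and its name are mine, added for illustration; the enum is repeated from above so the snippet stands alone):

```csharp
// The enum from above, repeated so this snippet is self-contained
public enum BloodType
{
    // value = weight * 100 + unique index
    ONeg = 4 * 100 + 0,
    OPos = 36 * 100 + 1,
    ANeg = 3 * 100 + 2,
    APos = 28 * 100 + 3,
    BNeg = 1 * 100 + 4,
    BPos = 20 * 100 + 5,
    ABNeg = 1 * 100 + 6,
    ABPos = 5 * 100 + 7,
}

public static class BloodTypeWeights
{
    // Dividing by the same multiplier used in the declaration
    // recovers the probability weight encoded in the value.
    public static int WeightOf(BloodType type)
    {
        return (int)type / 100;
    }
}
```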
But even solving that problem, you are still left with problems related to the choice of encoding and representation of the probabilities. For example, in the above you cannot have an enum that has more than 100 values, nor one with weights larger than (2^31 - 1) / 100; if you want an enum that has more than 100 values, you need a larger multiplier but that would limit your weight values even more.
In many scenarios (maybe all the ones you care about) this won't be an issue. The numbers are small enough that they all fit. But that seems like a serious limitation in what seems like a situation where you want a solution that is as general as possible.
And that's not all. Even if the encoding stays within reasonable limits, you have another significant limit to deal with: the random selection process requires an array large enough to contain, for each enum value, as many copies of that value as its weight. Again, if the values are small, maybe this is not a big problem. But it does severely limit the ability of your implementation to generalize.
So, what to do?
I understand the temptation to try to keep each enum type self-contained; there are some obvious advantages to doing so. But there are also some serious disadvantages, and if you ever truly try to use this in a generalized way, the solutions proposed so far will tie your code together in ways that IMHO negate most if not all of the advantage of keeping the enum types self-contained. Primarily: if you find you need to modify the implementation to accommodate some new enum type, you will have to go back and edit all of the other enum types you're using. I.e. while each type looks self-contained, in reality they are all tightly coupled with each other.
In my opinion, a much better approach would be to abandon the idea that the enum type itself will encode the probability weights. Just accept that this will be declared separately somehow.
Also, IMHO it would be better to avoid the memory-intensive approach proposed in your original question and mirrored in the other two answers. Yes, this is fine for the small values you're dealing with here. But it's an unnecessary limitation, making only one small part of the logic simpler while complicating and restricting it in other ways.
I propose the following solution, in which the enum values can be whatever you want, the enum's underlying type can be whatever you want, and the algorithm uses memory proportional only to the number of unique enum values, rather than to the sum of all of the probability weights.
In this solution, I also address possible performance concerns by caching the invariant data structures used to select the random values. This may or may not be useful in your case, depending on how frequently you will be generating these random values. But IMHO it is a good idea regardless; the up-front cost of generating these data structures is high enough that, if the values are selected with any regularity at all and the structures had to be rebuilt each time, that cost would begin to dominate the run-time cost of your code. Even if it works fine today, why take the risk? (Again, especially given that you seem to want a generalized solution.)
Here is the basic solution:
static T NextRandomEnumValue<T>()
{
    // Look up (or build, on first use) the cumulative-weight table for this type
    KeyValuePair<T, int>[] aggregatedWeights = GetWeightsForEnum<T>();

    // Pick a value in [0, total weight), then binary-search for its bucket
    int weightedValue =
            _random.Next(aggregatedWeights[aggregatedWeights.Length - 1].Value),
        index = Array.BinarySearch(aggregatedWeights,
            new KeyValuePair<T, int>(default(T), weightedValue),
            KvpValueComparer<T, int>.Instance);

    // An exact match means weightedValue equals a cumulative sum, which belongs
    // to the *next* bucket; otherwise the complement of the result is already
    // the index of the correct bucket.
    return aggregatedWeights[index < 0 ? ~index : index + 1].Key;
}
static KeyValuePair<T, int>[] GetWeightsForEnum<T>()
{
    object temp;

    // Return the cached table if we've already built one for this type
    if (_typeToAggregatedWeights.TryGetValue(typeof(T), out temp))
    {
        return (KeyValuePair<T, int>[])temp;
    }

    if (!_typeToWeightMap.TryGetValue(typeof(T), out temp))
    {
        throw new ArgumentException("Unsupported enum type");
    }

    // Convert the per-value weights into cumulative sums, in declaration order
    KeyValuePair<T, int>[] weightMap = (KeyValuePair<T, int>[])temp;
    KeyValuePair<T, int>[] aggregatedWeights =
        new KeyValuePair<T, int>[weightMap.Length];
    int sum = 0;

    for (int i = 0; i < weightMap.Length; i++)
    {
        sum += weightMap[i].Value;
        aggregatedWeights[i] = new KeyValuePair<T, int>(weightMap[i].Key, sum);
    }

    _typeToAggregatedWeights[typeof(T)] = aggregatedWeights;

    return aggregatedWeights;
}
readonly static Random _random = new Random();
// Helper method to reduce verbosity in the enum-to-weight array declarations
static KeyValuePair<T1, T2> CreateKvp<T1, T2>(T1 t1, T2 t2)
{
    return new KeyValuePair<T1, T2>(t1, t2);
}
readonly static KeyValuePair<BloodType, int>[] _bloodTypeToWeight =
{
    CreateKvp(BloodType.ONeg, 4),
    CreateKvp(BloodType.OPos, 36),
    CreateKvp(BloodType.ANeg, 3),
    CreateKvp(BloodType.APos, 28),
    CreateKvp(BloodType.BNeg, 1),
    CreateKvp(BloodType.BPos, 20),
    CreateKvp(BloodType.ABNeg, 1),
    CreateKvp(BloodType.ABPos, 5),
};
readonly static Dictionary<Type, object> _typeToWeightMap =
    new Dictionary<Type, object>()
    {
        { typeof(BloodType), _bloodTypeToWeight },
    };

readonly static Dictionary<Type, object> _typeToAggregatedWeights =
    new Dictionary<Type, object>();
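To see the selection mechanics in isolation, here is a compact, self-contained sketch using the BloodType weights above, with the cumulative sums written out by hand (the `WeightedDemo` class and `Pick` method names are mine, for illustration only):

```csharp
using System;
using System.Collections.Generic;

public enum BloodType { ONeg, OPos, ANeg, APos, BNeg, BPos, ABNeg, ABPos }

public static class WeightedDemo
{
    // Cumulative sums of the weights 4, 36, 3, 28, 1, 20, 1, 5 (total 98)
    static readonly KeyValuePair<BloodType, int>[] _aggregated =
    {
        new KeyValuePair<BloodType, int>(BloodType.ONeg, 4),
        new KeyValuePair<BloodType, int>(BloodType.OPos, 40),
        new KeyValuePair<BloodType, int>(BloodType.ANeg, 43),
        new KeyValuePair<BloodType, int>(BloodType.APos, 71),
        new KeyValuePair<BloodType, int>(BloodType.BNeg, 72),
        new KeyValuePair<BloodType, int>(BloodType.BPos, 92),
        new KeyValuePair<BloodType, int>(BloodType.ABNeg, 93),
        new KeyValuePair<BloodType, int>(BloodType.ABPos, 98),
    };

    // Map a weighted value in [0, 98) to its bucket via binary search; an exact
    // hit on a cumulative sum belongs to the *next* bucket.
    public static BloodType Pick(int weightedValue)
    {
        int index = Array.BinarySearch(_aggregated,
            new KeyValuePair<BloodType, int>(default(BloodType), weightedValue),
            Comparer<KeyValuePair<BloodType, int>>.Create(
                (x, y) => x.Value.CompareTo(y.Value)));

        return _aggregated[index < 0 ? ~index : index + 1].Key;
    }

    public static void Main()
    {
        var random = new Random();

        // Draw a random weighted value and map it to a blood type
        Console.WriteLine(Pick(random.Next(98)));
    }
}
```

For example, weighted values 0 through 3 map to ONeg, 4 through 39 map to OPos, and 97 maps to ABPos.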
Note that the work of actually selecting a random value is simply a matter of choosing a non-negative random integer less than the sum of the weights, and then using a binary search to find the appropriate enum value.

Once per enum type, the code will build the table of values and weight-sums that will be used for the binary search. This result is stored in a cache dictionary, `_typeToAggregatedWeights`.
There are also the objects that have to be declared and which will be used at run-time to build this table. Note that the `_typeToWeightMap` is just in support of making this method 100% generic. If you wanted to write a differently named method for each specific type you wanted to support, each could still use a single generic method to implement the initialization and selection, but the named method would know the correct object (e.g. `_bloodTypeToWeight`) to use for initialization.
Alternatively, another way to avoid the `_typeToWeightMap` while still keeping the method 100% generic would be to have the `_typeToAggregatedWeights` be of type `Dictionary<Type, Lazy<object>>`, and have the values of the dictionary (the `Lazy<object>` objects) explicitly reference the appropriate weight array for the type.
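A sketch of what that variation might look like (the `LazyVariant` class name and the `AggregateWeights` helper are mine; the helper performs the same cumulative-sum step that GetWeightsForEnum<T>() performs, and the enum is abbreviated for brevity):

```csharp
using System;
using System.Collections.Generic;

public enum BloodType { ONeg, OPos } // abbreviated for the sketch

public static class LazyVariant
{
    static readonly KeyValuePair<BloodType, int>[] _bloodTypeToWeight =
    {
        new KeyValuePair<BloodType, int>(BloodType.ONeg, 4),
        new KeyValuePair<BloodType, int>(BloodType.OPos, 36),
    };

    // Hypothetical helper: turns per-value weights into cumulative sums
    static KeyValuePair<T, int>[] AggregateWeights<T>(KeyValuePair<T, int>[] weightMap)
    {
        var result = new KeyValuePair<T, int>[weightMap.Length];
        int sum = 0;

        for (int i = 0; i < weightMap.Length; i++)
        {
            sum += weightMap[i].Value;
            result[i] = new KeyValuePair<T, int>(weightMap[i].Key, sum);
        }

        return result;
    }

    // Each entry's Lazy<object> closes over the correct weight array directly,
    // so no separate _typeToWeightMap lookup is needed; the table is still
    // built only once, on first access of Value.
    public static readonly Dictionary<Type, Lazy<object>> _typeToAggregatedWeights =
        new Dictionary<Type, Lazy<object>>()
        {
            { typeof(BloodType),
              new Lazy<object>(() => AggregateWeights(_bloodTypeToWeight)) },
        };
}
```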
In other words, there are lots of variations on this theme that would work fine. But they will all have essentially the same structure as above; semantics would be the same and performance differences would be negligible.
One thing you'll notice is that the binary search requires a custom `IComparer<T>` implementation. That is here:
class KvpValueComparer<TKey, TValue> :
    IComparer<KeyValuePair<TKey, TValue>> where TValue : IComparable<TValue>
{
    public readonly static KvpValueComparer<TKey, TValue> Instance =
        new KvpValueComparer<TKey, TValue>();

    private KvpValueComparer() { }

    public int Compare(KeyValuePair<TKey, TValue> x, KeyValuePair<TKey, TValue> y)
    {
        return x.Value.CompareTo(y.Value);
    }
}
This allows the `Array.BinarySearch()` method to correctly compare the array elements, allowing a single array to contain both the enum values and their aggregated weights, while limiting the binary search comparison to just the weights.
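A quick illustration of the comparer's behavior, showing that the keys are ignored entirely (the comparer class is repeated from above so the snippet stands alone; `ComparerDemo` is my name for the driver):

```csharp
using System;
using System.Collections.Generic;

// Repeated from above so this snippet is self-contained
public class KvpValueComparer<TKey, TValue> :
    IComparer<KeyValuePair<TKey, TValue>> where TValue : IComparable<TValue>
{
    public readonly static KvpValueComparer<TKey, TValue> Instance =
        new KvpValueComparer<TKey, TValue>();

    private KvpValueComparer() { }

    public int Compare(KeyValuePair<TKey, TValue> x, KeyValuePair<TKey, TValue> y)
    {
        return x.Value.CompareTo(y.Value);
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var cmp = KvpValueComparer<string, int>.Instance;

        // Different keys, same value: the pairs compare as equal
        Console.WriteLine(cmp.Compare(
            new KeyValuePair<string, int>("a", 5),
            new KeyValuePair<string, int>("b", 5))); // prints 0
    }
}
```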