2

I am doing an exercise from exercism.io, in which I have to generate random names for robots. I am able to get through a bulk of the tests until I hit this test:

[Fact]
public void Robot_names_are_unique()
{
    var names = new HashSet<string>();
    for (int i = 0; i < 10_000; i++) {
        var robot = new Robot();
        Assert.True(names.Add(robot.Name));
    }
}

After some googling around, I stumbled upon a couple of solutions and found out about the Fisher-Yates algorithm. I tried to implement it into my own solution but unfortunately, I haven't been able to pass the final test, and I'm stumped. If anyone could point me in the right direction with this, I'd greatly appreciate it. My code is below:

EDIT: I forgot to mention that the format of the string has to follow this: @"^[A-Z]{2}\d{3}$"

public class Robot
{
string _name;
Random r = new Random();
string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
string nums = "0123456789";

public Robot()
{
    _name = letter() + num();
}

public string Name
{
    get { return _name; }
}

private string letter() => GetString(2 ,alpha.ToCharArray(), r);

private string num() => GetString(3, nums.ToCharArray(), r);

public void Reset() => _name = letter() + num();

public string GetString(int length,char[] chars, Random rnd)
{
    Shuffle(chars, rnd);
    return new string(chars, 0, length);
}

public void Shuffle(char[] _alpha, Random r)
{


    for(int i = _alpha.Length - 1; i > 1; i--)
    {
        int j = r.Next(i);
        char temp = _alpha[i];
        _alpha[i] = _alpha[j];
        _alpha[j] = temp;
    }

}

}
bman
  • 23
  • 3
  • 1
    You did not specify the error in the code you posted. Indeed that code does not look like it would even compile in teh first place: `i < 10_000` ? | If you just generate enough random values, you will always evetually get a colission. Even with something like a UUID a colission is not impossible - it is just really, really unlikely: https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions – Christopher Feb 05 '20 at 18:03
  • 3
    Random does not mean unique. String of 2 or 3 characters are bound to duplicate sooner rather than later – Ňɏssa Pøngjǣrdenlarp Feb 05 '20 at 18:08
  • 2
    @Christopher: You can now use an underscore as a separator in C# numeric literals. a literal of `10_000` is the same as `10000`: https://www.davidhayden.me/blog/csharp-7-underscore-as-digit-separator-in-numeric-literals – Flydog57 Feb 05 '20 at 18:18
  • @Flydog57 Thanks for the information. Looks like SO's Syntax Highlighting does not know that anymore then I did. – Christopher Feb 05 '20 at 18:22
  • 2
    The test fails because your name generator isn't nearly sufficiently random, classic [birthday problem](https://en.wikipedia.org/wiki/Birthday_problem). 2 chars and 3 digits gives you 46800 unique names - the probability of there _not_ being a collision out of 10000 random samples is near 0. – Mathias R. Jessen Feb 05 '20 at 18:22
  • @MathiasR.Jessen It took me a while to notice it, but he generates one Random instance per loop. Inside the Robot instance. So it was the even worse mistake with Random. – Christopher Feb 05 '20 at 18:28
  • 1
    Good spot, but fixing it won't help - the chance of passing this test is infinitesimally small with the current logic. – Mathias R. Jessen Feb 05 '20 at 18:36
  • What are the other tests? Because the `Robot_names_are_unique` test would be trivial to pass by simply generating names sequentially. – Matt Burland Feb 07 '20 at 19:06
  • 1
    And Fisher-Yates won't help you here because for every robot you are shuffling a fresh "deck". The only way Fisher-Yates would work here to give you both a) Random and b) non-colliding names would be to first generate the list of _all possible names_ (or, at least, the first 10,000), then shuffle _that_ list with Fisher-Yates and then for each robot, just draw the first name from your list of shuffled candidates and then remove it from the list. Just like a deck of cards. – Matt Burland Feb 07 '20 at 19:09

4 Answers4

2

The first rule of any ID is:

It does not mater how big it is, how many possible value it has - if you just create enough of them, you will get a colission eventually.

To Quote Trillian from the Hithchikers Guide: "[A colission] is not impossible. Just realy, really unlikely."

However in this case, I think it is you creating Random Instances in a Loop. This is a classical beginners mistake when workign with Random. You should not create a new random isntance for each Robot Instance, you should have one for the application that you re-use. Like all Pseudorandom Number Generators, Random is deterministic. Same inputs - same outputs.

As you did not specify a seed value, it will use the time in milliseconds. Wich is going to the same between the first 20+ loop itterations at last. So it is going to have the same seed and the same inputs, so the same outputs.

Christopher
  • 9,634
  • 2
  • 17
  • 31
  • Collisions won't happen if you use sequential ids and draw less than whatever the maximum value for the data type you are using. Of course, ensuring your ids are sequential can be difficult if you need to process requests non-sequentially. – Matt Burland Feb 07 '20 at 19:12
  • @MattBurland Absolute Correct. But he did say he wanted it randomized, and with those a colission is way more likely. – Christopher Feb 07 '20 at 20:18
2

The easiest solution for unique names is to use GUIDs. In theory, it is possible to generate non-unique GUIDs but it is pretty close to zero.

Here is the sample code:

var newUniqueName = Guid.NewGuid().ToString();

Sure GUIDs do not look pretty but they are really easy to use.

EDIT: Since the I missed the additional requirement for the format I see that GUID format is not acceptable.

Here is an easy way to do that too. Since format is two letters (26^2 possibile values) and 3 digits (10^3 possible values) the final number of possible values is 26^2 * 10^3 = 676 * 1000 = 676000. This number is quite small so Random can be used to generate the random integer in the range 0-675999 and then that number can be converted to the name. Here is the sample code:

            var random = new System.Random();
            var value = random.Next(676000);
            var name = ((char)('A' + (value % 26))).ToString();
            value /= 26;
            name += (char)('A' + (value % 26));
            value /= 26;
            name += (char)('0' + (value % 10));
            value /= 10;
            name += (char)('0' + (value % 10));
            value /= 10;
            name += (char)('0' + (value % 10));

The usual disclaimer about possible identical names applies here too since we have 676000 possible variants and 10000 required names.

EDIT2: Tried the code above and generating 10000 names using random numbers produced between 9915 and 9950 unique names. That is no good. I would use a simple static in class member as a counter instead of random number generator.

Optional Option
  • 1,521
  • 13
  • 33
1

First, let's review the test you're code is failing against:

  • 10.000 instances created
  • Must all have distinct names

So somehow, when creating 10000 "random" names, your code produces at least two names that are the same.

Now, let's have a look at the naming scheme you're using:

AB123

The maximum number of unique names we could possibly create is 468000 (26 * 25 * 10 * 9 * 8).

This seems like it should not be a problem, because 10000 < 468000 - but this is where the birthday paradox comes in!

From wikipedia:

In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday.

Rewritten for the purposes of your problem, we end up asking:

What's the probability that, in a set of 10000 randomly chosen people, some pair of them will have the same name.

The wikipedia article also lists a function for approximating the number of people required to reach a 50% propbability that two people will have the same name:

BirthdayProblemApproxN

where m is the total number of possible distinct values. Applying this with m=468000 gives us ~806 - meaning that after creating only 806 randomly named Robots, there's already a 50% chance of two of them having the same name.

By the time you reach Robot #10000, the chances of not having generated two names that are the same is basically 0.

As others have noted, you can solve this by using a Guid as the robot name instead.

If you want to retain the naming convention you might also get around this by implementing an LCG with an appropriate period and use that as a less collision-prone "naming generator".

Mathias R. Jessen
  • 157,619
  • 12
  • 148
  • 206
  • 1
    `The maximum number of unique names we could possibly create is 46.800 (26 * 25 * 10 * 9 * 8).` There is no requirement that possible values in each place can't be repeated, so it's actually 676,000 possible values and not 468,000 (which you also shifted a point too). So you actually need almost 1,000 samples to reach 50% probability. But your general point still stands. – Matt Burland Feb 10 '20 at 18:38
  • @MattBurland Thanks, fixed the one(-factor)-off problem :) If you look at OP's `GetString()` implementation it should be clear that no char's will ever be repeated – Mathias R. Jessen Feb 11 '20 at 00:48
0

Here's one way you can do it:

  1. Generate the list of all possible names
  2. For each robot, select a name from the list at random
  3. Remove the selected name from the list so it can't be selected again

With this, you don't even need to shuffle. Something like this (note, I stole Optional Option's method of generating names because it's quite clever and I couldn't be bother thinking of my own):

public class Robot
{
    private static List<string> names;
    private static Random rnd = new Random();
    public string Name { get; private set; }

    static Robot()
    {
        Console.WriteLine("Initializing");
        // Generate possible candidates
        names = Enumerable.Range(0, 675999).Select(i => 
        {
            var sb = new StringBuilder(5);
            sb.Append((char)('A' + i % 26));
            i /= 26;
            sb.Append((char)('A' + i % 26));
            i /= 26;
            sb.Append(i % 10);
            i /= 10;
            sb.Append(i % 10);
            i /= 10;
            sb.Append(i % 10);
            return sb.ToString();
        }).ToList();
    }

    public Robot()
    {
        // Note: if this needs to be multithreaded, then you'd need to do some work here
        // to avoid two threads trying to take a name at the same time
        // Also note: you should probably check that names.Count > 0 
        // and throw an error if not
        var i = rnd.Next(0, names.Count - 1);
        Name = names[i];
        names.RemoveAt(i);
    }
}

Here's a fiddle that generates 20 random names. They can only be unique because they are removed once they are used.

The point about multitheading is very important however. If you needed to be able to generate robots in parallel, then you'd need to add some code (e.g. locking the critical section of code) to ensure that only one name is being picked and removed from the list of candidates at a time or else things will get really bad, really quickly. This is why, when people need a random id with a reasonable expectation that it'll be unique, without worrying that some other thread(s) are trying the same thing at the same time, they use GUIDs. The sheer number of possible GUIDs makes collisions very unlikely. But you don't have that luxury with only 676,000 possible values

Matt Burland
  • 44,552
  • 18
  • 99
  • 171