
For my Ruby test suite, I need predictable UUIDs. I am aware that UUIDs are by nature random and non-deterministic, and that this is good. But in the test suite, it would be useful to have UUIDs that can be reused across fixtures, data helpers, seeds, etc.

I now have a naive implementation that easily leads to invalid UUIDs:

def fake_uuid(character = "x")
  [8, 4, 4, 4, 12].map { |length| character * length }.join("-")
end

fake_uuid('a') # => "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" (this is valid)
fake_uuid('z') # => "zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz" (this is invalid, not hex)

I could, obviously, add checks that only a-f and 0-9 are allowed as input. An alternative would be to hardcode a pre-generated list of UUIDs and pick one based on arguments.

But I'm wondering, is there not a better way? Would UUIDv5 work for this? Is there a way to call SecureRandom.uuid to have it return the same UUID (for a thread or session)? Does it need an additional gem? Or is my approach the closest one can get?

Having it made up of all the same characters is not a requirement.
Having it somewhat readable is a big pro, but not a requirement. This way, you can e.g. ensure that a Company has a UUID cccccccc-cccc-cccc-cccc-cccccccccccc and its Employee the UUID eeeeeeee-eeee-eeee-eeee-eeeeeeeeeeee.

berkes
  • Just an idea: We changed our APIs, so that for any POST you can send in a UUID for any object you create. This param is, of course, optional and supposedly the back-end framework already checks for duplicates. This allows your tests / fixtures to create predictable UUIDs. – SiKing Oct 28 '20 at 20:11
  • What's wrong with `00000000-0000-0000-0000-000000000001`, `00000000-0000-0000-0000-000000000002`, etc? Just increment. – anothermh Oct 28 '20 at 20:13
  • @anothermh: that's actually a really neat and simple idea. Sometimes you get so caught up in a solution that the simplest solution is invisible. – berkes Oct 28 '20 at 20:20
  • @SiKing: that is actually one of the most important use-cases why I need to create somewhat predictable UUIDs. Sure, my tests can send a `SecureRandom.uuid` along, but that makes the exceptions, failures and state hard to debug. – berkes Oct 28 '20 at 20:21
  • @berkes which testing framework are you using? – Thomas Koppensteiner Oct 28 '20 at 20:52
  • @anothermh Depending on what the back-end does with these, you might need some additional non-zero digits to make them look "valid". https://en.wikipedia.org/wiki/Universally_unique_identifier#Format – SiKing Oct 28 '20 at 20:55

4 Answers


I am aware that UUIDs are by nature random and non-deterministic, and that this is good.

That assumption is wrong.

RFC 4122 defines 5 versions of UUID:

  • Versions 1 and 2 are based on MAC address and date time, and thus deterministic in the sense that they would theoretically give the same UUID on the same computer at the same time.
  • Versions 3 and 5 are based on namespace and name, and thus fully deterministic.
  • Version 4 is random.

So, if you use Version 3 or Version 5 UUIDs, they will be fully deterministic.
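
As a minimal sketch of what that could look like with only the standard library: a Version 5 UUID is the SHA-1 hash of the namespace bytes followed by the name, truncated to 16 bytes, with the version and variant bits overwritten. The helper name `uuid_v5` and the sample names below are made up for illustration; the namespace constant is the DNS namespace listed in RFC 4122. In a Rails project, ActiveSupport's `Digest::UUID.uuid_v5` should cover the same ground.

```ruby
require "digest"

# DNS namespace UUID from RFC 4122. Any fixed UUID works as a namespace;
# a test suite could just as well define its own.
NAMESPACE_DNS = "6ba7b810-9dad-11d1-80b4-00c04fd430c8"

# Hypothetical helper: builds a Version 5 (SHA-1, name-based) UUID.
def uuid_v5(namespace, name)
  # Namespace as 16 raw bytes, followed by the bytes of the name.
  namespace_bytes = [namespace.delete("-")].pack("H*")
  hash = Digest::SHA1.digest(namespace_bytes + name.to_s.b)

  # Keep the first 16 bytes and stamp in the version and variant bits.
  bytes = hash.bytes.first(16)
  bytes[6] = (bytes[6] & 0x0f) | 0x50 # version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80 # variant 1

  hex = bytes.pack("C*").unpack1("H*")
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end

# The same namespace and name always yield the same UUID.
uuid_v5(NAMESPACE_DNS, "company:1") == uuid_v5(NAMESPACE_DNS, "company:1") # => true
uuid_v5(NAMESPACE_DNS, "company:1") == uuid_v5(NAMESPACE_DNS, "employee:1") # => false
```

The trade-off is readability: the result is stable across runs and machines, but it looks as random as any other UUID in a failure message.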

Jörg W Mittag

UUIDs use two digits to denote their format (actually, just some of those digits' bits):

xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
              ^    ^
        version    variant

The following pattern denotes version 4 (M=4), variant 1 (N=8), which simply means "random bytes":

xxxxxxxx-xxxx-4xxx-8xxx-xxxxxxxxxxxx

You could use it as a template to generate fake (but valid) UUIDs based on a sequence number, as suggested in the comments:

def fake_uuid(n)
  '00000000-0000-4000-8000-%012x' % n
end

fake_uuid(1) #=> "00000000-0000-4000-8000-000000000001"
fake_uuid(2) #=> "00000000-0000-4000-8000-000000000002"
fake_uuid(3) #=> "00000000-0000-4000-8000-000000000003"

Having it somewhat readable is a big pro ...

There are plenty of unused fields / digits to add more data:

def fake_uuid(klass, n)
  k = { Company => 1, Employee => 2 }.fetch(klass, 0)

  '%08x-0000-4000-8000-%012x' % [k, n]
end

fake_uuid(Company, 1)   #=> "00000001-0000-4000-8000-000000000001"
fake_uuid(Company, 2)   #=> "00000001-0000-4000-8000-000000000002"

fake_uuid(Employee, 1)  #=> "00000002-0000-4000-8000-000000000001"
fake_uuid(Employee, 2)  #=> "00000002-0000-4000-8000-000000000002"

#                            ^^^^^^^^                ^^^^^^^^^^^^
#                              class                   sequence
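
To double-check that such templated values really carry the version and variant digits, a regular expression is probably enough. The following is only a sketch: `UUID_V4_PATTERN` and `valid_v4_uuid?` are names invented here, and the single-argument `fake_uuid` is repeated so the snippet runs on its own.

```ruby
# Canonical 8-4-4-4-12 layout, version digit 4, variant digit 8, 9, a or b.
UUID_V4_PATTERN = /\A\h{8}-\h{4}-4\h{3}-[89ab]\h{3}-\h{12}\z/i

def fake_uuid(n)
  format("00000000-0000-4000-8000-%012x", n)
end

# Hypothetical helper: true if the string looks like a version 4, variant 1 UUID.
def valid_v4_uuid?(string)
  UUID_V4_PATTERN.match?(string)
end

valid_v4_uuid?(fake_uuid(1))                            # => true
valid_v4_uuid?("aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa")  # => false (hex, but version digit is not 4)
valid_v4_uuid?("zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz")  # => false (not hex)
valid_v4_uuid?(fake_uuid(16**12))                       # => false (sequence overflows 12 digits)
```

The last case shows the one limit of the template: the sequence number has to fit into the final 12 hex digits.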
Stefan
  • Thanks for the elaborate answer. I was not aware of the signal bits, but TIL. Really love the idea of mapping klass to a number, for better evaluation. Exactly what I was looking for! – berkes Oct 30 '20 at 14:54

Think Dependency Injection & Factories Whenever Possible

What you're trying to do seems like a testing anti-pattern. You could theoretically do what you want by using Version-1 UUIDs with a pre-defined MAC address, and a gem like timecop to create deterministic times, but this is probably unjustified for any real-world use case I can imagine.

Instead, you should use factories rather than fixtures for your tests, or create methods that allow for direct injection of your test input and/or output values. For example:

# some UUID-related method under test
def do_something_with(uuid=nil)
  # fetch the uuid the way you would if not injected
  uuid ||= gets.chomp
  uuid.tr '3', '4'
end
  
# write your tests to validate pre-defined input and
# output values
input_value  = '01957E2E-B3BA-4A46-BC4D-00615BE630E3'
output_value = '01957E2E-B4BA-4A46-BC4D-00615BE640E4'

# validate the expected transformation
do_something_with(input_value) == output_value

Whether you're doing this with a database or with a testing DSL like RSpec, the results should be the same, because you're defining both values. TDD/BDD shouldn't be testing core functionality, so unless you're actually testing a custom UUID generator, this approach should suffice. If you are rolling your own generator, then you can still use the same approach to inject parameters like MAC address, date/time, or other factors used to generate your deterministic UUIDs.

Other approaches may include generating a set of values (e.g. to seed a database), and then rolling back or truncating the database when you're done with your test. The database_cleaner gem is a staple for doing that, but your original post doesn't really justify the additional complexity. I mention it here mostly to point out that there are fixture/factory solutions for most use cases that still allow you to follow the same basic pattern of injecting or relying on predictable data.

Todd A. Jacobs
  • In BDD/TDD, my "problem" mostly applies to unit tests. Also: I am not "rolling my own". But e.g. a `test_does_not_overwrite_aggregate_id` would need to test that "some UUID == some other UUID" in certain cases. Using deterministic IDs makes test failures much easier to grasp. – berkes Oct 29 '20 at 08:57
  • If you are using mocks/stubs in unit tests, then a test like your example, `test_does_not_overwrite_aggregate_id`, could be implemented with message expectations, i.e. checking that a method was invoked on an object with certain specific parameters. So you could generate a random UUID before the test, and then check that all collaborating objects were invoked with this value. Or, you could have your method return the aggregate itself, and you can check whether its id was changed or not. – Stefan Rendevski Oct 29 '20 at 11:42

If you are using RSpec, you can stub the return value of SecureRandom.uuid.

context "my example context" do
  let(:expected_uuid) { "709ab60d-3c5f-48d8-ac55-dc6b8f4f85bf" }

  before do
    allow(SecureRandom).to receive(:uuid).and_return(expected_uuid)
  end 

  it "uses the expected uuid" do
    # your spec
  end 
end

This will return expected_uuid every time SecureRandom.uuid is called within the context of "my example context".

  • Thanks for the suggestion. I'm using minitest, so stubbing is a little harder as it discourages that explicitly. Mocha would solve that. My problem, however, is around what you hardcoded above, the "expected_uuid": I want to have that more centralised, and in a helper. E.g. `allow(SecureRandom).to receive(:uuid).and_return(fake_uuid(:harry_potter))` to return "that UUID that belongs to `:harry_potter`". – berkes Oct 29 '20 at 09:01
  • @berkes I see. In this case I would go for either a) a mapping (as you already suggested: "An alternative would be to hardcode a pre-generated list of UUIDs and pick one based on arguments"), where `:harry_potter` is a key, or b) something similar to @Stefan's last example. – Thomas Koppensteiner Oct 29 '20 at 19:20