14

I'm writing a small system that will allow me to sell my band's music at gigs by generating vouchers that can be redeemed for MP3s at our website.

The vouchers will need a code that the user types in. The code needs to have the following qualities:

  1. Some level of human readability in terms of length and content, to prevent user frustration and data entry error.
  2. Given one voucher code, not trivial to guess another voucher code.

If I use GUIDs I'm concerned about point 1. If I use an incrementing integer I'm concerned about point 2. There has to be some happy medium in between, right? I thought perhaps this work has already been done and there's an ideal solution waiting out there for me. In the absence of that, I'm thinking I'll go with a random alphanumeric string, or possibly letters only (excluding I and O for clarity), and have the application block IP addresses that fail X number of times, which would indicate a possible brute force attack. If I went with that, how long of a string and what value of X would work, and why?

Thanks for your help!


Update: I wasn't totally explicit about the method: I will generate lists of voucher codes for printing, then enter the "sold" codes after a gig. Therefore I think elements like a checksum are not necessary like they are in software keys that don't use validation servers.

dreftymac
James Orr
  • On the blocking of brute force attacks I'd not bother to start with. With respect to you and your band, it's not as though you're protecting something really important. It just seems a little disproportional to me. – Tom Duckering Dec 18 '09 at 04:08
  • You're absolutely right, I'm having entirely too much fun designing the system. But there you go, I'm a programmer at heart. Plus, if it all works out I might host other bands' albums. – James Orr Dec 18 '09 at 04:19
  • They're protecting their work. Notice the word "sell" in the question. – R. Martinho Fernandes Dec 18 '09 at 04:20

13 Answers

9

You could use a Markov Chain trained on English syllables to create a sentence composed of pronounceable-gibberish words. Just add the generated sentence to a database of valid vouchers when you print them (and invalidate them when they're redeemed, of course).
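
A minimal sketch of the idea in Python, under some loud assumptions: it uses a character-level chain over letters rather than syllables (simpler to show, but a different technique from the syllable model suggested above), and `SAMPLE_WORDS`, `train`, `make_word` and `make_code` are illustrative names, not part of any library. The training list here is far too small for real use.

```python
import secrets
from collections import defaultdict

# Tiny illustrative training set (an assumption); a real system would
# train on a much larger corpus to get enough variety.
SAMPLE_WORDS = ["voucher", "music", "guitar", "melody", "record", "concert",
                "singer", "rhythm", "ballad", "chorus", "tempo", "harmony"]

def train(words, order=2):
    """Map each `order`-letter context to the letters that followed it."""
    model = defaultdict(list)
    for word in words:
        padded = "^" * order + word + "$"   # ^ marks the start, $ marks the end
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def make_word(model, order=2, max_len=10):
    """Walk the chain from the start context until the end marker or max_len."""
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = secrets.choice(model[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)

def make_code(model, n_words=4):
    """Join several generated words into one voucher 'sentence'."""
    return " ".join(make_word(model) for _ in range(n_words))

model = train(SAMPLE_WORDS)
print(make_code(model))
```

The output is not guaranteed to be unique, and its entropy depends entirely on the training data, so each generated sentence would still need to be checked against the voucher database before printing.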

caf
  • You can also use some form of N-gram analysis: it may be easier to understand and implement. http://en.wikipedia.org/wiki/N-gram – R. Martinho Fernandes Dec 18 '09 at 04:09
  • My last comment is a bit confusing: N-gram analysis can be used to do the "training" part. – R. Martinho Fernandes Dec 18 '09 at 04:12
  • 1
    If you want to avoid the "Automated Curse Generator" problem, you can train it on words instead of syllables. I wrote such a thing in C# last week, and after feeding it a book for analysis it spits out "sentences" like "how many men are now faced with a lay education" and "it would be to go on if you dont understand the situation". – R. Martinho Fernandes Dec 18 '09 at 04:38
5

AOL used to use a random combination of two words for the CDs they sent out. You can take the same approach, and just increase the number of words to get the odds that you require.
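
A rough sketch of that approach in Python. The eight-word `WORDS` list is only a placeholder so the example runs, and `word_code` and `keyspace` are hypothetical helper names; in practice you would load a few thousand short, common, inoffensive words.

```python
import math
import secrets

# Placeholder list (an assumption); a real list would hold thousands of words.
WORDS = ["blue", "cat", "drum", "echo", "fig", "gold", "hat", "ink"]

def word_code(n_words=3, words=WORDS):
    """Pick n_words independently and uniformly, joined with hyphens."""
    return "-".join(secrets.choice(words) for _ in range(n_words))

def keyspace(n_words, list_size):
    """Number of distinct codes for a given word count and list size."""
    return list_size ** n_words

print(word_code())
print(keyspace(3, 5000))              # 125000000000
print(math.log2(keyspace(3, 5000)))   # the same keyspace expressed in bits
```

Each extra word multiplies the keyspace by the list size, so the number of words is the knob that sets the odds.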

Mark Ransom
  • I like this! Three words from [this list](http://www.math.toronto.edu/jjchew/scrabble/lists/common-234.html) of 3 and 4 letter words would have a keyspace of 128,405,466,125... very acceptable. – James Orr Dec 18 '09 at 04:57
5

I would use your own encoding scheme. In addition to omitting I and O, for optimal readability it's also a good idea to omit all but one letter out of near-homonym sets (C/E, M/N) and multisyllabic letters, such as W, and of course stick to one case.

As far as length goes, you could use 60 bits plus a 4-bit checksum. For scale, 64 bits is enough to count milliseconds for hundreds of millions of years, so a random value of that size is for all practical purposes unguessable. At, say, 4 bits per letter, that's 16 letters long. Even half that length is probably plenty.

Another way to think of this is in the form of automobile license plates: 3 letters and 3 numbers (26^3 × 10^3 = 17,576,000 combinations) is enough to cover a pretty large state, and tends to be very readable. Unless you provide a way for someone to hack codes at high speed, they certainly won't be guessable at human time scales.
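
Here is one way the 60-bit-plus-checksum layout could look in Python. This is a sketch under assumptions: the 16-letter `ALPHABET` is an arbitrary pick (any 16 easily distinguished letters would do), and the checksum is a simple XOR of the 4-bit values, which is my choice for illustration and not something the scheme above prescribes.

```python
import secrets

# 16 symbols -> 4 bits per letter. This particular set is an assumption.
ALPHABET = "ABCDFGHJKLMPRSTX"

def checksum(nibbles):
    """XOR of all 4-bit values; any single mistyped letter changes it."""
    c = 0
    for n in nibbles:
        c ^= n
    return c

def new_code():
    """60 random bits as 15 letters, followed by one checksum letter."""
    payload = secrets.randbits(60)
    nibbles = [(payload >> shift) & 0xF for shift in range(56, -1, -4)]
    nibbles.append(checksum(nibbles))
    return "".join(ALPHABET[n] for n in nibbles)

def looks_valid(code):
    """Cheap typo check before the code is looked up in the database."""
    code = code.upper()
    if len(code) != 16 or any(ch not in ALPHABET for ch in code):
        return False
    nibbles = [ALPHABET.index(ch) for ch in code]
    return checksum(nibbles[:-1]) == nibbles[-1]

print(new_code())
```

An XOR checksum catches a single wrong letter but not two swapped letters, so it only serves as a quick "did you mistype?" hint; the database lookup remains the real validation.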

RickNZ
  • @RickNZ: 64 bit timestamps are used by Windows NTFS and OpenVMS: both count at ten million ticks per second. The year range is from 1601 to 60,055 for NTFS and 1858 to 31,084 for VMS. (VMS reserves the "negative" half of the range for relative time purposes.) – wallyk Dec 18 '09 at 04:41
5

Well, if you really want human readable, you can use BubbleBabble. Create a Perl script like the following:

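#!/usr/bin/perl
use strict;
use warnings;
use Digest::BubbleBabble qw(bubblebabble);
use Digest::SHA1 qw(sha1);
# Hash the command-line arguments with SHA-1, then encode the digest as BubbleBabble.
print bubblebabble(Digest => sha1(join(' ', @ARGV))), "\n";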

Then feed it any command line argument you want to get output like the following:

xogan-nydut-zogiv-kotyn-ledah-taseb-gyhib-tucel-vudul-mykom-mexax

Or, if Perl's not your preference, you can use PWGen (also available online) to get output like this:

aiCee5om Ohxai2is tae3Gael Gaeth7ei ooCh0ish

Honestly, this level of human readability is overkill; RickNZ's answer should work just fine (and is pretty close to what we did for some software keys). But BubbleBabble is kind of fun.

Josh Kelley
4

Just 8 alphanumeric characters (excluding I and O) give 34^8 = 1,785,793,904,896 possible combinations. That's for all intents and purposes unguessable as long as you don't have 5 billion vouchers.
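
To put numbers on that, here is a small Python sketch. `new_code` and `guess_odds` are hypothetical helpers, and the figures of 10,000 issued vouchers and one million guesses are made-up inputs for illustration only.

```python
import secrets
import string

# Digits plus uppercase letters, minus I and O: 34 symbols.
ALPHABET = "".join(ch for ch in string.digits + string.ascii_uppercase
                   if ch not in "IO")

def new_code(length=8):
    """One random voucher code drawn uniformly from the alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def guess_odds(valid_codes, attempts, length=8):
    """Approximate chance that `attempts` random guesses hit a valid code."""
    keyspace = len(ALPHABET) ** length
    return 1 - (1 - valid_codes / keyspace) ** attempts

print(len(ALPHABET) ** 8)   # 1785793904896
print(new_code())
print(guess_odds(valid_codes=10_000, attempts=1_000_000))
```

With those made-up inputs the odds of a single hit come out at a bit over half a percent, which suggests that even a generous lockout threshold leaves an attacker with very little to work with.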

Andreas Bonini
4

Context

  • human-readable UUID
  • language-independent algorithm

Problem

  • devise an algorithm for generating "human readable" UUIDs (HR-UUID)
  • HR-UUID should be robust against brute-force guesses
  • entry and recall by a human being should be straightforward and not error-prone
  • having 1 or more known valid HR-UUID should not be statistically relevant for guessing other valid HR-UUIDs

Solution

  • Use the DiceWare password algorithm.
  • In contrast to the other solutions offered in this thread, this approach solves the human-readable UUID problem by re-casting the problem to that of password generation.
  • In contrast to the BubbleBabble solution offered elsewhere in this thread, Diceware allows you to choose how many elements are included in each UUID, depending on how many times you wish to "roll the die" ... this means you get to choose the entropy per UUID.
  • DiceWare password algorithm solves the problem of generating high-entropy passphrases that are nonetheless easy for humans to both enter and remember.
  • Below is a sampling of Diceware "UUIDs" consisting of six elements each:

    crabmeat-coach-properly-driving-yoga-ferret
    edition-mousy-fabric-budding-book-mortuary
    rickety-uncrown-earful-majority-sublet-evade
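
The dice-rolling step can be sketched in Python as follows. The `demo_list` built here is a stand-in with made-up entries so the example runs on its own; a real Diceware list maps each five-digit dice key to one of 7776 (6^5) distinct words. `roll_key`, `diceware_code` and `entropy_bits` are illustrative names.

```python
import math
import secrets

def roll_key(dice=5):
    """Simulate rolling `dice` six-sided dice, giving a key such as '43146'."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(dice))

def diceware_code(wordlist, n_words=6):
    """`wordlist` maps five-digit dice keys to words."""
    return "-".join(wordlist[roll_key()] for _ in range(n_words))

def entropy_bits(n_words, list_size=6 ** 5):
    """Each word contributes log2(list_size) bits."""
    return n_words * math.log2(list_size)

# Stand-in list so the sketch runs; not a real Diceware word list.
demo_list = {f"{a}{b}{c}{d}{e}": f"word{a}{b}{c}{d}{e}"
             for a in "123456" for b in "123456" for c in "123456"
             for d in "123456" for e in "123456"}

print(diceware_code(demo_list))
print(round(entropy_bits(6), 1))   # 77.5
```

Dropping to three or four words trades entropy for easier typing, which is probably the more relevant range for a voucher that is only typed once.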
    


CodeFox
dreftymac
2

5 blocks of 5 characters each should be sufficient - four blocks for the "key", the fifth as a checksum to ensure validity. And of course, don't use the whole keyspace.

That's roughly how software serial numbers appear to be laid out, anyway.
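
A hedged Python sketch of such a layout. Everything specific here is an assumption made for illustration: the 32-character `ALPHABET`, the `SECRET` key, and the use of HMAC-SHA256 to derive the fifth block so that a valid-looking checksum cannot be computed without the key.

```python
import hashlib
import hmac
import secrets

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"   # 32 symbols; illustrative choice
SECRET = b"replace-with-a-private-key"           # hypothetical server-side secret

def check_block(key_chars):
    """Derive 5 checksum characters from the 20 key characters."""
    digest = hmac.new(SECRET, key_chars.encode(), hashlib.sha256).digest()
    return "".join(ALPHABET[b % 32] for b in digest[:5])

def new_serial():
    """Four random blocks plus one checksum block, hyphen-separated."""
    key = "".join(secrets.choice(ALPHABET) for _ in range(20))
    full = key + check_block(key)
    return "-".join(full[i:i + 5] for i in range(0, 25, 5))

def is_well_formed(serial):
    """True if the fifth block matches the first four."""
    raw = serial.replace("-", "").upper()
    if len(raw) != 25:
        return False
    expected = check_block(raw[:20]).encode()
    return hmac.compare_digest(expected, raw[20:].encode())

print(new_serial())
```

When every code is also stored server-side, the checksum block mainly acts as a cheap filter for typos and random guesses before the database is consulted.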

Anon.
  • Interesting, I never knew that! For my system, however, that kind of algorithm isn't directly applicable, as I'll be pre-generating these numbers and then "validating" the codes I sold after a gig. – James Orr Dec 18 '09 at 04:14
  • It's still applicable - you don't need to give out all of the codes, after all. – Anon. Dec 18 '09 at 04:49
2

Hmm, I do not know how most systems work, but I think it would be neat and simple to define a static number and multiply it by a random number. Then, if the big GUID is a multiple of your static number, you are good.

Easy to produce, not easy to guess a new one (short term use only)

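#include <stdlib.h>
long long make_guid(void) {
    const long long i = 61234;           /* static number, kept secret */
    const long long j = rand() % 99999;  /* random multiplier, 0..99998 */
    return i * j;                        /* 64-bit product; a 32-bit int would overflow */
}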

will give you a phone number length GUID

only 99999 uses though! doh
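
For completeness, a small Python sketch of the validation side, which also shows why this is "short term use only". `new_code` and `is_valid` are hypothetical names, and the multiplier range of 1 to 99999 is my assumption (it skips zero so that no code is 0).

```python
import math
import secrets

SECRET = 61234   # the static number from the snippet above

def new_code():
    """A valid code is the secret times a random multiplier in 1..99999."""
    return SECRET * (secrets.randbelow(99_999) + 1)

def is_valid(code):
    """Accept any positive multiple of the secret."""
    return code > 0 and code % SECRET == 0

a, b = new_code(), new_code()
print(a, is_valid(a))
# Anyone holding two codes can take their gcd, which is always a
# multiple of the secret and often the secret itself.
print(math.gcd(a, b))
```

Because a couple of leaked codes narrow the secret down this much, the scheme only holds up if codes are also tracked and invalidated on the server.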

Charles
2

Probably best to avoid all the vowels[*], thus avoiding all the swearwords.

[*] Including W if you're Welsh!

NickZoic
  • 1
    W is also the only multi-syllabic letter, so it takes much longer to say (hence my intense dislike of "www" for websites!). – RickNZ Dec 18 '09 at 04:26
  • remember "trip dub"? or worse yet, back in the nineties on the radio you would hear "aitch tee tee pee, colon, forward slash, forward slash, ..." – James Orr Dec 18 '09 at 04:37
  • Rick: totally agree, there are plenty of reasons to avoid it! By the time you cut out all the vowels and all the easy-to-mistake letters you get down to about 16, which is just right for 4 bits per character anyway. – NickZoic Dec 18 '09 at 04:59
1

One simple solution is to call the getHashCode method that most languages have on their string types. Set the string to some word from your list of approved words, then call getHashCode; the result will be your key. To verify a key, compare it against your list of existing word hashes, and maybe delete it from the list so it can't be used again.

jvilalta
1

I'm assuming you're getting an email address when they purchase the voucher (you should). If so, why not just email them a single-use GUID? That way both you and they have a record of it, you can track redemptions, you don't run the risk of guessing (or at least not one worth bothering with), the user doesn't have to remember anything because it's there in the email, and you don't have to code anything.

They give you their email address. You email them a GUID (with a link). They click the link and get the song. The GUID's use is registered in the system and it will no longer work.
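
The single-use bookkeeping could look something like this in Python. `VoucherStore` is a hypothetical in-memory stand-in for whatever database table the site would actually use, and sending the email is left out entirely.

```python
import uuid

class VoucherStore:
    """In-memory stand-in for a database table of voucher codes."""

    def __init__(self):
        self._unused = set()

    def issue(self):
        """Create a random GUID and remember it as redeemable."""
        code = str(uuid.uuid4())
        self._unused.add(code)
        return code

    def redeem(self, code):
        """True once per issued code; False if unknown or already used."""
        if code in self._unused:
            self._unused.remove(code)
            return True
        return False

store = VoucherStore()
code = store.issue()
print(store.redeem(code))   # True
print(store.redeem(code))   # False
```

The same issue-then-invalidate pattern applies to any of the code formats discussed in this thread; only the `issue` step changes.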

  • 2
    As much as I would like to get a list of fan email addresses, I think it would act as a deterrent. We're talking a 1:00 AM drunken $5 impulse buy, and writing down your email address could really dampen that impulse. – James Orr Dec 18 '09 at 04:29
  • Good point! If I like the band that wouldn't deter me but I may be the exception rather than the rule. –  Dec 18 '09 at 13:20
1

Why not just go with the GUID and then replace any questionable characters with a different letter (so 0 becomes 'h', 1 becomes 'q', and so forth)?

Grant Peters
0

You can try something like a random letter sequence generator. You can mix and match letters and numbers as well.

ram