15

We have a few places in a time-critical script where we convert old IDs into strings. At the moment, we use case statements inside a function, like so:

def get_name id
  case id
    when 1
      "one thing"
    when 3
      "other thing"
    else
      "default thing"
  end
end

I'm considering replacing this with a hash lookup, like so:

NAMES = {
  1 => "one thing",
  3 => "other thing",
}
NAMES.default = "default thing"

It feels like it ought to be faster to use NAMES[id] than get_name(id) - but is it?

Simon
  • 25,468
  • 44
  • 152
  • 266
  • 2
    Simon, this is premature optimization. Unless your have thousands and thousands of cases, I wouldn't bother finding out which one is more performant. Just focus on your code. – maček Nov 15 '10 at 04:21
  • we've only a few cases, but we've got ~7,000,000 lookups. – Simon Nov 15 '10 at 09:13

7 Answers7

33

A couple points, first. One is that low-level language constructs like this that more-or-less do the same thing are almost never the bottleneck in any real-world application, so it's (often) foolish to focus on them. Second, as has already been mentioned, if you're really concerned about it you should benchmark it. Ruby's benchmarking and profile tools are certainly not the most advanced in the programming ecosystem, but they get the job done.

My gut instinct is that hashes are going to be faster because (again, I'm guessing) the case statement must check each condition in turn (making finding the items O(n) instead of O(1)). But let's check!

Full benchmarking code is at https://gist.github.com/25 Basically, it generates a file that defines the appropriate case/hash and then uses them. I went ahead and put the hash lookup within a method call, too, so that overhead won't be a factor, but in real life there's no reason it should be stuck inside a method.

Here's what I get. In each case, I'm doing 10,000 lookups. Time is user-time in seconds

Case statement, 10 items  0.020000
Hash lookup, 10 items     0.010000

Case statement, 100 items  0.100000
Hash lookup, 100 items     0.010000

Case statement, 1000 items  0.990000
Hash lookup, 1000 items     0.010000

So, it looks very much like the case statement is O(n) (no shocker there). Also note that 10K lookups is still only a second even in the case statement, so unless you're doing a metric butload of these lookups, you're better off focusing on the rest of your code.

Bill Dueber
  • 2,706
  • 17
  • 16
  • 2
    Case checks each condition sequentially which can be proven using the debugger and single-stepping, plus case is checking type and equality because it's implemented using `===` so I'm not surprised with your results. Long `when` lists will slow down so it's worthwhile trying to determine the ones that will hit most often and move them to the top, or break down the tests into some sort of sub-groups and use nested case statements. – the Tin Man Nov 14 '10 at 20:33
7

Since this depends on a number of factors (how many different IDs you want to convert, how intelligently the compiler can compile the case when statemens), my advice would be: Measure it:

Write a small test routine and convert, say, 10.000.000 ids to strings. Do this a couple of times with either implementation and compare the results. If you have no significant difference, take whatever you like more (I think, the hash solution is a bit more elegant...)

MartinStettner
  • 28,719
  • 15
  • 79
  • 106
  • Another factor: Hash calls `hash` and possibly `eql?` while case calls `===`. So the Speed depends on your hashing and comparing code, too. – Baju Nov 14 '10 at 16:10
  • For what it's worth, by the way, a quick test with ruby 1.8.7 and just 3 cases shows that the hash is slower, as Nakilon said. – s.m. Nov 14 '10 at 16:11
  • I would have expected to see the `case`-solution to be faster for just three cases. But as I've mentioned (in a comment to a response which has been redrawn as it seems): For a large number of cases, `case` might need to do comparisions for each `when`-clause (which means *O(n)* complexity), whereas a good hash implementation has constant complexity (which means *one* call to `hash` and *one* call to `eql?`), so as I said: Simon has to do the benchmarks with his actual data, I think ... – MartinStettner Nov 14 '10 at 17:48
1
$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]

$ cat hash_vs_case.rb 
require 'benchmark'

def get_from_case(id)
  case id
    when 1
      "one thing"
    when 3
      "other thing"
    else
      "default thing"
  end
end

NAMES = {
  1 => "one thing",
  3 => "other thing",
}
NAMES.default = "default thing"

def get_from_hash(arg)
  NAMES[arg]
end

n = 1000000
Benchmark.bm do |test|
  test.report("case  1") { n.times do; get_from_case(1); end }
  test.report("hash  1") { n.times do; get_from_hash(1); end}
  test.report("case 42") { n.times do; get_from_case(42); end }
  test.report("hash 42") { n.times do; get_from_hash(42); end}
end

$ ruby -w hash_vs_case.rb 
      user     system      total        real
case  1  0.330000   0.000000   0.330000 (  0.422209)
hash  1  0.220000   0.000000   0.220000 (  0.271300)
case 42  0.310000   0.000000   0.310000 (  0.390295)
hash 42  0.320000   0.010000   0.330000 (  0.402647)

And here is why you want to upgrade:

$ ruby -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]

$ ruby -w hash_vs_case.rb 
      user     system      total        real
case  1  1.380000   0.870000   2.250000 (  2.738493)
hash  1  1.320000   0.850000   2.170000 (  2.642013)
case 42  1.500000   0.960000   2.460000 (  3.029923)
hash 42  1.890000   0.890000   2.780000 (  3.456969)
zetetic
  • 47,184
  • 10
  • 111
  • 119
  • Your calls to `get_name()` are adding a bit of extra code for each loop which isn't mirrored in the hash lookups using NAMES. To compare equally you should either use a def for the hash lookup, or embed the case statement directly into the `n.times` loop. I'd still expect the hash lookup to be faster, but it's important to remove any external influence when benchmarking to get accurate numbers. – the Tin Man Nov 14 '10 at 23:54
  • actually, I was wanting to compare direct hash lookup with function call plus case, so it's fair to wrap the case in a def but not the hash lookup. – Simon Nov 15 '10 at 09:17
1

Here is an example that shows the case faster for symbol lookup. The previous examples were integer based keys.

https://gist.github.com/02c8f8ca0cd4c9d9ceb2

Amala
  • 1,718
  • 1
  • 17
  • 29
0

Why not use a case statement inline in the time-critical portion of your code, rather than making it its own function call? That avoids that overhead of a call stack, and also avoids the overhead of a hash lookup.

You can also always do a benchmark yourself. Do something like this:

t = Time.now
1_000_000.times { ...your code... }
secs = Time.now - t

Replacing "your code" with each option.

Ben Lee
  • 52,489
  • 13
  • 125
  • 145
0

As Martin said, it depends on how many IDs you want to check. Are you picking the ID and corresponding strings from database or you want to hard-code it. If there is only a couple of checks, then you can go with CASE. But there is no option to modify the ID/String pair, if need arises.

On the other hand, if you are picking a lot of IDs from a database, you can use something like Dictionary to store name/value pairs.

While a dictionary is in the memory, lookup might be fast.

But in the end, it all depends on whether you are working with ever-changing ID/string pair or few constants only.

RKh
  • 13,818
  • 46
  • 152
  • 265
-1

case when has n complexity.
hash lookup has ln(n) complexity, but using an additional object (hash) and calling its methods have its own penalty.

So if you have a lot of cases (thousands, millions, ...) hash lookup is better. But in your case, when you have only 3 variants, case when of course will be much faster.

Nakilon
  • 34,866
  • 14
  • 107
  • 142
  • I'm not sure that hash lookup has really *log(n)* complexity: If it's implemented as a traditional hash list, it would be *O(1)*, i.e. constant complexity but needs an additional step to compute the hash value (which is trivial for integers ...) – MartinStettner Nov 14 '10 at 16:07
  • As far as i know it's implemented as Tree, so `log(n)` is correct. – Baju Nov 14 '10 at 16:08
  • 1
    @Baju: @Martin: Ruby hashs are hashmaps (which is why objects need to define a hash method and why they're called hashs, not trees). Hash lookup has constant complexity in the average case and linear in the worst case (where all keys in the hash have the same hash value). – sepp2k Nov 14 '10 at 16:15
  • 1
    Check out: http://stackoverflow.com/questions/903207/how-are-hash-tables-implemented-internally-in-popular-languages – Garfield Jan 30 '13 at 15:39