1

I saw an article on Perl script performance.

One of the things it mentioned is using hash references instead of accessing the hash directly each and every time.

What benefit do I gain from using a hash reference instead of accessing the hash directly?

My script reads from a list of server names that, in theory, could be as many as 100 machines if someone needed that many. So any boost I can give my script would be great.

AtomicPorkchop
  • You realize that a hash of 100 items is tiny, and any operation will be almost instantaneous on decent hardware? – Rafe Kettler Apr 17 '11 at 08:03
  • Uh, I thought that was getting big... what is considered big? Millions? Well, the thing would be a hash of hashes, and those 100 servers could have several file paths and such inside. – AtomicPorkchop Apr 17 '11 at 08:08
  • Today, probably a thousand elements. Considering that a modest laptop today has 4GB of RAM and a dual-core CPU, 100 is really nothing. – Rafe Kettler Apr 17 '11 at 08:13
  • Good to know, so I have no reason to, yet at least. – AtomicPorkchop Apr 17 '11 at 08:16
  • I recently read a somewhat related article about the [Big data buzzword](http://www.xaprb.com/blog/2011/03/31/big-data-is-how-big-exactly/). It is interesting what they consider big. – bvr Apr 17 '11 at 14:32

4 Answers

8

I don't think there's much of an advantage of $hashref->{"foo"} over $hash{"foo"}. There's probably a small advantage in passing hash refs instead of full hashes to subroutines, but that's about all I can think of. I agree with the comment by Rafe that a hash of 100 items isn't likely to give you performance problems either way. Unless you know you have a performance problem related to hash table access, don't bother with this.
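
To make the comparison concrete, here is a small sketch (with made-up data) of the two forms being discussed:

my %hash    = ( foo => 1 );   # a plain hash
my $hashref = \%hash;         # a reference to that same hash

print $hash{"foo"};           # direct access
print $hashref->{"foo"};      # access through the reference (one extra dereference)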

"It's easier to optimize a debugged program than to debug an optimized program."

Ted Hopp
2

I commented earlier that 100 is tiny for a hash. I'll qualify this with a more general statement:

Don't worry about it unless it's a problem. Is your script running slowly? If not, then don't fix what isn't broken. Premature optimization is bad for readability and can often lead to bugs. This was a bigger issue in 2004, when the article I presume you're reading was written. But today, RAM is cheap.

That said, the reason passing references performs better than passing by value is that when you pass a hash as an argument to a sub, it normally has to be copied, which uses more memory. This is only an optimization worth making if (a) you pass big hashes to functions a lot and (b) this causes you to use too much memory.
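
To illustrate the point, here is a minimal sketch (the sub names and the server data are made up for the example) of passing a hash by value versus by reference:

my %servers;
$servers{"server$_"} = { port => 22 } for 1 .. 100;

sub by_value {
    my %copy = @_;            # the hash arrives flattened as a list and is rebuilt as a copy
    return scalar keys %copy;
}

sub by_reference {
    my ($href) = @_;          # only a single scalar (the reference) is passed
    return scalar keys %$href;
}

by_value(%servers);           # flattens 100 key/value pairs into the argument list
by_reference(\%servers);      # passes just one reference; the hash itself is not copied

Whether that copy matters at all depends on how big the hash is and how often the call happens.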

Rafe Kettler
0

Well, as Rafe already mentioned, a hash with 100 elements is not really big. One could argue that using a hash reference doesn't give you much advantage over using a normal hash; however, it also doesn't give you any particular disadvantage (at least I never ran into one). So it's not as bad a premature optimization as one might think.

If your script runs too slowly, you might want to use a profiler to find out where you are losing the time.
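
For example (the answer doesn't name a tool, but Devel::NYTProf is a commonly used choice; myscript.pl below stands in for your script):

>perl -d:NYTProf myscript.pl
>nytprofhtml

The first command writes profiling data to nytprof.out; the second turns that file into an HTML report showing where the time went.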

ChrisWue
0

Sorry, but that article was wrong if that's what it said. There's no way that dereferencing a reference then accessing a hash element can take less time than just accessing a hash element.

>perl -MO=Concise,-exec -e"$x = $h{x}"
...
3  <#> gv[*h] s
4  <1> rv2hv sKR/1
5  <$> const[PV "x"] s/BARE
6  <2> helem sK/2
...

>perl -MO=Concise,-exec -e"$x = $h->{x}"
...
3  <#> gv[*h] s
4  <1> rv2sv sKM/DREFHV,1    <---
5  <1> rv2hv[t3] sKR/1
6  <$> const[PV "x"] s/BARE
7  <2> helem sK/2
...

That said, the amount of extra time the deref takes should be so minute as to not matter.
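
If you want to measure it yourself, here is a quick sketch using the core Benchmark module (the numbers will vary by machine, but the difference should be tiny):

use Benchmark qw(cmpthese);

my %h    = ( x => 1 );
my $href = \%h;

cmpthese(-1, {
    direct => sub { my $v = $h{x} },
    deref  => sub { my $v = $href->{x} },
});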

ikegami