20

The script has to verify if one predefined IP is present in a big array of IPs. Currently I code that function like this (saying that "ips" is my array of IP and "ip" is the predefined ip)

ips.each do |existsip|
  if ip == existsip
    puts "ip exists"
    return 1
  end
end
puts "ip doesn't exist"
return nil

Is there a faster way to do the same thing?

Edit : I might have wrongly expressed myself. I can do array.include? but what I'd like to know is : Is array.include? the method that will give me the fastest result?

wurde
  • 2,487
  • 2
  • 20
  • 39
Cocotton
  • 910
  • 2
  • 7
  • 22

5 Answers5

36

You can use Set. It is implemented on top of Hash and will be faster for big datasets - O(1).

require 'set'
s = Set.new ['1.1.1.1', '1.2.3.4']
# => #<Set: {"1.1.1.1", "1.2.3.4"}> 
s.include? '1.1.1.1'
# => true 
Aliaksei Kliuchnikau
  • 13,589
  • 4
  • 59
  • 72
  • 1
    Or in your case: `s = Set.new(ips)` – Phrogz Feb 16 '12 at 15:32
  • Hello again Alex :) .include method source code seems to do almost the same thing as mine. Or is it actually faster? – Cocotton Feb 16 '12 at 15:36
  • 2
    @Cocotton: [Much faster](http://stackoverflow.com/questions/5551168/performance-of-arrays-and-hashes-in-ruby/5552062#5552062). You could also use a Hash with ip's as keys and 'true' as values. – steenslag Feb 16 '12 at 15:49
  • @steenslag,Phrogz : For now I'll leave it like this (.include), but when I'll be done will my script, I'll switch it for a hash – Cocotton Feb 16 '12 at 15:53
  • 1
    The obvious caveat here is that the Set is faster, but building the Set can be an expensive operation, so you wouldn't want to build a set to query it a small number of times. – Marc Talbot Feb 16 '12 at 16:11
  • 3
    For an array of 12.4 million short strings: `a=('a'..'zzzzz').to_a; time{ a.include?('0') } #=> 0.71s; time{ Set.new(a) } #=> 11.2s`; so yes, the overhead of creating the set needs to be worth the performance gains of instantaneous queries. – Phrogz May 15 '12 at 19:49
  • Backing up @Phrogz statement, check the source code of both #include? methods and this gist as proof: https://gist.github.com/vadviktor/66e524c591b2604ce2d7 – Ikon Jul 14 '15 at 14:14
6

You could use the Array#include method to return you a true/false.

http://ruby-doc.org/core-1.9.3/Array.html#method-i-include-3F

if ips.include?(ip) #=> true
  puts 'ip exists'
else
  puts 'ip  doesn\'t exist'
end
ericraio
  • 1,469
  • 14
  • 35
3
ips = ['10.10.10.10','10.10.10.11','10.10.10.12']

ip = '10.10.10.10'
ips.include?(ip) => true

ip = '10.10.10.13'
ips.include?(ip) => false

check Documentaion here

dku.rajkumar
  • 18,414
  • 7
  • 41
  • 58
  • But is this actually faster than my method? Because the source code of this seems to do practically the same thing as mine. – Cocotton Feb 16 '12 at 15:36
  • ofcourse it is faster.. i have using in my project.. moreover when there is a method in ruby, why should we write extra code. – dku.rajkumar Feb 16 '12 at 15:39
  • @dku.rajkumar wanted to say, it should be faster as `.include?` is implemented on C level on the Array class. – Ikon Jul 14 '15 at 13:40
3

A faster way would be:

if ips.include?(ip)
  puts "ip exists"
  return 1
else
  puts "ip doesn't exist"
  return nil
end
Don Cruickshank
  • 5,641
  • 6
  • 48
  • 48
  • Slightly faster, since the `each` occurs in C instead of Ruby, but it's still O(n) versus O(1) for a Hash or Set. – Phrogz Feb 16 '12 at 15:33
2

have you tried the Array#include? function?

http://ruby-doc.org/core-1.9.3/Array.html#method-i-include-3F

You can see from the source it does almost exactly the same thing, except natively.

Peter Ehrlich
  • 6,969
  • 4
  • 49
  • 65
  • 1
    This is still an O(n) time operation, as it must search through every item in the array (even if it's in C). – Phrogz Feb 16 '12 at 15:33
  • I know that an enum can be sorted, but I don't know how to search such a sorted array. One could make an indexed database column to do the job. – Peter Ehrlich Feb 16 '12 at 17:54
  • 1
    Even a binary search is O(log n). Hashing an item and looking it up in a hash table is a constant-time operation that is unrelated to the number of items stored. – Phrogz Feb 16 '12 at 22:04