A benchmark between the Trie gem and Triez gem in this particular use case:
word count: 228982
user system total real
trie 13.410000 0.050000 13.460000 ( 13.463473)
triez 11.080000 0.010000 11.090000 ( 11.102195)
trie tail 39.920000 0.140000 40.060000 ( 40.102285)
triez tail 28.960000 0.030000 28.990000 ( 29.022630)
Generally, Triez is faster for the Op's use case.
require 'triez'
require 'trie'
require 'benchmark'
DICT = '/usr/share/dict/web2'
triez = Triez.new value_type: :object, default: nil
trie = Trie.new
count = 0
File.foreach(DICT) do |word|
word.chomp!
if word.size > 4
triez[word] = word
trie.add word
count += 1
end
end
puts "word count: #{count}"
def in_trie?(str, trie)
0.upto(str.length - 1) do |i|
node = trie.root
i.upto(str.length - 1) do |j|
break unless node.walk! str[j]
if node.terminal?
return str[i..j]
end
end
end
nil
end
def in_triez?(str, triez)
triez.change_all(:substring, str) do |v|
return v if v
end
nil
end
Benchmark.bm(12) do |b|
b.report('trie') do
1_000_000.times { in_trie?('ifdxawesome45someword3', trie) }
end
b.report('triez') do
1_000_000.times { in_triez?('ifdxawesome45someword3', triez) }
end
b.report('trie tail') do
1_000_000.times { in_trie?('ifdx45someword3awesome', trie) }
end
b.report('triez tail') do
1_000_000.times { in_triez?('ifdx45someword3awesome', triez) }
end
end
UPDATE benchmark for rambling-trie, where the lines with c
prefix is the compressed version. (NOTE: the ROUND has been reduced to 100K rather than 1M in the prefix benchmark)
Word count: 228982, ROUND: 100000
user system total real
trie 1.510000 0.000000 1.510000 ( 1.511772)
triez 1.170000 0.000000 1.170000 ( 1.176075)
rambling 4.800000 0.010000 4.810000 ( 4.847021)
c rambling 25.060000 0.050000 25.110000 ( 25.172771)
trie tail 4.540000 0.010000 4.550000 ( 4.566233)
triez tail 3.080000 0.010000 3.090000 ( 3.092655)
rambling tail 4.780000 0.010000 4.790000 ( 4.803114)
c rambling tail 23.470000 0.020000 23.490000 ( 23.525066)
It seems rambling-trie is implemented purely in Ruby, and it doesn't offer direct methods to do prefix matching. The following monkey patches need to be added first. There may be better implementation, but I didn't dig further.
class Rambling::Trie::Container
def match_prefix?(str)
root.match_prefix?(str.chars)
end
end
class Rambling::Trie::RawNode
def match_prefix?(chars, i = 0)
if children_tree.empty?
true
elsif i >= chars.size
false
else
letter = chars[i].to_sym
child = children_tree[letter]
!!child && child.match_prefix?(chars, i + 1)
end
end
end
class Rambling::Trie::CompressedNode
def match_prefix?(chars)
if children_tree.empty?
true
if chars.empty?
false
else
!!(recursive_get :match_prefix?, chars)
end
end
end
def in_r_trie?(str, r_trie)
0.upto(str.length - 1) do |i|
if r_trie.match_prefix? str[i..-1]
return true
end
end
false
end