Maybe this is the wrong approach, but I'm trying to parallelize em-hiredis writes and lookups in Goliath with EM::Synchrony::Multi or EM::Synchrony::FiberIterator. However, I can't access basic values initialized in the config from inside the iterator; I keep getting method_missing errors.

Here's a basic, watered-down version of what I'm trying to do:

/lib/config/try.rb

config['redisUri'] = 'redis://localhost:6379/0'
config['redis_db'] ||= EM::Hiredis.connect
config['user_agent'] = "MyCrawler Mozilla/5.0 Compat etc."

Here's the basic Goliath setup:

/try.rb

require "goliath"
require "em-hiredis"
require "em-synchrony/fiber_iterator"
require "em-synchrony/em-hiredis"
require "em-synchrony/em-multi"

class Try < Goliath::API
  use Goliath::Rack::Params
  use Goliath::Rack::DefaultMimeType

  def response(env)
    case env['REQUEST_PATH']
    when "/start" then
      start_crawl()
      body = "STARTING"
      [200, {}, body]
    end 
  end 

  def start_crawl
    urls = ["http://www.example.com/",
      "http://www.example.com/photos/",
      "http://www.example.com/video/",
    ]

    EM::Synchrony::FiberIterator.new(urls, 3).each do |url|
      p "#{user_agent}"
      redis_db.sadd 'test_queue', url
    end

    # multi = EM::Synchrony::Multi.new
    # urls.each_with_index do |url, index|
    #  p "#{user_agent}"
    #  multi.add index, redis_db.sadd('test_queue', url)
    # end
  end
end

However, I keep getting errors saying that Goliath doesn't recognize user_agent or redis_db, both of which were initialized in the config.

[936:INFO] 2012-09-21 23:47:10 :: Starting server on 0.0.0.0:9000 in development mode. Watch out for stones.
/Users/ewu/.rvm/gems/ruby-1.9.3-p194@crawler/gems/goliath-1.0.0/lib/goliath/api.rb:143:in `method_missing': undefined local variable or method `user_agent' for #<Try:0x007ff5a431c4e0 @opts={}> (NameError)
from ./lib/try.rb:27:in `block in start_crawl'
from /Users/ewu/.rvm/gems/ruby-1.9.3-p194@crawler/gems/em-synchrony-1.0.2/lib/em-synchrony/fiber_iterator.rb:10:in `call'
from /Users/ewu/.rvm/gems/ruby-1.9.3-p194@crawler/gems/em-synchrony-1.0.2/lib/em-synchrony/fiber_iterator.rb:10:in `block (2 levels) in each'
...
...
...

Ideally I'd be able to get FiberIterator working, because I have additional conditionals to check for:

EM::Synchrony::FiberIterator.new(urls, 3).each do |new_url|
  is_member = redis_db.sismember('crawled_urls', new_url)
  is_member += redis_db.sismember('queued_urls', new_url)
  if is_member == 0
    redis_db.lpush 'crawl_queue', new_url
    redis_db.sadd 'queued_urls', new_url
  end
end
eywu

1 Answer

I don't think your config file is getting loaded. The name of try.rb needs to match the name of the robojin.rb file in the config directory.

dj2
  • I actually have try.rb in the config directory; I mistyped it in the question. I'm fairly confident that the config file is being loaded, since the redis server logs show a connection being made when I start up Goliath, but I'm still getting the method_missing error. Redis log: [614] 21 Sep 23:41:14 - Accepted 127.0.0.1:49353 [614] 21 Sep 23:41:19 - 1 clients connected (0 slaves), 930960 bytes in use [614] 21 Sep 23:41:24 - 1 clients connected (0 slaves), 930960 bytes in use – eywu Sep 22 '12 at 06:45
  • If I pull the print statement ( p "#{user_agent}" ) outside of the iterator, it outputs just fine. Should the config variables be accessible from within the FiberIterator? – eywu Sep 22 '12 at 06:51
  • 1
    It's quite possible that the binding ends up being different when using the Fiber iterator. I'd suggest assigning the things out of config that you want to use locally to a local variable. – dj2 Sep 22 '12 at 15:34
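
The local-variable suggestion in the last comment can be illustrated with a minimal plain-Ruby sketch. It assumes the config lookup goes through method_missing and depends on fiber-local state, which would explain why it works outside the iterator and fails inside it, though that is not confirmed here for Goliath itself. FakeApi, :fake_env, lookup_inside_fiber and lookup_with_local are hypothetical names made up for the illustration, and a bare Fiber stands in for the fibers that FiberIterator creates.

```ruby
# Hypothetical stand-in for an API object; FakeApi and :fake_env are not Goliath names.
class FakeApi
  def initialize(config)
    # Thread.current[] is fiber-local in Ruby, so this value is only
    # visible from the fiber that stored it.
    Thread.current[:fake_env] = config
  end

  # Resolve unknown method names against the fiber-local config hash.
  def method_missing(name, *args, &blk)
    env = Thread.current[:fake_env]
    return env[name.to_s] if env && env.key?(name.to_s)
    super
  end

  def respond_to_missing?(name, include_private = false)
    env = Thread.current[:fake_env]
    (env && env.key?(name.to_s)) || super
  end

  # Looks the value up inside a new fiber, where the fiber-local config is absent.
  def lookup_inside_fiber
    Fiber.new do
      begin
        user_agent
      rescue NameError
        :lookup_failed
      end
    end.resume
  end

  # Copies the value to a local variable first; the block closes over the local,
  # so no method_missing lookup happens inside the new fiber.
  def lookup_with_local
    agent = user_agent
    Fiber.new { agent }.resume
  end
end

api = FakeApi.new("user_agent" => "MyCrawler Mozilla/5.0 Compat etc.")
p api.lookup_inside_fiber
p api.lookup_with_local
```

Applied to start_crawl in the question, the same idea would be to write something like agent = user_agent and redis = redis_db before the FiberIterator line and use only agent and redis inside the block.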