I have a function named spider that takes one argument, seed, which is the URL I pass to it. How do I use beanstalkc in Python to queue the URLs and process them as jobs?

Keith Pinson
Srikanth

1 Answer

According to the beanstalkc tutorial, you need to:

  1. Make sure a beanstalkd server is running.
  2. Connect:

    import beanstalkc
    # Use the port your beanstalkd server listens on (beanstalkd's default is 11300).
    beanstalk = beanstalkc.Connection(host='localhost', port=14711)
    
  3. Add jobs using:

    beanstalk.put('seed url')
    
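  4. Reserve a job and pass its body to your function:

    job = beanstalk.reserve()
    # job.body is an attribute (not a method) holding the string given to put().
    spider(job.body)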
    
  5. Mark job as completed:

    job.delete()
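
Put together, the steps above might look like the sketch below. This is only an illustration, not the tutorial's code: the helper names enqueue_urls, process_jobs and main are made up for this example, the spider function is a stand-in for the asker's own, and the host and port are assumptions (11300 is beanstalkd's default). The helpers take the connection as a parameter, so they only rely on the put, reserve, job.body, job.delete and job.bury calls that beanstalkc provides.

```python
def spider(seed):
    # Stand-in for the asker's real spider function (assumption).
    print('crawling %s' % seed)


def enqueue_urls(conn, urls):
    """Producer side: put each URL on the queue as a job body."""
    for url in urls:
        conn.put(url)


def process_jobs(conn, handler, timeout=1):
    """Consumer side: reserve jobs until none arrives within `timeout` seconds.

    Returns the list of job bodies that were handled and deleted.
    """
    processed = []
    while True:
        job = conn.reserve(timeout=timeout)
        if job is None:
            # beanstalkc returns None when reserve() times out.
            break
        try:
            handler(job.body)  # body is an attribute, not a method
        except Exception:
            # Set the failed job aside instead of deleting it.
            job.bury()
            continue
        job.delete()  # mark the job as completed
        processed.append(job.body)
    return processed


def main():
    # Imported here so the helpers above can be used without the library.
    import beanstalkc

    # Host and port are assumptions; match them to your beanstalkd server.
    conn = beanstalkc.Connection(host='localhost', port=11300)
    enqueue_urls(conn, ['http://example.com/'])
    process_jobs(conn, spider)
```

Calling main() needs a running beanstalkd server. In practice the producer and the consumer would usually be separate processes, each opening its own connection.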
    
earl
Damian
  • In job = beanstalk.reserve() followed by spider(job.body), can you please explain where the URL (the seed) is passed to spider? When I print job.body() it prints "True", not the URL. – Srikanth Jun 27 '11 at 08:58
  • It's an attribute, so use job.body, not job.body(). Please follow the tutorial step by step first; that should give you a nice intro. – Damian Jun 27 '11 at 09:08