
I am implementing python-rq to pass domains through a queue and scrape them using Beautiful Soup, so I am running multiple workers to get the job done. I started 22 workers, and all 22 workers are registered in the rq dashboard. But after some time a worker stops by itself and is no longer displayed in the dashboard, even though webmin still shows all workers as running. The crawling speed has also decreased, i.e. the workers are not actually doing work. I have tried running the workers with both supervisor and nohup, and in both cases the workers stop by themselves.
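For context, the setup looks roughly like this (a minimal sketch, not my exact code; `scrape_domain` and the domain list are placeholders):

# enqueue.py - push domains onto the default rq queue (sketch)
from redis import Redis
from rq import Queue

from tasks import scrape_domain  # hypothetical module holding the scraping job

q = Queue(connection=Redis())
for domain in ["example.com", "example.org"]:
    q.enqueue(scrape_domain, domain)

# worker.py - started 22 times via nohup / supervisor (sketch)
from redis import Redis
from rq import Connection, Queue, Worker

with Connection(Redis()):
    Worker([Queue()]).work()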

What is the reason for this? Why do the workers stop by themselves? And how many workers can we start on a single server?

Along with that, whenever a worker is unregistered from the rq dashboard, the failed count increases. I don't understand why.

Please help me with this. Thank you.

Mannu Nayyar
  • Is the worker running in burst mode? In that case, the worker will exit when all jobs are done. – 郑福真 Jun 23 '16 at 06:09
  • No. The workers are not running in burst mode. @郑福真 (It's simply running as `nohup python worker.py`) – Mannu Nayyar Jun 23 '16 at 06:30
  • Did the worker process exit? Or do just the workers shown in the dashboard disappear? – 郑福真 Jun 23 '16 at 06:44
  • @郑福真 The worker process doesn't seem to exit, because I can see it in webmin's running processes as well as in the terminal. The workers do disappear from the dashboard, though, and the crawling speed decreases. – Mannu Nayyar Jun 23 '16 at 07:00
  • You can see all workers in the redis key `rq:workers`; it seems the worker key just expired or was deleted by something else. – 郑福真 Jun 23 '16 at 13:31
  • @郑福真 it gives me this output `-bash: rq:workers: command not found` – Mannu Nayyar Jun 23 '16 at 15:15
  • I mean the key in redis: `rq:workers` is a set containing all the worker keys. You can check a worker key's TTL using the TTL command in redis-cli, and then check which process deletes the worker key (see the snippet after these comments). – 郑福真 Jun 23 '16 at 15:20
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/115420/discussion-between-mannu-nayyar-and-). – Mannu Nayyar Jun 23 '16 at 15:25
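A quick way to do the inspection suggested in the comments from Python with redis-py (a sketch; the key names assume rq's default `rq:` prefix):

from redis import Redis

r = Redis()
# rq:workers is a set holding the keys of all registered workers
for key in r.smembers('rq:workers'):
    # a missing key or a small TTL suggests the worker key expired
    print(key, 'ttl =', r.ttl(key))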

1 Answer


Okay, I figured out the problem. It was because of the worker timeout.

try:
    # --my code goes here-- (the per-domain scraping logic)
    pass
except Exception as ex:
    # every exception is logged to error.txt and swallowed here
    self.error += 1
    with open("error.txt", "a") as myfile:
        myfile.write('\n%s' % sys.exc_info()[0] + "{}".format(self.url))  # needs `import sys`

According to my code, the next domain is dequeued once 200 URLs have been fetched from the current domain. But for some domains there weren't enough URLs for that condition to be met (only 1 or 2 URLs).

Since the code catches every exception and appends it to the error.txt file, even the rq timeout exception rq.timeouts.JobTimeoutException was caught and written to the file instead of propagating. Thus the worker keeps waiting on the job for x amount of time, which leads to the termination of the worker.
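The fix is to let rq's timeout exception propagate instead of swallowing it. A sketch, assuming the same method as in the snippet above (with self.error and self.url):

import sys
from rq.timeouts import JobTimeoutException

try:
    # --my code goes here-- (the per-domain scraping logic)
    pass
except JobTimeoutException:
    # let rq terminate the job instead of logging and carrying on
    raise
except Exception:
    self.error += 1
    with open("error.txt", "a") as myfile:
        myfile.write('\n%s' % sys.exc_info()[0] + "{}".format(self.url))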

Mannu Nayyar