
I'm creating a spider with Scrapy, and I want to use a MySQL database to get the start_urls for my spider. Is it possible to connect Scrapy Cloud to a remote database?

gueyebaba
1 Answer


You can do that by overriding the start_requests spider method:

http://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests

You can basically do anything you want from there.

MySQL-python is installed by default on Scrapy Cloud. Docs: http://mysql-python.sourceforge.net/
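For illustration, here is a minimal sketch of what such an override could look like, assuming a table named start_urls with a url column; the host, credentials, database and table names are placeholders, not values from the question:

    import MySQLdb  # provided by the MySQL-python package mentioned above
    import scrapy


    class MySpider(scrapy.Spider):
        name = "myspider"

        def start_requests(self):
            # Hypothetical connection details -- replace with your own
            # host, credentials, database, table and column names.
            con = MySQLdb.connect(
                host="db.example.com",
                user="scrapy_user",
                passwd="secret",
                db="scrapinghub",
            )
            try:
                cur = con.cursor()
                cur.execute("SELECT url FROM start_urls")
                # Yield one request per row fetched from the database.
                for (url,) in cur.fetchall():
                    yield scrapy.Request(url, callback=self.parse)
            finally:
                con.close()

        def parse(self, response):
            # Normal parsing logic goes here.
            pass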

José Ricardo
  • Thank you, it was very helpful – gueyebaba Jul 24 '15 at 08:52
  • Now I override start_requests and pass my IP address as host, for example con = mdb.connect(host='192.168.1.2', user='root', passwd='admin', db='scrapinghub'). When I deploy the spider on ScrapingHub I get this error: Can't connect to MySQL server on '192.168.1.26' – gueyebaba Jul 24 '15 at 15:18
  • Hi, this isn't your public IP, this is your local network address. To find your public IP, visit http://httpbin.org/ip from the machine that's hosting the MySQL server. – José Ricardo Jul 25 '15 at 13:36
  • With my public IP address provided by http://httpbin.org/ip, I get the same error – gueyebaba Jul 28 '15 at 11:48
  • Are you sure the server is listening for connections from non-local machines? – José Ricardo Jul 30 '15 at 15:13
  • You are having trouble with your network setup. To access the Internet from the server where you are hosting your database, you are using NAT (https://en.wikipedia.org/wiki/Network_address_translation), probably through a commodity firewall. You need to configure your firewall to allow this traffic, which is well beyond the original scope of this question. – ftrotter Aug 28 '16 at 17:35
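As a follow-up to the exchange above: a quick way to check whether the database is actually reachable from outside the local network is to run a connection test from a machine other than the database host, which mimics what Scrapy Cloud will do. The IP address and credentials below are placeholders (203.0.113.10 is a documentation address), not values from the question:

    import MySQLdb

    try:
        con = MySQLdb.connect(
            host="203.0.113.10",   # the public IP reported by httpbin.org/ip
            user="scrapy_user",
            passwd="secret",
            db="scrapinghub",
            connect_timeout=10,
        )
        print("Connected:", con.get_server_info())
        con.close()
    except MySQLdb.OperationalError as exc:
        # Typical causes: MySQL's bind-address limited to 127.0.0.1, no GRANT
        # for the remote host, or port 3306 blocked / not forwarded through NAT.
        print("Connection failed:", exc)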