I'm creating a spider with Scrapy, and I want to use a MySQL database to get the start_urls for my spider. Is it possible to connect Scrapy Cloud to a remote database?
Can I run a spider on Scrapinghub with a remote database to get start_urls? – gueyebaba Jul 20 '15 at 14:57
1 Answer
You can do that by overriding the start_requests spider method:
http://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests
You can basically do anything you want from there.
MySQL-python is installed by default on Scrapy Cloud. Docs: http://mysql-python.sourceforge.net/

José Ricardo
Now I override start_requests and pass my IP address as the host, for example con = mdb.connect(host='192.168.1.2', user='root', passwd='admin', db='scrapinghub'). When I deploy the spider on Scrapinghub I get this error: Can't connect to MySQL server on '192.168.1.26' – gueyebaba Jul 24 '15 at 15:18
Hi, this isn't your public IP, this is your local network address. To find your public IP, visit http://httpbin.org/ip from the machine that's hosting the MySQL server. – José Ricardo Jul 25 '15 at 13:36
With my public IP address provided by http://httpbin.org/ip, I get the same error. – gueyebaba Jul 28 '15 at 11:48
Are you sure the server is listening for connections from non-local machines? – José Ricardo Jul 30 '15 at 15:13
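One common reason a MySQL server refuses non-local clients is that it is bound to localhost only. A sketch of the relevant server setting, assuming a Debian-style config file path (the location varies by distribution and MySQL version):

```ini
# /etc/mysql/my.cnf (path varies by distribution)
[mysqld]
# Many installs default to 127.0.0.1, which rejects remote clients.
# 0.0.0.0 listens on all interfaces; restart mysqld after changing it.
bind-address = 0.0.0.0
```

The MySQL user must also be granted access from remote hosts (a user defined as `'user'@'localhost'` cannot log in from elsewhere; one defined with a `'%'` host wildcard can), and any firewall in between must allow TCP port 3306.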
You are having trouble with your network setup. To access the Internet from the server where you are hosting your database, you are using NAT (https://en.wikipedia.org/wiki/Network_address_translation), probably behind a commodity firewall. You need to configure that firewall to forward this traffic to the database server, which is well beyond the original scope of this question. – ftrotter Aug 28 '16 at 17:35
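Before digging into MySQL grants at all, it can help to check whether the database port is reachable from outside the local network. A small stand-alone sketch using only the standard library (the host and port are whatever you are testing; 3306 is MySQL's default):

```python
import socket


def can_reach(host, port=3306, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds.

    This only proves the port is open through NAT/firewalls; it says
    nothing about MySQL credentials or grants.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run this from a machine outside your network (or from a Scrapy Cloud job's logs) against your public IP: if it returns False, the problem is port forwarding or the firewall, not MySQL itself.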