I have a Scrapy project whose spider is shown below. The spider works when I run it with this command: scrapy crawl myspider
import MySQLdb

from scrapy.spider import BaseSpider
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

from mysqlproject.items import QuestionItem  # adjust to your project's actual items module


class MySpider(BaseSpider):
    name = "myspider"

    def __init__(self):
        # Build start_urls from the MySQL "pages" table when the spider is created
        conn = MySQLdb.connect(host='127.0.0.1',
                               user='root',
                               passwd='xxxx',
                               db='myspider',
                               port=3306)
        cur = conn.cursor()
        cur.execute("SELECT * FROM pages")
        rows = cur.fetchall()
        conn.close()
        self.start_urls = [row[0] for row in rows]

    def parse(self, response):
        # Emit one item per link extracted from the page
        links = SgmlLinkExtractor().extract_links(response)
        for link in links:
            item = QuestionItem()
            item['url'] = link.url  # extract_links() returns Link objects
            yield item
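For reference, QuestionItem is a simple one-field item; I'm assuming a definition along these lines (only the url field is used here):

from scrapy.item import Item, Field


class QuestionItem(Item):
    url = Field()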
After I deploy this project to scrapyd with "scrapy deploy -p mysqlproject", I schedule the spider with "curl http://localhost:6800/schedule.json -d project=mysql -d spider=myspider".
The problem is that start_urls is not being filled from the database; instead, the SQL query returns an empty result set. My guess is that scrapyd connects to its own mysql.db, which is configured by dbs_dir as shown here: http://doc.scrapy.org/en/0.14/topics/scrapyd.html#dbs-dir
How can I establish a connection between scrapyd and my MySQL server instead of mysql.db?
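In case it helps to clarify what I'm after, here is a sketch of the direction I'm imagining: moving the query out of __init__ and into start_requests, with the connection host passed as a spider argument through schedule.json. The mysql_host argument name is just my own invention, not an existing scrapyd setting:

import MySQLdb

from scrapy.spider import BaseSpider
from scrapy.http import Request


class MySpider(BaseSpider):
    name = "myspider"

    def start_requests(self):
        # mysql_host would arrive as a spider argument, e.g.:
        #   curl http://localhost:6800/schedule.json -d project=mysql \
        #        -d spider=myspider -d mysql_host=127.0.0.1
        host = getattr(self, 'mysql_host', '127.0.0.1')
        conn = MySQLdb.connect(host=host,
                               user='root',
                               passwd='xxxx',
                               db='myspider',
                               port=3306)
        cur = conn.cursor()
        cur.execute("SELECT * FROM pages")
        rows = cur.fetchall()
        conn.close()
        for row in rows:
            # Default callback is parse(), which stays the same as above
            yield Request(row[0])

Would something like this work under scrapyd, or is there a better way to point the spider at the real MySQL server?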