
I have a Scrapy project whose spider is shown below. The spider works when I run it with this command: scrapy crawl myspider

import MySQLdb

from scrapy.spider import BaseSpider
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

from myproject.items import QuestionItem  # adjust to your project's items module

class MySpider(BaseSpider):
    name = "myspider"

    def __init__(self):
        # Build start_urls from the "pages" table when the spider is instantiated
        start_urls = []

        conn = MySQLdb.connect(host='127.0.0.1',
                               user='root',
                               passwd='xxxx',
                               db='myspider',
                               port=3306)
        cur = conn.cursor()
        cur.execute("SELECT * FROM pages")
        rows = cur.fetchall()
        for row in rows:
            start_urls.append(row[0])

        self.start_urls = start_urls
        conn.close()

    def parse(self, response):
        links = SgmlLinkExtractor().extract_links(response)

        for link in links:
            item = QuestionItem()
            item['url'] = link.url
            yield item

I then deploy this project to scrapyd with "scrapy deploy -p mysqlproject" and schedule the spider with "curl http://localhost:6800/schedule.json -d project=mysql -d spider=myspider".

The problem is that start_urls is not being filled from the database; instead, the SQL query returns an empty result. I guess this is because it connects to its own mysql.db, which is configured by dbs_dir as described here: http://doc.scrapy.org/en/0.14/topics/scrapyd.html#dbs-dir

How can I establish a connection between scrapyd and the MySQL server instead of mysql.db?

Alican

1 Answer


I guess your problem is not dbs_dir, which only points to scrapyd's internal SQLite databases. You are probably connecting to a MySQL server running on the machine scrapyd is deployed on, instead of the server that actually contains your start_urls.
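If that is the case, one option is to stop hard-coding 127.0.0.1 and pass the host as a spider argument when scheduling the job (scrapyd forwards any extra -d parameters to the spider as arguments). A minimal sketch, assuming a hypothetical mysql_host argument and the same pages table from your question:

import MySQLdb
from scrapy.spider import BaseSpider

class MySpider(BaseSpider):
    name = "myspider"

    def __init__(self, mysql_host='127.0.0.1', *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # Read start URLs from the MySQL server given by the mysql_host argument
        conn = MySQLdb.connect(host=mysql_host,
                               user='root',
                               passwd='xxxx',
                               db='myspider',
                               port=3306)
        cur = conn.cursor()
        cur.execute("SELECT * FROM pages")
        self.start_urls = [row[0] for row in cur.fetchall()]
        conn.close()

    # parse() stays the same as in your question

Then schedule it against the server that actually holds the data:

curl http://localhost:6800/schedule.json -d project=mysql -d spider=myspider -d mysql_host=your.mysql.host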

shirk3y