You can run Scrapy from a script using the scrapy.crawler.CrawlerProcess class.
Basically, you run the spider, export the data to a temporary file, and then use that file in your Streamlit app:
    import scrapy
    from scrapy.crawler import CrawlerProcess


    class MySpider(scrapy.Spider):
        # Your spider definition
        ...


    process = CrawlerProcess(settings={
        "FEEDS": {
            "items.json": {"format": "json"},
        },
    })

    process.crawl(MySpider)
    process.start()  # the script will block here until the crawling is finished
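Now save this script and run it from your app using subprocess. This exports the data into items.json, which you can then load in your app. Running the crawl in a separate process also avoids Twisted's ReactorNotRestartable error: the reactor cannot be started twice in the same process, and Streamlit reruns your app script on every interaction.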
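The subprocess step could look roughly like the sketch below. The helper name run_spider_and_load, the timeout value, and the script filename used afterwards are assumptions for illustration, not part of Scrapy or Streamlit. The helper deletes any old output first because, as far as I know, Scrapy's feed export appends to an existing local file by default, which would leave invalid JSON after a second run.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_spider_and_load(script_path, output_path="items.json", timeout=300):
    """Run the crawler script in a child process and return the exported items.

    Hypothetical helper: the name, defaults, and timeout are illustrative.
    """
    output = Path(output_path)

    # Remove stale output so the new export starts from an empty file.
    output.unlink(missing_ok=True)

    # Use the same interpreter that runs the app; raise if the script fails.
    subprocess.run([sys.executable, str(script_path)], check=True, timeout=timeout)

    # Depending on Scrapy version and settings, no file may be written
    # when the spider yields no items, so treat that case as "no results".
    if not output.exists():
        return []

    with output.open(encoding="utf-8") as f:
        return json.load(f)
```

In the Streamlit app you could then call something like items = run_spider_and_load("crawl.py"), where crawl.py is whatever you named the script above, and pass the result to st.dataframe(items). A relative output path is resolved against the app's working directory, which the child process inherits.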
Here is a helpful Streamlit Cloud thread on Scrapy, along with a public streamlit-scrapy project GitHub repo.