I have a Scrapy spider that scrapes product information from Amazon based on the product link.

I want to deploy this project with Streamlit, taking the product link as input on the web page and showing the product information as output.

I don't know a lot about deployment. Can anyone help me with that?

2 Answers

You can create a public GitHub repository containing your Streamlit app and connect your GitHub account to Streamlit with OAuth. After signing in on the Streamlit website, you can deploy the app on the Streamlit servers.

  • OK, but how do I run the spider from the Streamlit code? I know how to create a Streamlit web app; I don't know how to connect my web scraper to the Streamlit app. – AbhayParashar31 Feb 07 '22 at 17:55
  • You need to take it in a loop and push it into a public repo then the run is done by streamlit. – Peker Celik Feb 07 '22 at 17:59

You can run Scrapy from a script using the scrapy.crawler.CrawlerProcess class.

Basically, you can run the spider, export the data to a temporary file, and use that file in your Streamlit app:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess(settings={
    "FEEDS": {
        "items.json": {"format": "json"},
    },
})

process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished

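Save this script and run it with subprocess from your Streamlit app. The crawl exports the data to items.json, which the app can then load and display. Running the crawl in a separate process matters because the Twisted reactor started by process.start() cannot be restarted within the same process, and Streamlit reruns the app script on every interaction.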
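The subprocess step could look roughly like the sketch below. It assumes the crawl script above is saved under a name of your choosing and reads the product link from its first command-line argument; the script as shown does not do that yet, and the helper name run_spider_script is made up for illustration. The old output file is deleted first because Scrapy appends to an existing local feed file by default, which would leave invalid JSON after a second run.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_spider_script(script_path, product_url, output_path="items.json", timeout=120):
    """Run the crawl script in a child process and return the exported items.

    Hypothetical helper: assumes the script takes the product URL as its
    first command-line argument and writes its feed to output_path.
    """
    output = Path(output_path)
    # Start from a clean file so a previous run's items are not appended to.
    output.unlink(missing_ok=True)

    # A fresh interpreter per crawl avoids restarting Twisted's reactor
    # inside the long-lived Streamlit process.
    result = subprocess.run(
        [sys.executable, str(script_path), product_url],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(f"Spider script failed:\n{result.stderr}")
    if not output.exists():
        return []  # the crawl finished without exporting anything
    return json.loads(output.read_text(encoding="utf-8"))
```

In the Streamlit app, the link could come from st.text_input, and the returned list could be shown with st.json or st.dataframe once the user submits the form.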

For a worked example, see this helpful Streamlit Cloud Scrapy thread and the public streamlit-scrapy project repository on GitHub.

ahmedshahriar