Questions tagged [portia]

Portia is a tool for visually scraping websites without any programming knowledge. Just annotate web pages with a point-and-click editor to indicate what data you want to extract, and Portia will learn how to scrape similar pages from the site. Portia has a web-based UI served by a Twisted server, so you can install it on almost any modern platform.

https://github.com/scrapinghub/portia

55 questions
3
votes
0 answers

Scrapy web scraping an Overwatch profile page

I'm very new to Python, and to coding in general. I'm trying to make a web crawler that scrapes the data from an Overwatch player page (e.g. https://playoverwatch.com/en-gb/career/pc/eu/Taimou-2526). I tried using Portia, and it worked in the cloud, but I…
3
votes
0 answers

Build Docker Images, Error: standard_init_linux.go:178: exec user process caused "no such file or directory"

I am building a Docker image for Portia, but after following all the steps below, docker run fails with the error: standard_init_linux.go:178: exec user process caused "no such file or directory" The steps I am…
Brian
  • 41
  • 5
3
votes
0 answers

How to add cookies in Portia

I am using Portia to scrape a website, but it has a popup for location selection. The popup is JS-based, and hence I cannot interact with it. The website stores a cookie which then disables the popup, rendering the website usable. How do I add the cookie while…
user3295878
  • 831
  • 1
  • 6
  • 19
2
votes
0 answers

How to scrape League of Legends summoner ranking data with Portia?

Hi everyone. My thesis project is about e-sport behavior analytics. I have limited programming and data engineering skills. I have a list of summoner names (the participants of the study), and I am trying to scrape their ranking data from pages like…
K0BA
  • 47
  • 2
  • 7
2
votes
1 answer

Portia Spider logs showing ['Partial'] during crawling

I have created a spider using Portia web scraper and the start URL is https://www1.apply2jobs.com/EdwardJonesCareers/ProfExt/index.cfm?fuseaction=mExternal.searchJobs While scheduling this spider in scrapyd I am getting DEBUG: Crawled (200)
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
2
votes
0 answers

How to get dummy Scrapy stats count in scrapyd

How do I get the "DummyStatsCollector" in scrapyd? I have studied this link, "http://doc.scrapy.org/en/latest/topics/stats.html#dummystatscollector", but there is no brief explanation of how to get the scraped status in scrapyd. I would like to be…
Karthick
  • 55
  • 8
2
votes
1 answer

Unable to deploy Portia spider with scrapyd-deploy

Could you please help me figure out what I'm doing wrong? Here are the steps: followed the Portia install manual found here https://github.com/scrapinghub/portia - all ok; created a new project, entered a URL, tagged an item - all ok; clicked…
Mihai
  • 133
  • 1
  • 14
2
votes
1 answer

Schedule a spider in scrapyd and pass spider config options

I'm trying to configure spiders created with slyd to use scrapy-elasticsearch, so I'm sending -d parameter=value to configure it: curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider -d setting=CLOSESPIDER_ITEMCOUNT=100…
localhost
  • 55
  • 1
  • 6
1
vote
0 answers

Portia interface fails to connect to server in Safari

I followed the instructions from the Portia site, http://portia.readthedocs.io/en/latest/installation.html, installed it using Docker, and allocated port 9001 for Portia to run. This is the response: $ docker run --rm -it -p 9001:9001 -v…
1
vote
0 answers

Portia installs successfully on Windows but fails to run

My computer runs 64-bit Windows 7, and I have installed Vagrant and VirtualBox. I installed Portia as follows: git clone https://github.com/scrapinghub/portia vagrant up The output in cmd is: ==> default: Installing collected…
she35
  • 11
  • 2
1
vote
0 answers

Portia spider not crawling items

I created a spider using the Portia UI, then deployed and scheduled it on one of my virtual machines using scrapyd. The spider ran fine and scraped the website contents. But when I try to deploy and schedule the same spider on another similar virtual…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
1
vote
1 answer

How to run Scrapy/Portia on Azure Web App

I am trying to run Scrapy or Portia on a Microsoft Azure Web App. I have installed Scrapy by creating a virtual environment: D:\Python27\Scripts\virtualenv.exe D:\home\Python And then installed Scrapy: D:\home\Python\Scripts\pip install Scrapy The…
jimbo
  • 582
  • 1
  • 11
  • 28
1
vote
1 answer

How do I get the latest articles of a website using Portia

I am using Portia to crawl the articles of a website. How can I get only the latest articles each day when I run the Portia spider? One idea is to compare each article's datetime with the current datetime, but is there a better way?
gangzi
  • 105
  • 1
  • 13
1
vote
0 answers

How to get URL from Crawled instead of Scraped from in Portia spider deployment?

I am deploying a Portia spider in scrapyd. While deploying, I am passing URLs for every link being parsed. Example: the URL (say URL_1) crawled by the spider is http://www.example.com/query1 and the URL (say URL_2) I am passing is…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
1
vote
1 answer

Unable to deploy Portia project using scrapyd-deploy due to 'No module found ..'

I am evaluating Portia and ran into an issue deploying to scrapyd. When I try to deploy my Portia project using scrapyd-deploy local -p new_project from my Portia project directory, I get the following error message: Packing version…
Rig
  • 11
  • 2