Questions tagged [scraperwiki]

ScraperWiki was an online tool for screen scraping.

ScraperWiki was a platform for writing and scheduling screen scrapers, and for storing the data they generate. It supported Ruby, Python, and PHP. A later version of the service was called QuickCode, which has also been decommissioned.

"Scraper" refers to screen scrapers, programs that extract data from websites. "Wiki" means that any user with programming experience can create or edit such programs for extracting new data, or for analyzing existing datasets.

68 questions
0
votes
1 answer

Is there a way to delete a view on scraperwiki?

Is there a way to delete a view on scraperwiki? I can't find a way to do that anywhere on the site.
killdash9
0
votes
0 answers

I'm only scraping the first element of each page using BeautifulSoup; my goal is to scrape all elements within the page. What am I doing wrong?

I'm trying to scrape the public contact info of all the persons on each page of the website, so I built 3 functions: one to modify the URL, one to extract the source code from it using BeautifulSoup, and one to transform it and finally get the…
0
votes
1 answer

Scraping with Invoke-WebRequest

We are migrating an ASP.NET intranet to SharePoint and automating the conversion via PowerShell. We only want to scrape links from within the DIV tag with the classname 'topnav', not all the links on the page. $url = "http://intranet.company.com" $page…
user2019423
0
votes
1 answer

How to selectively scrape html with repeated class IDs

I am new to Python and have searched Stack Overflow in vain for an answer that I can understand. Thanks in advance for any help or advice you can give. I am trying to scrape information on price and location from a housing sales website, i.e. the…
JER
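For questions like the one above, a minimal sketch of selectively capturing text from repeated class names can be written with only the standard library's html.parser. The markup and the class names "price" and "location" here are hypothetical stand-ins for the housing site's actual HTML:

```python
# Capture the text of elements whose class is "price" or "location",
# using only the stdlib (no BeautifulSoup/lxml dependency).
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self._capture = None          # class currently being captured
        self.records = []             # list of (class, text) pairs

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        for wanted in ("price", "location"):
            if wanted in classes.split():
                self._capture = wanted

    def handle_data(self, data):
        if self._capture and data.strip():
            self.records.append((self._capture, data.strip()))
            self._capture = None      # stop after the element's text

html_doc = """
<div class="listing"><span class="price">£250,000</span>
<span class="location">Leeds</span></div>
<div class="listing"><span class="price">£180,000</span>
<span class="location">York</span></div>
"""
parser = ListingParser()
parser.feed(html_doc)
print(parser.records)
```

Because every occurrence of a wanted class is captured, repeated class names across many listings all end up in `records` rather than only the first match.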
0
votes
1 answer

Using scraperwiki for pdf-file on disk

I am trying to get some data out of a PDF document using scraperwiki for Python. It works beautifully if I download the file using urllib2 like so: pdfdata = urllib2.urlopen(url).read() xmldata = scraperwiki.pdftoxml(pdfdata) root =…
w_a_s
0
votes
1 answer

A table of scraperwiki.sqlite isn't found

I have a script in Ruby which uses the scraperwiki gem. In the directory of this script there's a file titled scraperwiki.sqlite. items.each do |x| if ScraperWiki.select("* from data where .... { x['key123']}'").empty? …
Incerteza
0
votes
2 answers

lxml not working with django, scraperwiki

I'm working on a django app that goes through Illinois' General Assembly website to scrape some pdfs. While deployed on my desktop it works fine until urllib2 times out. When I try to deploy on my Bluehost server, the lxml part of the code throws up…
0
votes
1 answer

Installing Scraperwiki for Python generates an error pdftohtml not found

I have been trying to install the Scraperwiki module for Python. However, it generates the error: "UserWarning: Local Scraperlibs requires pdftohtml, but pdftohtml was not found in the PATH. You probably need to install it." I looked into poppler as…
0
votes
1 answer

Performance Optimization of scraping code

I am studying web scraping for big data, so I wrote the following code to take some information from a local server on our campus. It works fine but I think the performance is very slow; each record takes 0.91s to get stored in the database. What…
Mohammad Abu Musa
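A common cause of the slow per-record times described above is committing a database transaction for every scraped row. A hedged sketch with the stdlib sqlite3 module, batching rows into one transaction (the table and column names are illustrative, not taken from the question):

```python
# Compare: conn.execute(...) + conn.commit() inside the scrape loop
# (one transaction per row, very slow) versus a single executemany()
# inside one transaction (one commit for the whole batch).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, value INTEGER)")

rows = [("item-%d" % i, i) for i in range(1000)]

with conn:                             # commits once when the block exits
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)
```

On disk-backed databases the difference is dramatic, since each commit forces a sync to disk; batching typically turns ~1 s per record into thousands of records per second.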
0
votes
1 answer

How to add this data to database in scraperwiki

import scraperwiki import urllib2, lxml.etree url = 'http://eci.nic.in/eci_main/statisticalreports/SE_1998/StatisticalReport-DEL98.pdf' pdfdata = urllib2.urlopen(url).read() xmldata = scraperwiki.pdftoxml(pdfdata) root =…
0
votes
1 answer

Fixing a 'sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type. Try converting types or pickling.'

I'm stuck on this scraper in ScraperWiki. I just want the text from the li-elements in the ul with dir='ltr'. I run this script every week and sentences could be similar to each other, while being a completely new sentence. That's why I want to…
Jerry Vermanen
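The usual fix for the InterfaceError above is that sqlite3 can only bind None, int, float, str, and bytes; anything else (such as an lxml element) must be converted before saving. A minimal sketch, where `FakeElement` is a hypothetical stand-in for an lxml element and the record layout is illustrative:

```python
# Coerce unsupported values to plain strings before binding them
# as SQL parameters, which avoids "Error binding parameter 0 -
# probably unsupported type".
import sqlite3

class FakeElement:                     # stand-in for an lxml element
    def __init__(self, text):
        self.text = text

record = {"sentence": FakeElement("Example sentence.")}

# Keep natively bindable types as-is; stringify everything else.
clean = {k: (v if isinstance(v, (type(None), int, float, str, bytes))
             else str(v.text))
         for k, v in record.items()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (sentence TEXT)")
conn.execute("INSERT INTO data VALUES (?)", (clean["sentence"],))
print(conn.execute("SELECT sentence FROM data").fetchone()[0])
```

With real lxml elements, `element.text_content()` (or `.text`) is the piece to extract before the insert; the key point is that only primitive types reach the parameter tuple.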
0
votes
1 answer

ScraperWiki scrape frequence

This might be a stupid question, but I am currently working on scraping Twitter using ScraperWiki. However, the ScraperWiki run-frequency is rather low. Is there a way to force-run ScraperWiki to run more frequently without touching Python, since my…
0
votes
0 answers

error importing python library in scraperwiki

I am using scraperwiki to run some code in Python. However, when I run this code I am getting this error: Traceback (most recent call last): File "./code/scraper", line 4, in from scrapemark import scrape ImportError: No module named…
0
votes
1 answer

Scraperwiki: how to save data into one cell in table

Here is my code for the scraper that is extracting the URL and corresponding comments from that particular page: import scraperwiki import lxml.html from BeautifulSoup import BeautifulSoup import urllib2 import re for num in range(1,2): …
0
votes
1 answer

How to extract text with lxml in this scraper program?

I am trying to scrape the text data from a specific element on this page (using scraperwiki) import requests from lxml import html response = requests.get(http://portlandmaps.com/detail.cfm?action=Assessor&propertyid=R246274) tree =…
u'i