Questions tagged [scraperwiki]

ScraperWiki was an online tool for screen scraping.

ScraperWiki was a platform for writing and scheduling screen scrapers, and for storing the data they generate. It supported Ruby, Python, and PHP. A later version of the service was called QuickCode, which has also been decommissioned.

"Scraper" refers to screen scrapers, programs that extract data from websites. "Wiki" means that any user with programming experience can create or edit such programs for extracting new data, or for analyzing existing datasets.

68 questions
0
votes
1 answer

Is there a way to delete a view on scraperwiki?

Is there a way to delete a view on scraperwiki? I can't find a way to do that anywhere on the site.
killdash9
0
votes
0 answers

I'm only scraping the first element of each page using BeautifulSoup; my goal is to scrape all elements within the page. What am I doing wrong?

I'm trying to scrape the public contact info of all the persons on each page of the website, so I built 3 functions: one to modify the URL, one to extract the source code from it using BeautifulSoup, and one to transform it and finally get the…
0
votes
1 answer

Scraping with Invoke-WebRequest

We are migrating an ASP.NET intranet to SharePoint and automating the conversion via PowerShell. We only want to scrape links from within the DIV tag with the classname 'topnav', not all the links on the page. $url = "http://intranet.company.com" $page…
user2019423
0
votes
1 answer

How to selectively scrape html with repeated class IDs

I am new to Python and have searched Stack Overflow in vain for an answer that I can understand. Thanks in advance for any help or advice you can give. I am trying to scrape information on price and location from a housing sales website, i.e. the…
JER
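For questions like the one above, a minimal sketch of selectively capturing text from repeated class names can be written with only the standard library's html.parser. The markup and the class names "price" and "location" here are hypothetical stand-ins for the housing site's actual HTML:

```python
# Capture the text of elements whose class is "price" or "location",
# using only the stdlib (no BeautifulSoup/lxml dependency).
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self._capture = None          # class currently being captured
        self.records = []             # list of (class, text) pairs

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        for wanted in ("price", "location"):
            if wanted in classes.split():
                self._capture = wanted

    def handle_data(self, data):
        if self._capture and data.strip():
            self.records.append((self._capture, data.strip()))
            self._capture = None      # stop after the element's text

html_doc = """
<div class="listing"><span class="price">£250,000</span>
<span class="location">Leeds</span></div>
<div class="listing"><span class="price">£180,000</span>
<span class="location">York</span></div>
"""
parser = ListingParser()
parser.feed(html_doc)
print(parser.records)
```

Because every occurrence of a wanted class is captured, repeated class names across many listings all end up in `records` rather than only the first match.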
0
votes
1 answer

Using scraperwiki for pdf-file on disk

I am trying to get some data out of a PDF document using scraperwiki for Python. It works beautifully if I download the file using urllib2 like so: pdfdata = urllib2.urlopen(url).read() xmldata = scraperwiki.pdftoxml(pdfdata) root =…
w_a_s
0
votes
1 answer

A table of scraperwiki.sqlite isn't found

I have a script in Ruby which uses the scraperwiki gem. In the directory of this script there's a file titled scraperwiki.sqlite. items.each do |x| if ScraperWiki.select("* from data where .... { x['key123']}'").empty? …
Incerteza
0
votes
2 answers

lxml not working with django, scraperwiki

I'm working on a django app that goes through Illinois' General Assembly website to scrape some pdfs. While deployed on my desktop it works fine until urllib2 times out. When I try to deploy on my Bluehost server, the lxml part of the code throws up…
0
votes
1 answer

Installing Scraperwiki for Python generates an error pdftohtml not found

I have been trying to install the Scraperwiki module for Python. However, it generates the error: "UserWarning: Local Scraperlibs requires pdftohtml, but pdftohtml was not found in the PATH. You probably need to install it." I looked into poppler as…
0
votes
1 answer

Performance Optimization of scraping code

I am studying web scraping for big data, so I wrote the following code to take some information from a local server on our campus. It works fine but I think the performance is very slow; each record takes 0.91s to get stored in the database. What…
Mohammad Abu Musa
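A common cause of the slow per-record times described above is committing a database transaction for every scraped row. A hedged sketch with the stdlib sqlite3 module, batching rows into one transaction (the table and column names are illustrative, not taken from the question):

```python
# Compare: conn.execute(...) + conn.commit() inside the scrape loop
# (one transaction per row, very slow) versus a single executemany()
# inside one transaction (one commit for the whole batch).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, value INTEGER)")

rows = [("item-%d" % i, i) for i in range(1000)]

with conn:                             # commits once when the block exits
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)
```

On disk-backed databases the difference is dramatic, since each commit forces a sync to disk; batching typically turns ~1 s per record into thousands of records per second.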
0
votes
1 answer

How to add this data to database in scraperwiki

import scraperwiki import urllib2, lxml.etree url = 'http://eci.nic.in/eci_main/statisticalreports/SE_1998/StatisticalReport-DEL98.pdf' pdfdata = urllib2.urlopen(url).read() xmldata = scraperwiki.pdftoxml(pdfdata) root =…
0
votes
1 answer

Fixing a 'sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type. Try converting types or pickling.'

I'm stuck on this scraper in ScraperWiki. I just want the text from the li-elements in the ul with dir='ltr'. I run this script every week and sentences could be similar to each other, while being a completely new sentence. That's why I want to…
Jerry Vermanen
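The usual fix for the InterfaceError above is that sqlite3 can only bind None, int, float, str, and bytes; anything else (such as an lxml element) must be converted before saving. A minimal sketch, where `FakeElement` is a hypothetical stand-in for an lxml element and the record layout is illustrative:

```python
# Coerce unsupported values to plain strings before binding them
# as SQL parameters, which avoids "Error binding parameter 0 -
# probably unsupported type".
import sqlite3

class FakeElement:                     # stand-in for an lxml element
    def __init__(self, text):
        self.text = text

record = {"sentence": FakeElement("Example sentence.")}

# Keep natively bindable types as-is; stringify everything else.
clean = {k: (v if isinstance(v, (type(None), int, float, str, bytes))
             else str(v.text))
         for k, v in record.items()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (sentence TEXT)")
conn.execute("INSERT INTO data VALUES (?)", (clean["sentence"],))
print(conn.execute("SELECT sentence FROM data").fetchone()[0])
```

With real lxml elements, `element.text_content()` (or `.text`) is the piece to extract before the insert; the key point is that only primitive types reach the parameter tuple.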
0
votes
1 answer

ScraperWiki scrape frequence

This might be a stupid question, but I am currently working on scraping Twitter using ScraperWiki. However, the ScraperWiki run-frequency is rather low. Is there a way to force-run ScraperWiki to run more frequently without touching Python, since my…
0
votes
0 answers

error importing python library in scraperwiki

I am using scraperwiki to run some code in Python. However, when I run this code I am getting this error: Traceback (most recent call last): File "./code/scraper", line 4, in from scrapemark import scrape ImportError: No module named…
0
votes
1 answer

Scraperwiki: how to save data into one cell in table

Here is my code for the scraper that is extracting the URL and corresponding comments from that particular page: import scraperwiki import lxml.html from BeautifulSoup import BeautifulSoup import urllib2 import re for num in range(1,2): …
0
votes
1 answer

How to extract text with lxml in this scraper program?

I am trying to scrape the text data from a specific element on this page (using scraperwiki) import requests from lxml import html response = requests.get(http://portlandmaps.com/detail.cfm?action=Assessor&propertyid=R246274) tree =…
u'i