Questions tagged [scraperwiki]

ScraperWiki was an online tool for screen scraping.

ScraperWiki was a platform for writing and scheduling screen scrapers, and for storing the data they generated. It supported Ruby, Python and PHP. A later version of the service was called QuickCode, which has also been decommissioned.

"Scraper" refers to screen scrapers, programs that extract data from websites. "Wiki" means that any user with programming experience can create or edit such programs for extracting new data, or for analyzing existing datasets.

68 questions
1 vote, 0 answers

wget without an extension

I am downloading data from the CDC. I want to download all .txt files from a given directory. This code worked for 2017 because all download links ended with .txt. In 2016, all links download to a .txt (if you manually click) but there is no such…
Paul • 33 • 2
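
One way around this, sketched under assumptions: rather than relying on wget's URL-derived filenames, fetch the directory listing and save each linked file under a name with .txt appended. The directory URL and link pattern below are placeholders; requests and lxml are assumed to be available.

    import os
    import requests
    from lxml import html as lh

    BASE = "https://www.cdc.gov/example/2016/"    # placeholder directory URL

    root = lh.fromstring(requests.get(BASE).content)
    for href in root.xpath("//a/@href"):
        url = requests.compat.urljoin(BASE, href)
        name = os.path.basename(url.rstrip("/")) or "index"
        if not name.endswith(".txt"):
            name += ".txt"            # force the extension the server omits
        with open(name, "wb") as f:
            f.write(requests.get(url).content)
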
1 vote, 1 answer

What encoding does the ScraperWiki datastore expect?

While writing a scraper on ScraperWiki, I was repeatedly getting this message when trying to save a UTF8-encoded string: UnicodeDecodeError('utf8', ' the \xe2...', 49, 52, 'invalid data') I eventually worked out, by trial and UnicodeDecodeError,…
AP257 • 89,519 • 86 • 202 • 261
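
The usual remedy, sketched under the assumption that the classic datastore expects unicode objects rather than UTF-8 byte strings (the URL and keys are placeholders): decode once at the boundary, then pass unicode through to save().

    import scraperwiki

    raw = scraperwiki.scrape("http://example.com/page")   # placeholder URL

    # Decode byte strings before saving; unicode passes through untouched.
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")

    scraperwiki.sqlite.save(unique_keys=["id"], data={"id": 1, "body": raw})
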
1 vote, 1 answer

I am trying to scrape HTML from a site that requires a login but am not getting any data

I am following this tutorial but I can't seem to get any data when I run the Python script. I get an HTTP status code of 200 and status.ok returns a true value. Any help would be great. This is what my response looks like in…
mickolasjae • 239 • 3 • 11
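
A 200 response with no data usually means the login POST did not carry everything the form requires. A generic sketch of the session-based pattern; the URLs and form field names are placeholders that must match the real login form:

    import requests

    LOGIN_URL = "https://example.com/login"        # placeholder endpoints
    DATA_URL = "https://example.com/members/data"

    session = requests.Session()                   # keeps the session cookie
    resp = session.post(LOGIN_URL,
                        data={"username": "me", "password": "secret"})
    resp.raise_for_status()

    page = session.get(DATA_URL)
    # If this still shows the login page, the form probably needs extra
    # hidden fields (CSRF tokens) copied from the login page's HTML.
    print(page.text[:500])
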
1 vote, 1 answer

sqlalchemy.exc.StatementError: invalid literal for int() with base 10 in scraper

I've written a Python 2.7 scraper, but am getting an error when attempting to save my data. The scraper is written in ScraperWiki, but I think that's largely irrelevant to the error I'm getting - saving in ScraperWiki seems to be handled using…
philipnye • 352 • 1 • 3 • 16
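
That error typically means a datastore column was created as INTEGER and a later save handed it a non-numeric string. A sketch of defensive coercion before saving; the field names are placeholders and to_int is a hypothetical helper:

    import scraperwiki

    def to_int(value):
        # Coerce a scraped string to int, returning None for blanks or junk.
        try:
            return int(str(value).strip().replace(",", ""))
        except (TypeError, ValueError):
            return None

    record = {"id": "42", "amount": "1,234", "note": "n/a"}
    record["id"] = to_int(record["id"])
    record["amount"] = to_int(record["amount"])

    scraperwiki.sqlite.save(unique_keys=["id"], data=record)
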
1 vote, 2 answers

Problems with extracting table from PDF

I know there are a few threads on this topic but none of their solutions seems to work for me. I have a table in a PDF document from which I would like to be able to extract information. I can copy and paste the text into TextEdit and it is legible…
lac • 755 • 10 • 19
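
One route that worked on the classic (Python 2-era) platform was pdftoxml, which returns XML whose text runs carry pixel coordinates, so rows can be grouped by their vertical position. A sketch with a placeholder PDF URL:

    import requests
    import scraperwiki
    import lxml.etree

    pdf = requests.get("http://example.com/report.pdf").content  # placeholder
    root = lxml.etree.fromstring(scraperwiki.pdftoxml(pdf))

    rows = {}
    for el in root.iter("text"):
        # Same 'top' coordinate => same visual table row.
        rows.setdefault(int(el.attrib["top"]), []).append(el.text or "")

    for top in sorted(rows):
        print(rows[top])
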
1 vote, 0 answers

Crawling infobox section of wikipedia using scraperwiki is giving error

I am a newbie to ScraperWiki. I am trying to get the infobox from a wiki page using ScraperWiki. I got the idea of using ScraperWiki to crawl wiki pages from the link below: https://blog.scraperwiki.com/2011/12/how-to-scrape-and-parse-wikipedia/ Code: import…
3ppps • 933 • 1 • 11 • 24
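
The question's code isn't shown, but here is a sketch of one way to pull infobox rows with lxml from the rendered article HTML; the article URL is just an example, and any page with an infobox table works the same way:

    import requests
    import lxml.html

    url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
    root = lxml.html.fromstring(requests.get(url).content)

    infobox = root.xpath('//table[contains(@class, "infobox")]')[0]
    for row in infobox.xpath(".//tr"):
        th, td = row.xpath("./th"), row.xpath("./td")
        if th and td:   # keep only rows with both a label and a value
            print("%s -> %s" % (th[0].text_content().strip(),
                                td[0].text_content().strip()))
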
1 vote, 1 answer

Scraperwiki Python Loop Issue

I'm creating a scraper through ScraperWiki using Python, but I'm having an issue with the results I get. I'm basing my code off the basic example on ScraperWiki's docs and everything seems very similar, so I'm not sure where my issue is. For my…
user994585 • 661 • 3 • 13 • 28
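
The excerpt doesn't show the code, but a frequent cause of "only one result" is saving with a unique key that never changes, so each iteration overwrites the same row. A sketch of the canonical loop, with placeholder URL and selector:

    import scraperwiki
    import lxml.html

    html = scraperwiki.scrape("http://example.com/list")    # placeholder
    root = lxml.html.fromstring(html)

    for i, li in enumerate(root.xpath("//ul/li")):
        # Save inside the loop, with a key that varies per record --
        # a constant unique key makes every save hit the same row.
        scraperwiki.sqlite.save(unique_keys=["idx"],
                                data={"idx": i,
                                      "text": li.text_content().strip()})
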
1 vote, 1 answer

Scraping a PDF with ScraperWiki and getting an Error of not Defined

I am trying to scrape this PDF with ScraperWiki. The current code gives me an error of name 'data' is not defined but I receive the error on elif int(el.attrib['left']) < 647: data['Neighborhood'] = el.text If I comment that line out I get the…
user3271518 • 628 • 3 • 13 • 27
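
A NameError like this usually means data is first assigned inside one branch but read in another before that branch has ever run. The usual fix is to initialise the dict before the loop; a sketch with placeholder column boundaries:

    import requests
    import scraperwiki
    import lxml.etree

    pdf = requests.get("http://example.com/report.pdf").content  # placeholder
    root = lxml.etree.fromstring(scraperwiki.pdftoxml(pdf))

    data = {}    # initialise before any branch assigns into it
    for el in root.iter("text"):
        left = int(el.attrib["left"])
        if left < 215:                        # placeholder boundary
            data = {"Address": el.text}       # first column starts a record
        elif left < 647 and data:
            data["Neighborhood"] = el.text    # safe: data already exists
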
1 vote, 1 answer

Twitter Scraper giving 420 Error

I am getting the following error while using the following code to scrape Twitter for tweets: import scraperwiki import simplejson import urllib2 # Change QUERY to your search term of choice. # Examples: 'newsnight', 'from:bbcnewsnight',…
RazorProgrammer • 137 • 3 • 12
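
HTTP 420 was the legacy Twitter search API's rate-limit response ("Enhance Your Calm"), so the cure is to poll less often and back off when it appears. A sketch with a hypothetical helper; the starting delay is a guess:

    import time
    import requests

    def fetch_with_backoff(url, max_tries=5):
        # Retry on HTTP 420, doubling the wait between attempts.
        delay = 60                              # seconds; placeholder
        for _ in range(max_tries):
            resp = requests.get(url)
            if resp.status_code != 420:
                return resp
            time.sleep(delay)
            delay *= 2
        raise RuntimeError("still rate-limited after %d tries" % max_tries)
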
1 vote, 5 answers

What is the pythonic way to catch errors and keep going in this loop?

I've got two functions that work just fine, but seem to break down when I run them nested together. def scrape_all_pages(alphabet): pages = get_all_urls(alphabet) for page in pages: scrape_table(page) I'm trying to systematically…
Amanda • 12,099 • 17 • 63 • 91
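
The idiomatic pattern is a try/except around the body of the loop, logging the failure and moving on, so one bad page cannot kill the whole run. A sketch reusing the question's own helper names (get_all_urls and scrape_table are assumed to exist):

    import logging

    def scrape_all_pages(alphabet):
        pages = get_all_urls(alphabet)      # helpers from the question
        for page in pages:
            try:
                scrape_table(page)
            except Exception:
                # Record the traceback, then continue with the next page.
                logging.exception("failed to scrape %s", page)
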
1 vote, 0 answers

ASPX requests browser login emulation

I'm trying to do a POST on an ASPX webpage. I have successfully done the login and tried to get the page content, with no luck. After logging in, the page goes to a redirect, tmp.aspx, then it shows you the main page. My code currently logs in and…
user1553142 • 237 • 1 • 11 • 21
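
ASP.NET form posts generally fail unless the hidden state fields (__VIEWSTATE, __EVENTVALIDATION and friends) are read from the login page and echoed back. A sketch with placeholder URLs and input names; the session then follows the tmp.aspx redirect by itself:

    import requests
    import lxml.html

    LOGIN = "https://example.com/login.aspx"     # placeholder URL
    session = requests.Session()

    # Copy every hidden input from the login form into the payload.
    form = lxml.html.fromstring(session.get(LOGIN).content)
    payload = {f.attrib["name"]: f.attrib.get("value", "")
               for f in form.xpath('//input[@type="hidden"][@name]')}
    payload.update({"txtUser": "me", "txtPass": "secret"})  # placeholder names

    resp = session.post(LOGIN, data=payload)
    print(resp.url)    # should land on the main page after the redirect
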
1 vote, 3 answers

What is wrong with the bs4 documentation? I can't run unwrap() sample code

I'm trying to strip out some fussy text from pages like this. I want to preserve the anchored links but lose the breaks and the a.intro. I thought I could use something like unwrap() to strip off layers but I'm getting an error: TypeError:…
Amanda • 12,099 • 17 • 63 • 91
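
unwrap() is a method on an individual Tag, so errors like this often come from calling it on the list that find_all() returns rather than on each element. A self-contained sketch with made-up markup matching the question's description (drop the breaks, keep the links, lose the a.intro wrapper):

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(
        '<p><a class="intro" href="#top">top</a> Text '
        '<a href="#x">keep me</a> more<br/></p>', "html.parser")

    # Tag methods must be called per element, not on the result list.
    for br in soup.find_all("br"):
        br.decompose()                  # remove the breaks outright
    for a in soup.find_all("a", class_="intro"):
        a.unwrap()                      # keep the text, lose the tag

    print(soup)    # -> <p>top Text <a href="#x">keep me</a> more</p>
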
1 vote, 1 answer

sqlite queries returning errors - can't work out why

Not sure this is a side-effect of a custom function in sqlite, but I was trying to use the queries to power a form. (here's a rough demo http://www.thisisstaffordshire.co.uk/images/localpeople/ugc-images/275796/binaries/GPformMap4.html) Slight…
elksie5000 • 7,084 • 12 • 57 • 87
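
Hard to diagnose from the excerpt, but with the classic datastore the safest pattern is a parameterised select, which sidesteps most quoting and escaping errors in hand-built SQL. A sketch, assuming scraperwiki.sqlite.select accepts bound parameters (table and values are placeholders):

    import scraperwiki

    scraperwiki.sqlite.save(unique_keys=["id"],
                            data={"id": 1, "name": "Stafford GP"})

    # '?' placeholders let SQLite handle the quoting; the default table
    # created by save() is called swdata.
    rows = scraperwiki.sqlite.select("* from swdata where name = ?",
                                     ["Stafford GP"])
    print(rows)    # -> [{'id': 1, 'name': 'Stafford GP'}]
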
1 vote, 2 answers

Why isn't my KML feed working with Google Maps anymore?

I'm really confused. I have a KML feed at https://views.scraperwiki.com/run/hackney_council_planning_kml_output/? ...Which worked perfectly fine with Google Maps up until a few weeks…
aendra • 5,286 • 3 • 38 • 57
0 votes, 2 answers

Foreach loop dying after one iteration

I've been experimenting with ScraperWiki and yesterday I could get a list of all <li> elements in the DOM. Now, however, I only run through one iteration. This is my code: $html = 'www.blah...' $dom = new…
Echilon • 10,064 • 33 • 131 • 217