Questions tagged [scraperwiki]

ScraperWiki was an online tool for screen scraping.

ScraperWiki was a platform for writing and scheduling screen scrapers, and for storing the data they generated. It supported Ruby, Python and PHP. A later version of the service was called QuickCode, which has also been decommissioned.

"Scraper" refers to screen scrapers, programs that extract data from websites. "Wiki" means that any user with programming experience can create or edit such programs for extracting new data, or for analyzing existing datasets.

68 questions
1 vote, 0 answers

wget without an extension

I am downloading data from the CDC. I want to download all .txt files from a given directory. This code worked for 2017 because all download links ended with .txt. In 2016, all links download to a .txt (if you manually click) but there is no such…
Paul • 33 • 2
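
One way around this, sketched under assumptions: rather than relying on wget's URL-derived filenames, fetch the directory listing and save each linked file under a name with .txt appended. The directory URL and link pattern below are placeholders; requests and lxml are assumed to be available.

    import os
    import requests
    from lxml import html as lh

    BASE = "https://www.cdc.gov/example/2016/"    # placeholder directory URL

    root = lh.fromstring(requests.get(BASE).content)
    for href in root.xpath("//a/@href"):
        url = requests.compat.urljoin(BASE, href)
        name = os.path.basename(url.rstrip("/")) or "index"
        if not name.endswith(".txt"):
            name += ".txt"            # force the extension the server omits
        with open(name, "wb") as f:
            f.write(requests.get(url).content)
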
1 vote, 1 answer

What encoding does the ScraperWiki datastore expect?

While writing a scraper on ScraperWiki, I was repeatedly getting this message when trying to save a UTF8-encoded string: UnicodeDecodeError('utf8', ' the \xe2...', 49, 52, 'invalid data') I eventually worked out, by trial and UnicodeDecodeError,…
AP257 • 89,519 • 86 • 202 • 261
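
The usual remedy, sketched under the assumption that the classic datastore expects unicode objects rather than UTF-8 byte strings (the URL and keys are placeholders): decode once at the boundary, then pass unicode through to save().

    import scraperwiki

    raw = scraperwiki.scrape("http://example.com/page")   # placeholder URL

    # Decode byte strings before saving; unicode passes through untouched.
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")

    scraperwiki.sqlite.save(unique_keys=["id"], data={"id": 1, "body": raw})
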
1 vote, 1 answer

I am trying to scrape HTML from a site that requires a login but am not getting any data

I am following this tutorial but I can't seem to get any data when I run the Python script. I get an HTTP status code of 200 and status.ok returns a true value. Any help would be great. This is what my response looks like in…
mickolasjae • 239 • 3 • 11
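
A 200 response with no data usually means the login POST did not carry everything the form requires. A generic sketch of the session-based pattern; the URLs and form field names are placeholders that must match the real login form:

    import requests

    LOGIN_URL = "https://example.com/login"        # placeholder endpoints
    DATA_URL = "https://example.com/members/data"

    session = requests.Session()                   # keeps the session cookie
    resp = session.post(LOGIN_URL,
                        data={"username": "me", "password": "secret"})
    resp.raise_for_status()

    page = session.get(DATA_URL)
    # If this still shows the login page, the form probably needs extra
    # hidden fields (CSRF tokens) copied from the login page's HTML.
    print(page.text[:500])
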
1 vote, 1 answer

sqlalchemy.exc.StatementError: invalid literal for int() with base 10 in scraper

I've written a Python 2.7 scraper, but am getting an error when attempting to save my data. The scraper is written in ScraperWiki, but I think that's largely irrelevant to the error I'm getting - saving in ScraperWiki seems to be handled using…
philipnye • 352 • 1 • 3 • 16
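
That error typically means a datastore column was created as INTEGER and a later save handed it a non-numeric string. A sketch of defensive coercion before saving; the field names are placeholders and to_int is a hypothetical helper:

    import scraperwiki

    def to_int(value):
        # Coerce a scraped string to int, returning None for blanks or junk.
        try:
            return int(str(value).strip().replace(",", ""))
        except (TypeError, ValueError):
            return None

    record = {"id": "42", "amount": "1,234", "note": "n/a"}
    record["id"] = to_int(record["id"])
    record["amount"] = to_int(record["amount"])

    scraperwiki.sqlite.save(unique_keys=["id"], data=record)
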
1 vote, 2 answers

Problems with extracting table from PDF

I know there are a few threads on this topic but none of their solutions seems to work for me. I have a table in a PDF document from which I would like to be able to extract information. I can copy and paste the text into TextEdit and it is legible…
lac • 755 • 10 • 19
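
One route that worked on the classic (Python 2-era) platform was pdftoxml, which returns XML whose text runs carry pixel coordinates, so rows can be grouped by their vertical position. A sketch with a placeholder PDF URL:

    import requests
    import scraperwiki
    import lxml.etree

    pdf = requests.get("http://example.com/report.pdf").content  # placeholder
    root = lxml.etree.fromstring(scraperwiki.pdftoxml(pdf))

    rows = {}
    for el in root.iter("text"):
        # Same 'top' coordinate => same visual table row.
        rows.setdefault(int(el.attrib["top"]), []).append(el.text or "")

    for top in sorted(rows):
        print(rows[top])
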
1 vote, 0 answers

Crawling infobox section of wikipedia using scraperwiki is giving error

I am a newbie to ScraperWiki. I am trying to get the infobox from a wiki page using ScraperWiki. I got the idea of using ScraperWiki to crawl wiki pages from the link below: https://blog.scraperwiki.com/2011/12/how-to-scrape-and-parse-wikipedia/ Code: import…
3ppps • 933 • 1 • 11 • 24
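
The question's code isn't shown, but here is a sketch of one way to pull infobox rows with lxml from the rendered article HTML; the article URL is just an example, and any page with an infobox table works the same way:

    import requests
    import lxml.html

    url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
    root = lxml.html.fromstring(requests.get(url).content)

    infobox = root.xpath('//table[contains(@class, "infobox")]')[0]
    for row in infobox.xpath(".//tr"):
        th, td = row.xpath("./th"), row.xpath("./td")
        if th and td:   # keep only rows with both a label and a value
            print("%s -> %s" % (th[0].text_content().strip(),
                                td[0].text_content().strip()))
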
1 vote, 1 answer

Scraperwiki Python Loop Issue

I'm creating a scraper through ScraperWiki using Python, but I'm having an issue with the results I get. I'm basing my code off the basic example on ScraperWiki's docs and everything seems very similar, so I'm not sure where my issue is. For my…
user994585 • 661 • 3 • 13 • 28
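
The excerpt doesn't show the code, but a frequent cause of "only one result" is saving with a unique key that never changes, so each iteration overwrites the same row. A sketch of the canonical loop, with placeholder URL and selector:

    import scraperwiki
    import lxml.html

    html = scraperwiki.scrape("http://example.com/list")    # placeholder
    root = lxml.html.fromstring(html)

    for i, li in enumerate(root.xpath("//ul/li")):
        # Save inside the loop, with a key that varies per record --
        # a constant unique key makes every save hit the same row.
        scraperwiki.sqlite.save(unique_keys=["idx"],
                                data={"idx": i,
                                      "text": li.text_content().strip()})
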
1 vote, 1 answer

Scraping a PDF with ScraperWiki and getting an Error of not Defined

I am trying to scrape this PDF with ScraperWiki. The current code gives me an error of name 'data' is not defined but I receive the error on elif int(el.attrib['left']) < 647: data['Neighborhood'] = el.text If I comment that line out I get the…
user3271518 • 628 • 3 • 13 • 27
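
A NameError like this usually means data is first assigned inside one branch but read in another before that branch has ever run. The usual fix is to initialise the dict before the loop; a sketch with placeholder column boundaries:

    import requests
    import scraperwiki
    import lxml.etree

    pdf = requests.get("http://example.com/report.pdf").content  # placeholder
    root = lxml.etree.fromstring(scraperwiki.pdftoxml(pdf))

    data = {}    # initialise before any branch assigns into it
    for el in root.iter("text"):
        left = int(el.attrib["left"])
        if left < 215:                        # placeholder boundary
            data = {"Address": el.text}       # first column starts a record
        elif left < 647 and data:
            data["Neighborhood"] = el.text    # safe: data already exists
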
1 vote, 1 answer

Twitter Scraper giving 420 Error

I am getting the following error while using the following code to scrape Twitter for tweets: import scraperwiki import simplejson import urllib2 # Change QUERY to your search term of choice. # Examples: 'newsnight', 'from:bbcnewsnight',…
RazorProgrammer • 137 • 3 • 12
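
HTTP 420 was the legacy Twitter search API's rate-limit response ("Enhance Your Calm"), so the cure is to poll less often and back off when it appears. A sketch with a hypothetical helper; the starting delay is a guess:

    import time
    import requests

    def fetch_with_backoff(url, max_tries=5):
        # Retry on HTTP 420, doubling the wait between attempts.
        delay = 60                              # seconds; placeholder
        for _ in range(max_tries):
            resp = requests.get(url)
            if resp.status_code != 420:
                return resp
            time.sleep(delay)
            delay *= 2
        raise RuntimeError("still rate-limited after %d tries" % max_tries)
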
1 vote, 5 answers

What is the pythonic way to catch errors and keep going in this loop?

I've got two functions that work just fine, but seem to break down when I run them nested together. def scrape_all_pages(alphabet): pages = get_all_urls(alphabet) for page in pages: scrape_table(page) I'm trying to systematically…
Amanda • 12,099 • 17 • 63 • 91
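
The idiomatic pattern is a try/except around the body of the loop, logging the failure and moving on, so one bad page cannot kill the whole run. A sketch reusing the question's own helper names (get_all_urls and scrape_table are assumed to exist):

    import logging

    def scrape_all_pages(alphabet):
        pages = get_all_urls(alphabet)      # helpers from the question
        for page in pages:
            try:
                scrape_table(page)
            except Exception:
                # Record the traceback, then continue with the next page.
                logging.exception("failed to scrape %s", page)
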
1 vote, 0 answers

ASPX requests browser login emulation

I'm trying to do a POST on an ASPX webpage. I have successfully done the login and tried to get the page content, with no luck. After logging in, the page goes to a redirect, tmp.aspx, then it shows you the main page. My code currently logs in and…
user1553142 • 237 • 1 • 11 • 21
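
ASP.NET form posts generally fail unless the hidden state fields (__VIEWSTATE, __EVENTVALIDATION and friends) are read from the login page and echoed back. A sketch with placeholder URLs and input names; the session then follows the tmp.aspx redirect by itself:

    import requests
    import lxml.html

    LOGIN = "https://example.com/login.aspx"     # placeholder URL
    session = requests.Session()

    # Copy every hidden input from the login form into the payload.
    form = lxml.html.fromstring(session.get(LOGIN).content)
    payload = {f.attrib["name"]: f.attrib.get("value", "")
               for f in form.xpath('//input[@type="hidden"][@name]')}
    payload.update({"txtUser": "me", "txtPass": "secret"})  # placeholder names

    resp = session.post(LOGIN, data=payload)
    print(resp.url)    # should land on the main page after the redirect
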
1 vote, 3 answers

What is wrong with the bs4 documentation? I can't run unwrap() sample code

I'm trying to strip out some fussy text from pages like this. I want to preserve the anchored links but lose the breaks and the a.intro. I thought I could use something like unwrap() to strip off layers but I'm getting an error: TypeError:…
Amanda • 12,099 • 17 • 63 • 91
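
unwrap() is a method on an individual Tag, so errors like this often come from calling it on the list that find_all() returns rather than on each element. A self-contained sketch with made-up markup matching the question's description (drop the breaks, keep the links, lose the a.intro wrapper):

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(
        '<p><a class="intro" href="#top">top</a> Text '
        '<a href="#x">keep me</a> more<br/></p>', "html.parser")

    # Tag methods must be called per element, not on the result list.
    for br in soup.find_all("br"):
        br.decompose()                  # remove the breaks outright
    for a in soup.find_all("a", class_="intro"):
        a.unwrap()                      # keep the text, lose the tag

    print(soup)    # -> <p>top Text <a href="#x">keep me</a> more</p>
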
1 vote, 1 answer

sqlite queries returning errors - can't work out why

Not sure this is a side-effect of a custom function in sqlite, but I was trying to use the queries to power a form. (here's a rough demo http://www.thisisstaffordshire.co.uk/images/localpeople/ugc-images/275796/binaries/GPformMap4.html) Slight…
elksie5000 • 7,084 • 12 • 57 • 87
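
Hard to diagnose from the excerpt, but with the classic datastore the safest pattern is a parameterised select, which sidesteps most quoting and escaping errors in hand-built SQL. A sketch, assuming scraperwiki.sqlite.select accepts bound parameters (table and values are placeholders):

    import scraperwiki

    scraperwiki.sqlite.save(unique_keys=["id"],
                            data={"id": 1, "name": "Stafford GP"})

    # '?' placeholders let SQLite handle the quoting; the default table
    # created by save() is called swdata.
    rows = scraperwiki.sqlite.select("* from swdata where name = ?",
                                     ["Stafford GP"])
    print(rows)    # -> [{'id': 1, 'name': 'Stafford GP'}]
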
1 vote, 2 answers

Why isn't my KML feed working with Google Maps anymore?

I'm really confused. I have a KML feed at https://views.scraperwiki.com/run/hackney_council_planning_kml_output/? ...Which worked perfectly fine with Google Maps up until a few weeks…
aendra • 5,286 • 3 • 38 • 57
0 votes, 2 answers

Foreach loop dying after one iteration

I've been experimenting with ScraperWiki and yesterday I could get a list of all <li> elements in the DOM. Now, however, I only run through one iteration. This is my code: $html = 'www.blah...' $dom = new…
Echilon • 10,064 • 33 • 131 • 217