Certain websites only display certain information, e.g. ads for country X, to visitors with an IP address from that country. I would like to know whether it is possible to use a proxy (preferably a Ruby one) with my Ruby script on Scraperwiki to get the results as if I were in country X. Right now the script gets the UK results, whereas if I use an HTTP proxy I can see the website I want to retrieve the data from correctly. The problem is that Scraperwiki does not return the webpage as it would appear from country X.
0
-
I would like an alternative to a web-based proxy because those are too slow. With one, instead of `doc = Nokogiri::HTML(open(queryurl))` I would do `doc = Nokogiri::HTML(open("http://webproxycountryX.xx?website=" + queryurl))`. – Pedro Pereira Feb 16 '13 at 14:55
-
2 Note that [tag:web-scraping] is usually not considered to be data mining. The term data mining is (properly) used for advanced statistical data analysis, not the collection of data. Please use the more appropriate tags; this will get you better answers. – Has QUIT--Anony-Mousse Feb 16 '13 at 15:01
1 Answer
2
Yes. You should be using Mechanize:
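require 'mechanize'

agent = Mechanize.new
# host and port identify an HTTP proxy located in country X
agent.set_proxy(host, port)
# url is the page to scrape; the request is routed through the proxy
page = agent.get(url)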
Now call `page#search` or `page#at` just like you would with your Nokogiri document.
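If Mechanize is not available in your environment, the same idea can be sketched with Ruby's standard `Net::HTTP`, which accepts a proxy host and port when the client is created. This is only a rough sketch, not part of the answer above: `PROXY_HOST`, `PROXY_PORT`, the example URL, and the helper names `proxied_client` and `fetch_via_proxy` are placeholders made up for illustration, and you would need a real proxy located in country X.

```ruby
require 'net/http'
require 'uri'

# Hypothetical values: replace with a real HTTP proxy located in country X.
PROXY_HOST = 'proxy.example.com'
PROXY_PORT = 8080

# Build a Net::HTTP client for the target URL that routes through the proxy.
# No connection is opened until a request is actually made.
def proxied_client(url, proxy_host, proxy_port)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
  http.use_ssl = (uri.scheme == 'https')
  http
end

# Fetch the page body through the proxy (this performs network I/O).
def fetch_via_proxy(url, proxy_host, proxy_port)
  uri = URI.parse(url)
  proxied_client(url, proxy_host, proxy_port).get(uri.request_uri).body
end

# Building the client is safe offline; it only records the proxy settings.
client = proxied_client('http://example.com/ads?q=test', PROXY_HOST, PROXY_PORT)
puts client.proxy_address
```

The string returned by `fetch_via_proxy` could then be passed to `Nokogiri::HTML`, as in the question. Whether the results really come from country X still depends on where the proxy is located and on whether the hosting platform lets outbound requests go through it.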

the Tin Man

pguardiario
-
Wow, much more efficient, although the results still seem to come from the UK. I have already contacted Scraperwiki. – Pedro Pereira Feb 16 '13 at 19:19
-
Unfortunately this solution does not seem to be working with Scraperwiki – Pedro Pereira Mar 20 '13 at 14:24
-
Scraperwiki wasn't fully implementing Mechanize. Only the next version will. – Pedro Pereira Apr 27 '13 at 22:06
-
I've never seen the appeal of Scraperwiki. Just set up a free-tier EC2 instance. You won't have to worry about missing libraries or someone changing the framework around and breaking your scripts. – pguardiario Apr 27 '13 at 23:23