Questions tagged [scraper]

Synonym of [web-scraping]


349 questions
0 votes · 2 answers

Unable to scrape website: URL returned a bad HTTP response code

I noticed that this has been asked before, but the earlier question never received an answer, so I'll try my best to ask too. For the last several months, my WordPress website, http://geekvision.tv/ , has been undetectable by Facebook's debugger. I…
Zach Hurst · 1
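A first diagnostic step (a sketch of the idea, not Facebook's actual pipeline) is to request the page while identifying as the crawler and check the status code yourself; many "bad HTTP response" reports come down to the server returning a non-2xx code to bots:

```python
import urllib.error
import urllib.request

def is_bad_status(code):
    """Treat anything outside the 2xx range as a bad response for a scraper."""
    return not (200 <= code < 300)

def fetch_status(url, user_agent="facebookexternalhit/1.1"):
    """Request `url` the way Facebook's crawler identifies itself
    and return the HTTP status code the server sends back."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code for 4xx/5xx responses
        return err.code
```

If `fetch_status` returns something like 403 or 503 only for the crawler's User-Agent, a security plugin or firewall rule is likely blocking bots.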
0 votes · 1 answer

PHP: IMDb poster scraper

I have an IMDb scraper from another site. It worked very well, but IMDb changed its HTML output and the regular expression doesn't find the poster anymore. I'm a noob at regex, so maybe someone can help me. This is the line: $arr['poster'] =…
Bubbleboy · 71
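Since the original line is elided, here is a hypothetical sketch of the technique: a deliberately loose pattern run against a made-up snippet (the markup and class name are assumptions, not IMDb's real current HTML). Note that regex against HTML is fragile by nature, which is exactly why the scraper broke; an HTML parser is the more durable fix.

```python
import re

# Hypothetical snippet standing in for IMDb's markup; the real page differs.
html = '<img class="poster" src="https://example.com/images/poster123.jpg" alt="Poster">'

# A tolerant pattern: allow any attributes between <img and the "poster"
# marker, and capture the src value that follows it.
match = re.search(r'<img[^>]*poster[^>]*src="([^"]+)"', html, re.IGNORECASE)
poster = match.group(1) if match else None
```

The pattern still assumes `src` comes after the "poster" marker; whenever the site reorders attributes, a regex like this breaks again.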
0 votes · 1 answer

How to get a clean result when scraping data from a website using Scrapy

I am new to Python and I am trying to scrape data from Yellow Pages. I was able to scrape it, but I get a messy result. This was the result I got: 2013-03-24 20:26:47+0800 [scrapy] INFO: Scrapy 0.14.4 started (bot: eyp) 2013-03-24 20:26:47+0800…
user2176372
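"Messed up" Scrapy output is very often just the page's own whitespace (newlines, tabs, indentation) coming along with the extracted text. A minimal post-processing sketch, independent of Scrapy itself:

```python
def clean(value):
    """Collapse runs of whitespace (newlines, tabs, repeated spaces)
    into single spaces and trim the ends."""
    return " ".join(value.split())

def clean_all(values):
    """Clean a list of scraped strings and drop entries that were pure whitespace."""
    cleaned = (clean(v) for v in values)
    return [v for v in cleaned if v]
```

Running each extracted string through `clean` before yielding the item usually turns the raw selector output into readable values.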
0 votes · 2 answers

Prevent or delete duplicates in a text scraper?

I have code that parses through text files in a folder and saves a predefined number of words around certain search words. For example, it looks for words such as "date" and "year". If it finds both in the same sentence, it will save the sentence…
Seeb · 199
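The usual pattern for this (a sketch, assuming the saved snippets are plain strings) is to normalize each snippet and track what has already been seen, so a sentence matched by both "date" and "year" is kept only once:

```python
def dedupe(snippets):
    """Drop duplicate snippets while preserving first-seen order."""
    seen = set()
    unique = []
    for s in snippets:
        # Normalize whitespace and case so trivial variants count as duplicates
        key = " ".join(s.split()).lower()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique
```

Preventing the duplicates is the same idea applied earlier: check the `seen` set before saving each hit instead of filtering afterwards.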
0 votes · 0 answers

JavaScript to find and list the images of a website?

I'd like to do some hygiene on a bloated images folder/directory for a website of mine. I'm a grade just above novice at working with JavaScript, and it seems like it might be possible to achieve a solution using JavaScript… The solution I'm searching for…
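The asker wants JavaScript, but the underlying task (collect every `src` referenced by an `<img>` tag, then compare against the folder's contents) is language-agnostic. A stdlib Python sketch of the same idea:

```python
from html.parser import HTMLParser

class ImageLister(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

def list_images(html):
    parser = ImageLister()
    parser.feed(html)
    return parser.images
```

Any image file in the directory that never appears in the collected list across the site's pages is a candidate for deletion (CSS background images and script-built URLs would need separate handling).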
0 votes · 1 answer

XPath content not saved

It might just be an idiotic bug in the code that I haven't yet discovered, but it's been taking me quite some time: when parsing websites using Nokogiri and XPath and trying to save the content of the XPaths to a .csv file, the CSV file has empty…
Seeb · 199
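A frequent cause of blank CSV cells in this situation is writing the matched node objects rather than their text content. A Python analogue of the Nokogiri/Ruby setup (using the stdlib `xml.etree` and `csv` modules as stand-ins) shows the distinction:

```python
import csv
import io
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<root><item><name>Alpha</name></item><item><name>Beta</name></item></root>"
)

# The pitfall: writing the element objects themselves yields blank or
# garbage cells. Write each node's text content instead.
rows = [[el.text] for el in doc.findall(".//name")]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
```

In Nokogiri terms, the equivalent fix is calling `.text` (or `.content`) on the nodes returned by the XPath before handing them to the CSV writer.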
0 votes · 1 answer

Error loading GoutteClient when using Behat/Mink

I'm trying to use Behat/Mink in order to load a website. I've used Composer for the installation, this is my composer.json: { "require": { "behat/mink": "*", "behat/mink-goutte-driver": "*", "behat/mink-selenium-driver":…
rfc1484 · 9,441
0 votes · 1 answer

ScraperWiki: How to save HTML so it only gets loaded once

When I execute a scraper, it loads the URL using this method: $html = scraperWiki::scrape("foo.html"); So every time I add new code to the scraper and want to try it, it loads the HTML again, which takes a fair amount of time. Is there any way…
rfc1484 · 9,441
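The standard answer is a disk cache around the fetch call: hit the network the first time, then reuse the saved copy on every later run while iterating on the parsing code. A language-neutral sketch in Python (`fetch` here stands in for whatever actually does the HTTP request, e.g. `scraperWiki::scrape` in the original PHP):

```python
import os

def scrape_cached(url, fetch, cache_dir="cache"):
    """Return the HTML for `url`, calling `fetch(url)` only the first time.

    The result is stored on disk and reused on later runs, so editing
    the parsing code doesn't trigger a re-download.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Sanitize the URL into a filesystem-safe cache filename
    path = os.path.join(cache_dir, "".join(c if c.isalnum() else "_" for c in url))
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = fetch(url)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

Deleting the cache directory (or a single cached file) forces a fresh download when the live page is needed again.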
0 votes · 1 answer

How to download image and save image name based on URL?

How do I download all images from a web page and prefix the image names with the web page's URL (all symbols replaced with underscores)? For example, if I were to download all images from http://www.amazon.com/gp/product/B0029KH944/, then the main…
thdoan · 18,421
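The naming half of the question ("all symbols replaced with underscores") is a small, self-contained transform. A sketch (the download step itself would just be an HTTP GET per image URL):

```python
import re

def url_to_prefix(url):
    """Turn a page URL into a filesystem-safe prefix: every run of
    non-alphanumeric characters becomes one underscore."""
    return re.sub(r"[^0-9A-Za-z]+", "_", url).strip("_")

def image_filename(page_url, image_name):
    """Prefix an image's filename with the sanitized URL of the page it came from."""
    return url_to_prefix(page_url) + "_" + image_name
```

For the Amazon example in the question, `url_to_prefix("http://www.amazon.com/gp/product/B0029KH944/")` yields `http_www_amazon_com_gp_product_B0029KH944`, which is then prepended to each image's own name.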
0 votes · 1 answer

HTML tag counting: rate-of-change formula

I've been trying to find a statistics-esque formula for calculating the rate of change for HTML tags which are either added to or removed from various websites. So, for example, with the scraper I'm writing, I obtain the initial tag count and then…
zeboidlund · 9,731
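The simplest formula that fits the description is relative change between two crawls, (new − old) / old. A minimal sketch:

```python
def tag_rate_of_change(old_count, new_count):
    """Relative change between two tag counts: (new - old) / old.

    Positive means tags were added, negative means removed;
    0.10 reads as "10% more tags than the previous crawl".
    """
    if old_count == 0:
        raise ValueError("rate of change is undefined for an initial count of zero")
    return (new_count - old_count) / old_count
```

Dividing the result by the elapsed time between crawls would turn it into a per-hour or per-day rate if the scraper runs on an irregular schedule.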
0 votes · 2 answers

A good methodology for obtaining the number of HTML tags for a page

I'm looking for a good way to do this: my current method seems not to allow search depths beyond 30-40, even after editing the php.ini settings in hopes of increasing the default execution time as well as the max memory usage. Basically, as soon as the…
zeboidlund · 9,731
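A depth limit in the 30-40 range suggests the current method walks the DOM recursively. An event-driven (SAX-style) parser sidesteps that entirely, since it counts tags in a single streaming pass with no recursion. A sketch using Python's stdlib parser (the original is PHP, where the same idea is available through its stream-based XML/HTML parsers):

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Count every start tag in one streaming pass; nesting depth
    never becomes a limit because nothing recurses."""
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1

def count_tags(html):
    counter = TagCounter()
    counter.feed(html)
    return counter.counts
```

`sum(count_tags(html).values())` gives the total tag count, and the per-tag breakdown comes for free from the `Counter`.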
0 votes · 2 answers

PHP scrape remote images that do not have extensions

I've developed an image scraper that will scrape specific images from remote sites and display them upon pasting into a text field. The logic includes finding images that end in .jpg, .jpeg, .png, etc. I'm running into an issue where a lot of sites…
Chris Favaloro · 79
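When the URL carries no extension, the reliable signals are the response's Content-Type header or, failing that, the file's leading magic bytes. A sketch of the magic-byte approach (the original scraper is PHP; the byte signatures themselves are language-independent):

```python
def sniff_image_type(data):
    """Identify an image by its leading magic bytes instead of a file extension."""
    signatures = {
        b"\xff\xd8\xff": "jpeg",          # JPEG/JFIF
        b"\x89PNG\r\n\x1a\n": "png",      # PNG
        b"GIF87a": "gif",                 # GIF, both common versions
        b"GIF89a": "gif",
    }
    for magic, kind in signatures.items():
        if data.startswith(magic):
            return kind
    return None
```

Fetching only the first few bytes of each candidate URL and running them through a check like this lets the scraper keep extensionless images while still rejecting non-image responses.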
0 votes · 1 answer

Facebook Open Graph Scraping URL

I'm trying to develop 'want' and 'own' buttons. If I use the Facebook debug tool it tells me the final URL is the home page and this has happened because the page has been redirected, which I don't want. I want the fetched URL to be scraped. As a…
Matt · 25
0 votes · 2 answers

Scripted Browser Scraper

What can I use to achieve the following: script a browser or otherwise make requests to the server, log in, browse the site, e.g. find links and navigate to those links. For now, since I am into NodeJS, I was looking at node.io. It allows you to…
Jiew Meng · 84,767
0 votes · 2 answers

ScraperWiki scrape query: using lxml to extract links

I suspect this is a trivial question, but I hope someone can help me with an lxml issue in a scraper I'm trying to build. https://scraperwiki.com/scrapers/thisisscraper/ I'm working line-by-line through tutorial 3 and have got so far…
elksie5000 · 7,084
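The asker is using lxml, but the core of link extraction is the same in any parser: collect each `<a>` tag's `href` and resolve relative paths against the page's URL. A stdlib sketch of that idea (with lxml the equivalent is roughly `doc.xpath('//a/@href')` plus the same `urljoin` step):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect href attributes from <a> tags, resolved against the page's URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin turns relative paths like "/about" into absolute URLs
                self.links.append(urljoin(self.base_url, href))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

The `urljoin` step matters for crawlers in particular: without it, relative links cannot be fed back into the queue of pages to fetch.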