Highest Voted 'scraper' Questions

0

votes

1 answer

Extracting text nodes or elements with relative XPath in Scrapy

I'm relatively new to XPath and am having a little difficulty homing in on the exact syntax I need for my specific application. The scraper that I have built works perfectly fine (when I use a less complicated path it…

asked Dec 15 '14 at 16:56

Joey Orlando

1,408
2
17
33

0

votes

1 answer

Why can't I access all the data on this page?

I'm trying to scrape tvtropes with beautifulsoup, but for some reason the data I want is cut out. This happens even when I return the entire "soup" from the page. The specific example is this website:…

python beautifulsoup scraper

asked Nov 13 '14 at 18:45

Austin Capobianco

243
4
18

0

votes

1 answer

What is the difference between exitExecution() and stopExecution() in the Webharvest Scraper class?

I want to know the difference between scraper.exitExecution(), scraper.stopExecution() and scraper.finishExecutingProcessor(). I have tried looking in the Javadoc, but I could not find anything there. There seems to be no…

java web-scraping screen-scraping scraper webharvest

asked Sep 17 '14 at 10:09

codeMan

5,730
3
27
51

0

votes

1 answer

cURL Image scraper gets redirected?

I have a function here that tries to grab images from a webpage using cURL. It works for most websites, but there are some that redirect the script somehow. The website that is used as an example in my code below will redirect the script to a…

php redirect curl web-crawler scraper

asked Sep 16 '14 at 15:16

bw872

167
2
12

0

votes

1 answer

Wait for something forever CasperJS/PhantomJS

Is there a way or workaround to wait for something forever? I'll use Facebook as an example because the same thing happens on my site. Every time there are new posts on my Facebook timeline, a panel shows up saying 'Click here to load the posts'. Basically,…

javascript web-scraping phantomjs casperjs scraper

asked Sep 04 '14 at 16:07

Rodrigo Pereira

1,834
2
17
35

0

votes

1 answer

HTML scraping - R scrapR

I am trying to parse data encoded in HTML format. Example of the string I am trying to parse is: Simplify the polynomial by combining like terms. $\"3x+12-11x+14\"$ I want to get…

r web screen-scraping scraper

asked Jun 28 '14 at 18:35

user3763914

11
3

0

votes

1 answer

How to have Go find packages online?

I'm trying to run a Go program using LiteIDE x22 but I get the message C:/Go/bin/go.exe build [C:/Users/admins/Desktop/desktp/worm_scraper-master] worm_scraper.go:11:2: cannot find package "github.com/codegangsta/cli" in any of: …

go web-scraping scraper

asked Jun 27 '14 at 16:19

user3783907

13
2

0

votes

0 answers

Python Web Scraper (URL-Sub_URL Output)

For the past few days I have been trying to figure out how to do this without a ridiculous amount of code, but I cannot find anything on it on Google, Stack Overflow, etc. I am building a very advanced web scraper and I would like for the output to be in…

python tree output depth scraper

asked Jun 22 '14 at 06:36

Windows65

57
1
7

0

votes

2 answers

Scraping with casperjs -- Not sure how to handle empty div

I'm using casperjs to scrape a site. I set up a function which stores a string into a variable named images (shown below) and it works great. images = casper.getElementsAttribute('.search-product-image','src'); I then call that variable in fs so I…

casperjs scraper

asked Jun 12 '14 at 15:32

critic

69
1
10

0

votes

1 answer

PHP simple_html_dom to parse links from paginated pages

I modified the script below to get all links on the $url set in the code. It seems to work to some extent: it gets the URLs of all pages, but does not parse all of them. It parses only the first page and repeats that result for the rest. Can someone…

php html parsing simple-html-dom scraper

asked Apr 21 '14 at 09:32

Spykey

1

0

votes

1 answer

How do I automate my scraper program that I wrote in python to run monthly?

I have written a Python program that scrapes information from a website using regex. My goal is to create a cron job to run this scraper each month. I have gone into the Linux terminal, typed in crontab -e, and added to the bottom of the crontab…

python linux automation cron scraper

asked Mar 31 '14 at 18:59

user3482411

1

0

votes

2 answers

Facebook can't scrape my site correctly - PHP Site

I am running this website www.miswag.net which is highly dependent on Facebook. When I share my site on Facebook, I get a "403 Forbidden". Here's Facebook's debugger output when I try to scrape my site:…

facebook scraper

asked Mar 27 '14 at 09:22

Ammar

3
3

0

votes

1 answer

Facebook open graph debugger : error 200 : can't download

I'm having trouble using the Facebook sharer with my site www.moncorpsetmoi.com. The debugger says Can't download: Could not retrieve data from URL. Any help, any ideas?

facebook debugging http-status-code-403 facebook-sharer scraper

asked Mar 06 '14 at 15:08

user3388758

1
1

0

votes

1 answer

Yahoo Finance - Python Web Scraper - Key Statistics and Financial Statements

I am fairly new to programming, and this is my first project after reading various guides. I am trying to scrape data from the Yahoo Finance Key Statistics page and Financial Statements (e.g. http://finance.yahoo.com/q/ks?s=GOOG+Key+Statistics). The…

python scraper yahoo-finance

asked Feb 04 '14 at 23:47

J G

1
1
1

0

votes

1 answer

Any help on cursor.execute and urljoin?

I'm getting: exceptions.TypeError: not all arguments converted during string formatting from this: cursor.execute("SELECT * FROM `item` WHERE `url` = %s", (urljoin( base_url, item_url ) )) Syntax seems fine - any ideas?

python mysql scraper

asked Jan 19 '14 at 16:07

user182216

25
5
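
A note on the cursor.execute question above: one likely cause of that TypeError is that (urljoin( base_url, item_url ) ) is not a tuple. Parentheses around a single value do nothing; a one-element tuple needs a trailing comma. The sketch below is only an illustration under assumptions: the URL values are made up, fill_placeholders is a hypothetical, simplified stand-in for how a driver might format each parameter (it is not the real MySQL driver code), and it uses Python 3's urllib.parse rather than the Python 2 module the question appears to use.

```python
from urllib.parse import urljoin

# Hypothetical values; the original excerpt does not show them.
base_url = "http://example.com/catalog/"
item_url = "item/42"

url = urljoin(base_url, item_url)

not_a_tuple = (url)   # parentheses alone change nothing: this is still a str
one_tuple = (url,)    # the trailing comma makes a one-element tuple


def fill_placeholders(query, params):
    """Simplified stand-in for a driver that formats every item of params.

    A str is iterable, so passing one yields one "parameter" per character.
    """
    return query % tuple(repr(p) for p in params)


query = "SELECT * FROM `item` WHERE `url` = %s"

try:
    fill_placeholders(query, not_a_tuple)
    error_message = None
except TypeError as exc:
    # One %s placeholder, but one argument per character of the URL.
    error_message = str(exc)

filled = fill_placeholders(query, one_tuple)
print(error_message)
print(filled)
```

If that is indeed the cause, the call in the question would become cursor.execute("SELECT * FROM `item` WHERE `url` = %s", (urljoin(base_url, item_url),)).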
