Questions tagged [scraper]

Synonym of [web-scraping]

Synonym of : Let's [scrape] these tags off the bottom of our shoe

349 questions
0
votes
1 answer

Extracting text nodes or elements with relative XPath in Scrapy

So I'm relatively new to using XPath and I am having a little difficulty homing in on the exact syntax that I need to use for my specific application. The scraper that I have built is working perfectly fine (when I use a less complicated path it…
Joey Orlando
  • 1,408
  • 2
  • 17
  • 33
0
votes
1 answer

Why can't I access all the data on this page?

I'm trying to scrape tvtropes with beautifulsoup, but for some reason the data I want is cut out. I'm talking even when I return the entire "soup" from the page. The specific example is this website:…
0
votes
1 answer

What is the difference between exitExecution() and stopExecution() in the Webharvest Scraper class?

I want to know the difference between scraper.exitExecution(), scraper.stopExecution(), and scraper.finishExecutingProcessor(). I have tried looking in the Javadoc, but I could not find anything there. There seems to be no…
codeMan
  • 5,730
  • 3
  • 27
  • 51
0
votes
1 answer

cURL Image scraper gets redirected?

I have a function here that tries to grab images from a webpage using cURL. It works for most websites, but there are some that redirect the script somehow. The website that is used as an example in my code below will redirect the script to a…
bw872
  • 167
  • 2
  • 12
0
votes
1 answer

Wait for something forever CasperJS/PhantomJS

Is there a way or workaround to wait for something forever? I'll use Facebook as an example because the same thing happens on my site. Every time there are new posts on my Facebook timeline, a panel shows up saying 'Click here to load the posts'. Basically,…
Rodrigo Pereira
  • 1,834
  • 2
  • 17
  • 35
0
votes
1 answer

HTML scraping - R scrapR

I am trying to parse data encoded in HTML format. Example of the string I am trying to parse is: Simplify the polynomial by combining like terms. \"3x+12-11x+14\" I want to get…
0
votes
1 answer

How do I have Go find packages online?

I'm trying to run a go program using LiteIDE x22 but I get the message C:/Go/bin/go.exe build [C:/Users/admins/Desktop/desktp/worm_scraper-master] worm_scraper.go:11:2: cannot find package "github.com/codegangsta/cli" in any of: …
0
votes
0 answers

Python Web Scraper (URL-Sub_URL Output)

I have been trying to figure out how to do this without a ridiculous amount of code for the past few days, and I cannot find anything on it on Google, Stack Overflow, etc. I am building a very advanced web scraper and I would like for the output to be in…
Windows65
  • 57
  • 1
  • 7
0
votes
2 answers

Scraping with casperjs -- Not sure how to handle empty div

I'm using casperjs to scrape a site. I set up a function which stores a string into a variable named images (shown below) and it works great. images = casper.getElementsAttribute('.search-product-image','src'); I then call that variable in fs so I…
critic
  • 69
  • 1
  • 10
0
votes
1 answer

PHP simple_html_dom to parse links from paginated pages

I modified the script below to get all links on the $url set in the code. It seems to work to some extent: it gets every page's URL but does not parse all pages. It parses only the first page and repeats that result for the rest. Can someone…
Spykey
  • 1
0
votes
1 answer

How do I automate my scraper program that I wrote in python to run monthly?

I have written a Python program that scrapes information from a website using regex. My goal is to create a cron job to run this scraper each month. I have gone into the Linux terminal, typed in crontab -e, and added to the bottom of the crontab…
0
votes
2 answers

Facebook can't scrape my site correctly - PHP Site

I am running this website, www.miswag.net, which is highly dependent on Facebook. When I share my site on Facebook, I get a "403 Forbidden". Here's Facebook's debugger output when I try to scrape my site:…
Ammar
  • 3
  • 3
0
votes
1 answer

Facebook Open Graph debugger: error 200: can't download

I'm having trouble using the Facebook sharer with my site www.moncorpsetmoi.com. The debugger says Can't download: Could not retrieve data from URL. Any help, any ideas?
0
votes
1 answer

Yahoo Finance - Python Web Scraper - Key Statistics and Financial Statements

I am fairly new to programming, and this is my first project after reading various guides. I am trying to scrape data from the Yahoo Finance Key Statistics page and Financial Statements (i.e. http://finance.yahoo.com/q/ks?s=GOOG+Key+Statistics). The…
J G
  • 1
  • 1
  • 1
0
votes
1 answer

Any help on cursor.execute and urljoin?

I'm getting: exceptions.TypeError: not all arguments converted during string formatting from this: cursor.execute("SELECT * FROM `item` WHERE `url` = %s", (urljoin( base_url, item_url ) )) Syntax seems fine - any ideas?
user182216
  • 25
  • 5
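
One possible cause of the TypeError in the question above, offered as a guess since the excerpt does not show the driver or the surrounding code: in Python, `(value)` is just `value` in parentheses, not a tuple, so the driver may receive a bare string where it expects a sequence of parameters. A one-element tuple needs a trailing comma. The sketch below uses the standard-library `sqlite3` module as a stand-in for the asker's database (note that it uses `?` placeholders rather than `%s`), and the `base_url` and `item_url` values are hypothetical.

```python
import sqlite3
from urllib.parse import urljoin

# Hypothetical values; the question does not show the real ones.
base_url = "http://example.com/catalog/"
item_url = "item/42"

url = urljoin(base_url, item_url)

# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = (url)   # still a plain str
one_tuple = (url,)    # a tuple with one element

# sqlite3 stands in for the asker's driver (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (url TEXT)")
conn.execute("INSERT INTO item (url) VALUES (?)", one_tuple)

# Parameters are passed as a one-element tuple.
rows = conn.execute("SELECT * FROM item WHERE url = ?", one_tuple).fetchall()
conn.close()

print(type(not_a_tuple).__name__, type(one_tuple).__name__)
print(rows)
```

If this is indeed the problem, the equivalent change in the asker's call would be to write `(urljoin(base_url, item_url),)` or to pass a list such as `[urljoin(base_url, item_url)]`.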