So I'm relatively new to using XPath and I am having a little difficulty homing in on the exact syntax that I need for my specific application. The scraper that I have built is working perfectly fine (when I use a less complicated path it…
I'm trying to scrape tvtropes with BeautifulSoup, but for some reason the data I want is cut out, even when I return the entire "soup" from the page. The specific example is this website:…
I want to know the difference between
scraper.exitExecution() and
scraper.stopExecution() and
scraper.finishExecutingProcessor()
I have tried looking in the Javadoc, but I could not find anything there. There seems to be no…
I have a function here that tries to grab images from a webpage using cURL. It works for most websites, but there are some that redirect the script somehow. The website that is used as an example in my code below will redirect the script to a…
Is there a way or workaround to wait for something forever?
E.g.
I'll use Facebook as an example because the same thing happens on my site.
Every time there are new posts on my Facebook timeline, a panel shows up saying 'Click here to load the posts'.
Basically,…
I am trying to parse data encoded in HTML format. An example of the string I am trying to parse is:
Simplify the polynomial by combining like terms.
I want to get…
I'm trying to run a go program using LiteIDE x22 but I get the message
C:/Go/bin/go.exe build [C:/Users/admins/Desktop/desktp/worm_scraper-master]
worm_scraper.go:11:2: cannot find package "github.com/codegangsta/cli" in any of:
…
I have been trying for the past few days to figure out how to do this without a ridiculous amount of code, and I cannot find anything on it on Google, Stack Overflow, etc.
I am building a very advanced web scraper and I would like for the output to be in…
I'm using CasperJS to scrape a site. I set up a function which stores a string into a variable named images (shown below) and it works great.
images = casper.getElementsAttribute('.search-product-image','src');
I then call that variable in fs so I…
I modified the script below to get all links on the $url set in the code.
It seems to work to some extent: it is getting all the page URLs, but it is not parsing all the pages. It parses only the first page and repeats the result for the rest.
Can someone…
I have written a Python program that scrapes information from a website using regex. My goal is to create a cron job to run this scraper each month.
I have gone into the Linux terminal, typed in crontab -e, and added to the bottom of the crontab…
I am running this website, www.miswag.net, which is highly dependent on Facebook. When I share my site on Facebook, I get a "403 Forbidden". Here's Facebook's debugger output when I try to scrape my site:…
I'm having trouble using the Facebook sharer with my site www.moncorpsetmoi.com.
The debugger says Can't download: Could not retrieve data from URL.
Any help, any ideas?
I am fairly new to programming, and this is my first project after reading various guides. I am trying to scrape data from the Yahoo Finance Key Statistics page and Financial Statements (e.g. http://finance.yahoo.com/q/ks?s=GOOG+Key+Statistics). The…
I'm getting:
exceptions.TypeError: not all arguments converted during string formatting
from this:
cursor.execute("SELECT * FROM `item` WHERE `url` = %s", (urljoin( base_url, item_url ) ))
Syntax seems fine - any ideas?
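One thing that may be worth checking in the call above: `(urljoin(base_url, item_url))` is just a parenthesised string, not a one-element tuple, because Python needs a trailing comma to build a tuple of length one. The sketch below is only an illustration under assumptions: `base_url` and `item_url` are made-up values, and `format_like_driver` is a hypothetical stand-in for how some DB-API drivers interpolate parameters by iterating over the arguments. It is not the real driver code.

```python
from urllib.parse import urljoin

# Hypothetical values, not taken from the question.
base_url = "http://example.com/"
item_url = "items/42"

query = "SELECT * FROM `item` WHERE `url` = %s"

# Parentheses alone do not make a tuple: this is still a plain str.
not_a_tuple = (urljoin(base_url, item_url))

# The trailing comma is what makes a one-element tuple.
one_tuple = (urljoin(base_url, item_url),)


def format_like_driver(query, args):
    """Hypothetical stand-in for a driver that iterates over its args.

    Iterating over a str yields one argument per character, so a query
    with a single %s receives far too many arguments.
    """
    return query % tuple(repr(a) for a in args)


try:
    format_like_driver(query, not_a_tuple)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

formatted = format_like_driver(query, one_tuple)
```

If the driver in use behaves this way, passing `(urljoin(base_url, item_url),)` with the trailing comma (or a list, `[urljoin(base_url, item_url)]`) as the second argument to `cursor.execute` should avoid the error.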