So I have these two scripts:
redditScraper.py
# libraries
import urllib2
import json
# get remote string
url = 'http://www.reddit.com/new.json?sort=new'
response = urllib2.urlopen(url)
# interpret as json
data =…
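The snippet above is Python 2 (`urllib2` no longer exists in Python 3). Whatever the truncated line assigns, the parsing step usually comes down to decoding reddit's listing JSON, where posts live under `data → children → [i] → data`. A minimal offline sketch of that step, using an invented sample shaped like the real listing (the titles and URLs below are made up, not a live response):

```python
import json

# Hypothetical sample mirroring the shape of reddit's /new.json listing.
raw = '''
{
  "data": {
    "children": [
      {"data": {"title": "First post", "url": "http://example.com/a"}},
      {"data": {"title": "Second post", "url": "http://example.com/b"}}
    ]
  }
}
'''

data = json.loads(raw)
# Each child wraps the actual post fields in its own "data" dict.
titles = [child["data"]["title"] for child in data["data"]["children"]]
print(titles)
```

With a live response object, `json.load(response)` would replace the `json.loads(raw)` call.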
I've created this function that basically scrapes Technorati for blog posts and URLs to those posts. Btw, I tortured myself to find an API for this, and couldn't find one. I do feel ashamed for this scraper, but there should be an API!…
I'm stuck on this scraper in ScraperWiki. I just want the text from the li elements in the ul with dir='ltr'. I run this script every week, and sentences may be similar to each other while still being completely new ones. That's why I want to…
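ScraperWiki scripts typically use lxml for this, but the selection itself can be sketched with the standard-library `html.parser`; the sample HTML below is an assumption standing in for the real page:

```python
from html.parser import HTMLParser

class LtrListParser(HTMLParser):
    """Collect the text of <li> elements inside a <ul dir="ltr">."""
    def __init__(self):
        super().__init__()
        self.in_target_ul = False
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul" and dict(attrs).get("dir") == "ltr":
            self.in_target_ul = True
        elif tag == "li" and self.in_target_ul:
            self.in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_target_ul = False
        elif tag == "li":
            self.in_li = False

    def handle_data(self, data):
        if self.in_li:
            self.items[-1] += data

# Hypothetical page: only the dir='ltr' list should be picked up.
html_doc = """
<ul dir="rtl"><li>skip me</li></ul>
<ul dir="ltr"><li>first sentence</li><li>second sentence</li></ul>
"""
parser = LtrListParser()
parser.feed(html_doc)
print(parser.items)
```

With lxml instead, the equivalent selection would be the XPath `//ul[@dir='ltr']/li`.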
I'm trying to select some content between two HTML comments, but having some trouble getting it right (as seen in "XPath to select between two HTML comments?").
There seems to be a problem when the comments are on the same line.
My…
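One way around brittle XPath here is to walk the document events and toggle capture on the comment markers; `html.parser` fires `handle_comment` for each comment regardless of line breaks, which sidesteps the same-line issue. The marker texts `start`/`end` and the sample HTML are assumptions for this sketch:

```python
from html.parser import HTMLParser

class BetweenComments(HTMLParser):
    """Capture text appearing between <!-- start --> and <!-- end -->."""
    def __init__(self):
        super().__init__()
        self.capturing = False
        self.chunks = []

    def handle_comment(self, comment):
        marker = comment.strip()
        if marker == "start":
            self.capturing = True
        elif marker == "end":
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.chunks.append(data)

# Both comments on one line, which is the troublesome case.
html_doc = "<p>before</p><!-- start --><p>inside</p><!-- end --><p>after</p>"
p = BetweenComments()
p.feed(html_doc)
print("".join(p.chunks).strip())
```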
This might be a stupid question, but I am currently scraping Twitter using ScraperWiki. However, ScraperWiki's run frequency is rather low. Is there a way to force ScraperWiki to run more frequently without touching Python, since my…
For a school project we need to scrape a 'job-finding' website, store the results in a DB, and later match these profiles with companies that are searching for people.
On this particular site, all the URLs of the pages I need to scrape are in one div…
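When every link sits in one container div, the job splits into two steps: collect the `href` values inside that div, then insert them into the DB. A stdlib sketch, where the div id `joblist` and the sample HTML are assumptions (the real site's markup will differ):

```python
import sqlite3
from html.parser import HTMLParser

class DivLinkParser(HTMLParser):
    """Collect href values of <a> tags inside <div id="joblist">."""
    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting depth inside the target div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth or attrs.get("id") == "joblist":
                self.depth += 1
        elif tag == "a" and self.depth and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

html_doc = ('<div id="nav"><a href="/home">home</a></div>'
            '<div id="joblist"><a href="/job/1">J1</a><a href="/job/2">J2</a></div>')
parser = DivLinkParser()
parser.feed(html_doc)

# Store the collected URLs; an in-memory DB keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT)")
conn.executemany("INSERT INTO urls VALUES (?)", [(u,) for u in parser.links])
rows = [r[0] for r in conn.execute("SELECT url FROM urls")]
print(rows)
```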
My scraper app is searching a Vimeo URL with a query string attached to it which is
'http://vimeo.com/search?q=angularjs'
When I load that URL in Chrome I can see a number of elements that do not show up when I request() that URL from my scraper.…
I feel like I am missing something very basic here about the limits of python processes. I have a screen scraper that is supposed to go to a password-protected site once a week, filling out a form to update existing records and then grabbing new…
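For a weekly login-then-scrape run, the usual shape is a cookie-preserving opener plus a POSTed form payload, so the session cookie from the login carries over to the record pages. A stdlib sketch that builds both without actually sending anything; the URL and form field names are placeholders, not the real site's:

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Opener that keeps cookies across requests, so the login session persists.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Hypothetical login form fields; the real site's names will differ.
form = {"username": "me", "password": "secret"}
payload = urllib.parse.urlencode(form).encode("utf-8")
login = urllib.request.Request("https://example.com/login", data=payload)

# opener.open(login) would submit the form and store the session cookie;
# later opener.open(...) calls on record pages reuse that cookie.
print(login.get_method(), payload.decode("utf-8"))
```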
I have to get many URLs from a website and then copy them into an Excel file.
I'm looking for an automatic way to do that. The website is structured as a main page with about 300 links, and inside each link there are 2 or 3 links that…
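A two-level crawl like this reduces to: collect the main-page links, visit each, collect its links, and write the pairs as CSV (which Excel opens directly). To keep the sketch offline, the network fetch is replaced with a dict of made-up pages; swap in `urllib.request.urlopen` for real use:

```python
import csv
import io
import re

# Stand-in for the real site: page path -> HTML body (invented data).
PAGES = {
    "/main": '<a href="/cat/1">c1</a><a href="/cat/2">c2</a>',
    "/cat/1": '<a href="/item/a">a</a><a href="/item/b">b</a>',
    "/cat/2": '<a href="/item/c">c</a>',
}
HREF = re.compile(r'href="([^"]+)"')

rows = []
for link in HREF.findall(PAGES["/main"]):   # ~300 links on the real site
    for sub in HREF.findall(PAGES[link]):   # 2 or 3 links inside each
        rows.append((link, sub))

# CSV is the simplest Excel-readable output.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["page", "url"])
writer.writerows(rows)
print(buf.getvalue())
```

For real pages an HTML parser is more robust than a regex, but the crawl structure stays the same.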
I am trying to scrape the text data from a specific element on this page (using ScraperWiki)
import requests
from lxml import html
response = requests.get('http://portlandmaps.com/detail.cfm?action=Assessor&propertyid=R246274')
tree =…
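Note the URL must be a quoted string, or `requests.get` raises a `NameError`/`SyntaxError`. After the truncated `tree = ...` line, the lxml route is typically `html.fromstring(response.content)` followed by `tree.xpath(...)` on the element you want. As an offline stand-in for "text of a specific element", here is a stdlib sketch that pulls the text of an element by id; the id `owner` and the sample HTML are assumptions, not the real assessor page:

```python
from html.parser import HTMLParser

class ByIdText(HTMLParser):
    """Grab the text content of the element with a given id attribute."""
    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.active = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.target_id:
            self.active = True

    def handle_endtag(self, tag):
        self.active = False

    def handle_data(self, data):
        if self.active:
            self.text += data

# Hypothetical fragment standing in for the real detail page.
sample = '<div><span id="owner">DOE, JANE</span><span id="zone">R5</span></div>'
p = ByIdText("owner")
p.feed(sample)
print(p.text)
```

The lxml equivalent would be `tree.xpath('//*[@id="owner"]/text()')` under the same assumption about the id.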
I am trying to make a project with django-dynamic-scraper. I have tested it on Linux and it runs properly. When I try to run the syncdb command I get this…