Questions tagged [scraper]

Synonym of [web-scraping]

349 questions
0 votes, 1 answer

Quotes Messing Up Python Scraper

I am trying to scrape all the data within a div as follows. However, the quotes are throwing me off.
14955 Shady Grove Rd.
Rockville, MD 20850
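
It is hard to say what the quotes are breaking without the full snippet, but a minimal sketch of the usual approach (BeautifulSoup, with a made-up "address" div) shows that the parser handles quoting in attributes and text on its own, so nothing needs escaping by hand:

from bs4 import BeautifulSoup

html = '''<div class="address" title='Shady Grove "Main" Office'>
14955 Shady Grove Rd.<br/>Rockville, MD 20850</div>'''

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="address")
print(div.get_text(" ", strip=True))   # 14955 Shady Grove Rd. Rockville, MD 20850
print(div["title"])                    # Shady Grove "Main" Office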
0 votes, 1 answer

Trying to update Twitter status with scraper data using Twython. Unsure what to do

So I have these two scripts. redditScraper.py:

# libraries
import urllib2
import json
# get remote string
url = 'http://www.reddit.com/new.json?sort=new'
response = urllib2.urlopen(url)
# interpret as json
data = …
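
A rough sketch of gluing the two halves together, assuming the part the excerpt cuts off parses the JSON and hands a post title to Twython (the credentials are placeholders, and the ['data']['children'] path is reddit's listing layout):

# fetch the newest reddit posts, as in redditScraper.py
import json
import urllib2

url = 'http://www.reddit.com/new.json?sort=new'
data = json.load(urllib2.urlopen(url))
title = data['data']['children'][0]['data']['title']        # title of the newest post

# post it with Twython
from twython import Twython
APP_KEY, APP_SECRET = 'app-key', 'app-secret'               # placeholder credentials
OAUTH_TOKEN, OAUTH_TOKEN_SECRET = 'token', 'token-secret'   # placeholder tokens

twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
twitter.update_status(status=title[:140])                   # stay under the tweet limit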
0 votes, 1 answer

cURL times out when web scraping: "PHP Fatal error: Call to a member function find() on a non-object"

I've created this function that basically scrapes Technorati for blog posts and URLs to those posts. Btw, I tortured myself to find an API for this, and couldn't find one. I do feel ashamed for this scraper, but there should be an API!…
user796443
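
The "find() on a non-object" error usually means the fetch came back empty (for example, the cURL timeout hit) and the parser object was never created. This is not the original PHP, just the same guard pattern sketched in Python with requests and BeautifulSoup (the container class is made up):

import requests
from bs4 import BeautifulSoup

def scrape_posts(url):
    try:
        resp = requests.get(url, timeout=10)     # don't let the request hang forever
        resp.raise_for_status()
    except requests.RequestException:
        return []                                # fetch failed: nothing to parse

    soup = BeautifulSoup(resp.text, "html.parser")
    results = soup.find("ol", class_="search-results")   # hypothetical container
    if results is None:                          # skipping this check is the Python
        return []                                # equivalent of the PHP fatal error
    return [a["href"] for a in results.find_all("a", href=True)]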
0 votes, 1 answer

Fixing a 'sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type. Try converting types or pickling.'

I'm stuck on this scraper in ScraperWiki. I just want the text from the li-elements in the ul with dir='ltr'. I run this script every week, and sentences can be similar to each other while still being completely new ones. That's why I want to…
Jerry Vermanen
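
That InterfaceError usually means one of the bound values is not a type sqlite3 accepts (str, bytes, int, float, None) — typically a whole lxml element rather than its text. A minimal sketch of the fix, with a stand-in ul instead of the real page:

import sqlite3
from lxml import html

doc = html.fromstring('<ul dir="ltr"><li>First sentence.</li><li>A new one.</li></ul>')

conn = sqlite3.connect('scrape.db')
conn.execute('CREATE TABLE IF NOT EXISTS sentences (text TEXT)')
for li in doc.xpath('//ul[@dir="ltr"]/li'):
    value = li.text_content().strip()        # plain string, not an HtmlElement
    conn.execute('INSERT INTO sentences (text) VALUES (?)', (value,))
conn.commit()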
0 votes, 1 answer

XPath to select between two HTML comments is not working?

I'm trying to select some content between two HTML comments, but I'm having some trouble getting it right (as seen in "XPath to select between two HTML comments?"). There seems to be a problem when comments are on the same line. My…
Thomas
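
One way around it, sketched with lxml and made-up START/END markers: grab the first comment, then walk its following siblings until the closing comment appears, which works even when the comments and the content sit on the same line.

from lxml import html

page_source = '<div><!-- START --><p>keep me</p><p>and me</p><!-- END --><p>not me</p></div>'
doc = html.fromstring(page_source)

start = doc.xpath('//comment()[contains(., "START")]')[0]
between = []
for node in start.itersiblings():
    if isinstance(node, html.HtmlComment) and 'END' in (node.text or ''):
        break                                    # reached the closing comment
    between.append(node)

print([n.text_content() for n in between])       # ['keep me', 'and me']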
0 votes, 1 answer

ScraperWiki scrape frequency

This might be a stupid question, but I am currently scraping Twitter using ScraperWiki. The ScraperWiki run frequency is rather low, though. Is there a way to force ScraperWiki to run more frequently without touching Python, since my…
0 votes, 1 answer

BeautifulSoup4 - All links within 1 div on multiple pages

For a school project we need to scrape a 'job-finding' website, store the results in a DB, and later match these profiles with companies that are searching for people. On this particular site, all the URLs to the pages I need to scrape are in 1 div…
rockyl
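
A sketch under two assumptions made up here — the listing is paginated as ?page=N and the links sit in a single div with id "joblist":

import requests
from bs4 import BeautifulSoup

BASE = 'http://example.com/jobs'              # placeholder for the real site

def links_on_page(page_number):
    resp = requests.get(BASE, params={'page': page_number}, timeout=10)
    soup = BeautifulSoup(resp.text, 'html.parser')
    container = soup.find('div', id='joblist')
    if container is None:
        return []
    return [a['href'] for a in container.find_all('a', href=True)]

all_links = []
for page in range(1, 6):                      # however many pages the site has
    all_links.extend(links_on_page(page))
print('%d links collected' % len(all_links))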
0 votes, 1 answer

Chrome shows different HTML than my RequestJS & CheerioJS app

My scraper app is searching a Vimeo URL with a query string attached to it, which is 'http://vimeo.com/search?q=angularjs'. When I load that URL in Chrome I can see a number of elements that do not show up when I request() that URL from my scraper.…
user883807
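
A quick way to confirm the difference comes from client-side JavaScript rather than a bug in the scraper: fetch the raw HTML (which is all request()/cheerio ever sees) and check whether the elements Chrome shows are actually in it. Python is used here only for the check; the idea is the same in Node:

import requests

raw = requests.get('http://vimeo.com/search?q=angularjs', timeout=10).text

# compare what came over the wire with Chrome's "view source"; anything visible
# in the Elements panel but absent here was added by JavaScript after page load
print(len(raw))
print('q=angularjs' in raw)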
0 votes, 1 answer

Scraping Tags Using JSOUP

I'm attempting to extract the values from the following table using JSOUP: …
0 votes, 1 answer

How to handle NILs with Anemone / Nokogiri web scraper?

def scrape!(url)
  Anemone.crawl(url) do |anemone|
    anemone.on_pages_like %[/events/detail/.*] do |page|
      show = {
        headliner: page.doc.at_css('h1.summary').text,
        openers:   page.doc.at_css('.details h2').text …
GN.
0 votes, 1 answer

Python site scraper fails with socket.error 104

I feel like I am missing something very basic here about the limits of python processes. I have a screen scraper that is supposed to go to a password-protected site once a week, filling out a form to update existing records and then grabbing new…
user1046162
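
socket.error 104 is "connection reset by peer", which long-running weekly jobs hit fairly often. A common mitigation (a sketch, not the asker's code) is to retry the request a few times with a growing pause:

import socket
import time
import urllib2

def fetch_with_retries(url, attempts=3, delay=5):
    for attempt in range(1, attempts + 1):
        try:
            return urllib2.urlopen(url, timeout=30).read()
        except (socket.error, urllib2.URLError):
            if attempt == attempts:
                raise                        # give up after the last attempt
            time.sleep(delay * attempt)      # back off a little longer each time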
0 votes, 4 answers

How to get all the URLs of a website using a crawler or a scraper?

I have to get many URLs from a website and then copy them into an Excel file. I'm looking for an automatic way to do that. The website is structured with a main page of about 300 links, and inside each link there are 2 or 3 links that…
giogix
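
A sketch matching the structure described (a main page whose roughly 300 links each contain a few more links), collecting everything into a CSV that Excel opens directly; the start URL is a placeholder:

import csv
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

START = 'http://example.com/'                 # placeholder for the real site

def links(url):
    soup = BeautifulSoup(requests.get(url, timeout=10).text, 'html.parser')
    return [urljoin(url, a['href']) for a in soup.find_all('a', href=True)]

rows = []
for first_level in links(START):
    for second_level in links(first_level):
        rows.append((first_level, second_level))

with open('urls.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)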
0 votes, 1 answer

How to extract text with lxml in this scraper program?

I am trying to scrape the text data from a specific element on this page (using ScraperWiki):

import requests
from lxml import html
response = requests.get('http://portlandmaps.com/detail.cfm?action=Assessor&propertyid=R246274')
tree = …
u'i
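
The usual continuation of that snippet — parse the response and pull text out with an XPath — would look roughly like this; the span id used here is a placeholder, since the real element on the page would need to be inspected:

import requests
from lxml import html

url = 'http://portlandmaps.com/detail.cfm?action=Assessor&propertyid=R246274'
tree = html.fromstring(requests.get(url).content)
values = tree.xpath('//span[@id="total_value"]/text()')   # hypothetical element id
print([v.strip() for v in values])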
0 votes, 1 answer

How to scrape the name of a class from a web page?

This is the HTML code of the site I want to scrape:
This is the XPath I'm using in dynamic django scraper, but it's not working: //div[@class="ayah…
user4650611
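
An exact @class match fails whenever the element carries more than one class; the usual XPath workaround is sketched below with lxml (the "ayah" class name is taken from the truncated excerpt above, so treat it as a guess):

from lxml import html

doc = html.fromstring('<div><div class="ayah w4">In the name of God...</div></div>')

exact  = doc.xpath('//div[@class="ayah"]')    # misses elements with extra classes
robust = doc.xpath('//div[contains(concat(" ", normalize-space(@class), " "), " ayah ")]')
print(len(exact), len(robust))                # 0 1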
0 votes, 1 answer

Django Dynamic Scraper project does not run on Windows even though it works on Linux

I am trying to make a project in Django Dynamic Scraper. I have tested it on Linux and it runs properly. When I try to run the command syndb, I get this…
user4650611