Questions tagged [scraper]

Synonym of [web-scraping]

349 questions
4
votes
1 answer

Scraping sites with JavaScript screen delay

I'm attempting to scrape a site that has a split-second JavaScript delay. I'm currently using Python for scraping. Whenever I 'get' the page, the JavaScript delay has not finished and the page has not completely loaded the new DOM yet. How would I scrape…
4
votes
1 answer

Python + Mechanize not working with Delicious

I'm using Mechanize and Beautiful Soup to scrape some data off Delicious:

    from mechanize import Browser
    from BeautifulSoup import BeautifulSoup

    mech = Browser()
    url = "http://www.delicious.com/varunsrin"
    page = mech.open(url)
    html =…
varunsrin
  • 860
  • 2
  • 15
  • 24
4
votes
1 answer

PHP cURL - how to emulate exactly the same request as a user?

I am trying to make a website scraper, but the website responds differently than it does to a normal request from a browser. How can I make a cURL request that the website will not filter or block? Any help would be appreciated. $curl_handle =…
Tadeáš Jílek
  • 2,813
  • 2
  • 19
  • 32
4
votes
2 answers

Is there any way to change the log message format in Scrapy?

I would like to modify the Scrapy log messages so that each one begins with a user ID. For example, instead of this: 2015-03-03 17:09:34+0530 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,…
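One way to get a user ID at the front of every log line can be sketched with Python's standard `logging` module alone. This is only an illustration, not Scrapy's own mechanism: the user ID `user-42`, the logger name `demo_scraper`, and the helper `build_logger` are all hypothetical. Recent Scrapy versions do expose a `LOG_FORMAT` setting that takes a format string of this kind, but whether it applies depends on the Scrapy version in use.

```python
import io
import logging

USER_ID = "user-42"  # hypothetical user ID, for illustration only


class UserIdFilter(logging.Filter):
    """Attach a user_id attribute to every record passing through the handler."""

    def __init__(self, user_id):
        super().__init__()
        self.user_id = user_id

    def filter(self, record):
        record.user_id = self.user_id
        return True  # never drop the record, only annotate it


def build_logger(user_id, stream):
    """Return a logger whose lines start with the given user ID."""
    logger = logging.getLogger("demo_scraper")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the root logger from printing a second copy
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(user_id)s %(asctime)s [%(name)s] %(levelname)s: %(message)s")
    )
    handler.addFilter(UserIdFilter(user_id))
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()
log = build_logger(USER_ID, buffer)
log.info("Enabled spider middlewares: HttpErrorMiddleware")
first_line = buffer.getvalue().splitlines()[0]
print(first_line)
```

The filter sits on the handler rather than the logger, so every record that reaches that handler picks up the `user_id` field before the formatter runs.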
4
votes
1 answer

How would I scrape the JS-generated data on this webpage?

This past week, there was the launch of a new tool called #Homescreen that allows people to share a screenshot of the apps that they have on their iPhone home screen. For example: https://homescreen.is/iamfinnym I'd like to build a scraper that…
grautur
  • 29,955
  • 34
  • 93
  • 128
4
votes
2 answers

Manipulating BeautifulSoup's ResultSet list object

I am trying to extract 2 pieces of data: 1) The value of the option element's "value" attribute (i.e. "01000.html" below). 2) The string that is within the tags (i.e. "Alabama"). There is limited information on the ResultSet list…
d8aninja
  • 3,233
  • 4
  • 36
  • 60
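A rough sketch of pulling both pieces out of each `option` element, written with the standard library's `html.parser` instead of BeautifulSoup so it runs without extra installs. The sample markup is an assumption (only `01000.html` and `Alabama` come from the question; the second option is made up), and `OptionCollector` is a hypothetical helper name. With BeautifulSoup itself, a ResultSet is a list subclass, so the same pairs should come from looping over it and reading `option["value"]` and `option.string`.

```python
from html.parser import HTMLParser

# Hypothetical markup; the second option is invented for illustration.
SAMPLE_HTML = """
<select>
  <option value="01000.html">Alabama</option>
  <option value="02000.html">Alaska</option>
</select>
"""


class OptionCollector(HTMLParser):
    """Collect (value attribute, inner text) pairs for every <option> tag."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current_value = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            # Options without a value attribute leave this as None and are skipped.
            self._current_value = dict(attrs).get("value")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_value is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._current_value is not None:
            text = "".join(self._text_parts).strip()
            self.pairs.append((self._current_value, text))
            self._current_value = None


parser = OptionCollector()
parser.feed(SAMPLE_HTML)
pairs = parser.pairs
print(pairs)
```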
4
votes
2 answers

BeautifulSoup and Mechanize to get AJAX call result

Hi, I'm building a scraper using Python 2.5 and BeautifulSoup, but I've stumbled upon a problem ... part of the web page is generated after the user clicks a button, which starts an AJAX request by calling a specific JavaScript function using proper…
nabizan
  • 3,185
  • 5
  • 26
  • 38
4
votes
1 answer

Scrapy Python Craigslist Scraper

I am trying to scrape Craigslist classifieds using Scrapy to extract items that are for sale. I am able to extract date, post title, and post url but am having trouble extracting price. For some reason the current code extracts all of the prices,…
Joe Barreca
  • 49
  • 2
  • 7
4
votes
1 answer

Server-sided issues when scraping with Node JS Cheerio module?

I am trying to follow this thread here: How can one parse HTML server-side with Meteor? Unfortunately I get the following errors when doing so: Uncaught Error: Can't make a blocking HTTP call from the client; callback required. Here is the…
TheProofIsTrivium
  • 768
  • 2
  • 11
  • 25
3
votes
2 answers

Using Ruby/Mechanize to select next element after selected element

I was unable to find this question specifically; hopefully I'm not wrong about it being a new variation on an older question. I'm hoping to be able to select the table after the (inconsistent) p.red element text(), where the 'p' does not contain the…
3
votes
3 answers

How do I programmatically get Google SEO/Search Rank information? API or Scraper?

I'm trying to find a programmatic way to get 2 values: (1) a domain's position in the Google results for a specific term, and (2) the number of Google results for that term. Currently my client is using some scraper software, but there's a manual step…
dylanized
  • 3,765
  • 6
  • 32
  • 44
3
votes
2 answers

Long running PHP scraper returns 500 Internal Error

Mostly I find the answers to my questions on Google, but now I'm stuck. I'm working on a scraper script, which first scrapes some usernames from a website, then gets every detail of each user. There are two scrapers involved; the first goes…
z3r0
  • 45
  • 8
3
votes
3 answers

How do you extract an embedded attribute value from a previous attribute value in an XPath query?

I'm trying to "select" the link from the onclick attribute in the following portion of HTML but can't get any further than the…
emish
  • 2,813
  • 5
  • 28
  • 34
3
votes
2 answers

Python issue: TypeError: unhashable type: 'slice' during web scraping

I am attempting to scrape some info from a website. I was able to successfully scrape the text that I was looking for, but when I try to create a function to append the texts together, I get a TypeError of an unhashable type. Do you know what may…
pynewbee
  • 665
  • 3
  • 9
  • 19
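The question's code is not shown, so this is only a guess at one common cause of that message: slicing a dict as if it were a list. The dict contents and variable names below are made up. As far as I know, slice objects became hashable in Python 3.12, so newer interpreters raise `KeyError` for the same line, and the sketch catches both.

```python
# Hypothetical scraped data, keyed by field name.
scraped = {"title": "First", "summary": "Second", "footer": "Third"}

try:
    scraped[0:2]  # a dict is looked up by key, so it cannot be sliced
    error_name = None
except (TypeError, KeyError) as exc:
    # TypeError ("unhashable type: 'slice'") on older Pythons,
    # KeyError on versions where slice objects are hashable.
    error_name = type(exc).__name__

# Convert the values to a list first, then slice and join the texts.
first_two = list(scraped.values())[0:2]
combined = " ".join(first_two)
print(error_name, combined)
```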