Questions tagged [scraper]

Synonym of [web-scraping]

Let's [scrape] these tags off the bottom of our shoe

349 questions
0 votes, 1 answer

What is the fastest way to download source code from a web page in Python with a proxy?

I'm already using urllib2 to fetch the pages through a proxy, but it takes far too long. I know a proxy adds some delay, but it is still much slower than when I test the same proxy in Firefox or IE. Thanks.
Barbara
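A minimal sketch in Python 3, where `urllib.request` replaces urllib2. It assumes much of the slowness comes from waiting on the proxy for one page at a time, so it sets a per-request timeout and overlaps requests with a thread pool. The proxy address `http://127.0.0.1:3128` used below is a hypothetical placeholder, not taken from the question.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_proxy_opener(proxy_url):
    """Build an opener that sends http and https requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(opener, url, timeout=10):
    """Download one page and return its raw bytes; give up if the proxy stalls."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def fetch_all(opener, urls, workers=8, timeout=10):
    """Fetch several pages concurrently so the proxy round-trips overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input URLs
        return list(pool.map(lambda u: fetch(opener, u, timeout), urls))
```

For example, `fetch_all(make_proxy_opener("http://127.0.0.1:3128"), urls)` returns the page bodies as bytes, in the same order as `urls`. Whether this closes the gap with the browser depends on the proxy; concurrency only helps when many pages are fetched.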
0 votes, 1 answer

Facebook Scraper cannot retrieve image

In my page, I have this: Which when executed is rendered like this:
Noor
0 votes, 1 answer

E107 redirect facebook scraping error

Here's my .htaccess file: RewriteCond %{REQUEST_METHOD} POST RewriteCond %{REQUEST_URI} !^/?(usersettings\.php|page\.php|news\.php|signup\.php|admin/|plugins/forum/|plugins/.*/.*config\.php) RewriteCond %{HTTP_REFERER} !^http://(.*\.)?lf1medsoc\.com…
Paris Char
0 votes, 1 answer

Scraping URLs from a node.js data stream on the fly

I am working on a node.js project (using Wikistream as a basis, so not entirely my own code) that streams real-time Wikipedia edits. The code breaks each edit down into its component parts and stores it as an object (See the gist at…
roy
0 votes, 1 answer

Screen Scraping with HTTP Headers Issue - I Think

I've been trying to figure this one out for about a week now and just can't come up with a good solution. So, I figured I would see if anyone could help me out. Here's one of the links that I'm trying to…
0 votes, 1 answer

Facebook scraper error reading contents

The Facebook scraper returns some strange output when reading the contents of my page. Page URL: http://www.protagora.hr/Stranica/O-nama/9/ Scrape debug output:…
StjepanV
-1 votes, 1 answer

Facebook scraper doesn't like some of my pages

I have a web shop built on PrestaShop, and I am trying to integrate the Like button. I noticed that on some pages the scraper picks up a thumbnail and on others it does not. I found the page that shows exactly what the scraper sees, so the…
-1 votes, 1 answer

I'm unable to scrape an individual profile

I found a PHP script to scrape company profile pages from LinkedIn here (https://stackoverflow.com/questions/42329819/how-can-i-scrape-linkedin-company-pages-with-curl-and-php-no-csrf-token-found-i#=). I replaced the user agent with my own. It…
Wcan
-1 votes, 2 answers

LinkedIn Scraper: How do I convert a list of company names into LinkedIn URLs

I have a LinkedIn scraper (built in Python) already set up that takes a list of company URLs as input and outputs all the information about each company (such as location, website, and size in number of employees). The problem is the input: it…
Mark Sonn
-1 votes, 1 answer

Selenium Python Firefox vs PhantomJS

I am writing a web scraper using Selenium in Python. I wrote the script to pull information from one site, then go to another and pull different information (emails). When I run the script with browser = webdriver.Firefox(), the script behaves…
Jay Ocean
-1 votes, 1 answer

How to scrape a web page that has a submit form at the entrance?

I've been trying to figure out how to scrape this page: sick.com. I can't figure it out. I've tried Visual Web Ripper, but it doesn't get past the submit form because it doesn't remember the cookie. Do you have any ideas? Sick.com is OK with me…
-1 votes, 1 answer

Facebook scraper stops reading my metadata

Possible duplicate: Facebook won’t share a link to my site. I have two websites that fail to show an image when pasted into Facebook, so I went to the Facebook object debugger and compared what the scraper sees to what view source…
-1 votes, 1 answer

How to scrape products from site with ruby/anemone/nokogiri

Is it possible to scrape the products from an e-commerce site using the anemone and nokogiri libraries in Ruby? I understand how to pull the data I need from each product page using nokogiri, but I can't figure out how to make anemone/nokogiri crawl the…
Dan
-2 votes, 1 answer

Programmatically enter a pin number and press button

I am working on a project where I have inherited some code that logs into a website using Python's 'requests' library and scrapes the site for content. The 'login' code uses a backend URL to POST credentials to an endpoint. (This works fine.) There is…
Joe
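A minimal sketch of the two-step flow described in this question: POST the credentials, then POST the PIN with the same cookie-carrying client so the login session is reused. It swaps in the standard library (`urllib.request` with an `http.cookiejar.CookieJar`) for `requests` to avoid a third-party dependency; with `requests`, a `requests.Session` plays the same role. The URLs and the form field names `username`, `password` and `pin` are hypothetical; the real names would have to be read from the site's form or from the browser's network tab.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Opener with a cookie jar, so cookies set at login are sent on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def form_post(url, fields):
    """Build a POST request whose body is URL-encoded form data."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def login_and_submit_pin(opener, login_url, pin_url, user, password, pin):
    # Step 1: post credentials (hypothetical field names); the jar keeps the session cookie.
    opener.open(form_post(login_url, {"username": user, "password": password})).close()
    # Step 2: post the PIN through the same opener, so the session cookie goes along.
    with opener.open(form_post(pin_url, {"pin": pin})) as response:
        return response.read()
```

If the PIN page also embeds a hidden token in its form, that value would need to be read from the page and included in the second POST; whether this site does so is not stated in the question.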
-2 votes, 1 answer

Find Hidden Webpage URL Address

I am trying to find the full webpage address for a form generated by a website. The website is https://treasurer.maricopa.gov/Parcel/?Parcel=50427029 Once you get there I want to see the web address for the Redemption Statement. You click on it…
Taylor29