Questions tagged [scraper]

Synonym of [web-scraping]

349 questions
3
votes
3 answers

How do I create an HTML scraper in PHP and get it working properly?

Please HELP! :( I am looking to develop a PHP script to do the following: scrape a remote HTML page and extract selected data (e.g. a particular table/div), then save the extracted data into a database (e.g. MySQL). Can anyone help out? Thanks and…
user429384
  • 31
  • 1
  • 2
3
votes
1 answer

BeautifulSoup MemoryError When Opening Several Files in Directory

Context: Every week, I receive a list of lab results in the form of an HTML file. Each week, there are about 3,000 results, with each set of results having between two and four tables associated with it. For each result/trial, I only care about…
JohnR4785
  • 73
  • 3
3
votes
3 answers

Facebook-like on-demand meta content scraper

Have you ever noticed that FB scrapes the link you post on Facebook (status, message, etc.) live, right after you paste it into the link field, and displays various metadata, a thumbnail of the image, various images from a page link, or a video thumbnail from a…
Toby
  • 2,720
  • 5
  • 29
  • 46
3
votes
1 answer

Beautiful Soup nested div (Adding extra function)

I am trying to extract the company name, address, and ZIP code from www.quicktransportsolutions.com. I have written the following code to crawl the site and return the information I need. import requests from bs4 import BeautifulSoup def…
3
votes
0 answers

Scraping (multiple) web page(s) and detecting changes

I'm currently trying to work out a concept for solving the following: in Java, I'm scraping a web page with articles. If any of these articles becomes available or changes in some way, I should get an alert. The scraping of all the…
pythoniosIV
  • 237
  • 5
  • 18
3
votes
5 answers

Scrape data from HTML pages using Java, output to database

I need to know how to create a scraper (in Java) to gather data from HTML pages and output it to a database. I do not have a clue where to start, so any information you can give me on this would be great. Also, you can't be too basic or simple…
Tanith
  • 31
  • 1
  • 1
  • 2
3
votes
1 answer

Rails - render :content_type has no effect

I'm developing a Ruby/Rails app which scrapes another website and renders an RSS feed with the data. Because this app is built on Heroku, I am generating the RSS feed via a controller, rather than dumping it to the file-system and serving it as an…
Daniel B.
  • 1,650
  • 1
  • 19
  • 40
3
votes
5 answers

How to extract the text between some anchor tags?

I need to extract the name of the artists from an HTML page. Here's a snippet of the page:
muchacho
  • 55
  • 1
  • 6
3
votes
1 answer

How can I make my scraper website-design-change-tolerant?

I have written a web scraper in Ruby, but the websites that I am scraping have changed their design, so my scraper is failing. Is there a smart and simple solution to this kind of inherent problem of scrapers? (e.g. using some kind of…
HPC_wizard
  • 179
  • 3
  • 11
3
votes
3 answers

Why can I not scrape the title off this site?

I'm using simple-html-dom to scrape the title off of a specified site. find('title') as $element) echo $element->innertext .…
Alex
  • 103
  • 2
  • 8
2
votes
0 answers

OGP endpoints that point to Facebook entities being incorrectly parsed by FB crawler?

Our app renders Like buttons that point back to an actual Facebook page. However, instead of pointing the Like button's href directly to the FB url, we proxy it through our servers through an opengraph endpoint. This is helpful because it allows us…
diurnalist
  • 408
  • 3
  • 9
2
votes
3 answers

Ruby Mechanize web scraper library returns file instead of page

I have recently been using the Mechanize gem in Ruby to write a scraper. Unfortunately, the URL that I am attempting to scrape returns a Mechanize::File object instead of a Mechanize::Page object upon a GET request. I can't figure out why. Every…
JRPete
  • 3,074
  • 3
  • 19
  • 17
2
votes
0 answers

How to get a table row of a website that updates dynamically (Simple HTML DOM Parser)?

Basically, what I want to do is get a particular table row of a website. The table has an id of "table-data". I have already written the PHP, but I noticed that file_get_html doesn't actually get the data that is dynamically loaded. How should I…
HessamSH
  • 357
  • 1
  • 5
  • 18
2
votes
1 answer

Trying to Scrape Reddit with praw.Reddit

I'm trying to scrape Reddit with the praw.reddit command and I keep getting the following: prawcore.exceptions.OAuthException: unauthorized_client error processing request (Only script apps may use password auth) Here's the top of my code: (I removed…
bullybear17
  • 859
  • 2
  • 13
  • 31