
I've just started learning about scraping and have a quick question.

Scraping images and files through the DOM is no problem, but I was curious whether it is possible to scrape external resources linked to a document, such as web fonts (sorry, I couldn't think of another example off the top of my head). Resources like these are used within the page but are not linked through typical means.

Could anyone tell me whether this is possible? I only know Ruby and a bit of JS. Also, if you can give me other examples of resources like web fonts that aren't linked normally, that would be cool too.

Thanks.

  • Have you tried anything so far? – aberna Feb 20 '15 at 07:50
  • I haven't a clue where to start :S. How do you even gain access to something like a base64 string that you can only view in the inspector? Like I said, I'm really new to this and would appreciate any help, even if it's pointing me somewhere else to find my own answer. – BustOutTheRobot Feb 20 '15 at 08:53
  • You need to use a complete scraping tool. Read here: http://stackoverflow.com/questions/15037392/web-page-scraping-gems-tools-available-in-ruby – aberna Feb 20 '15 at 09:11
  • If I understand your question correctly, you want to access data/files that are referenced on the page, correct? Since, as just implied, there is a working reference to what you want in the DOM, I can't see what the problem is. Maybe an example would make it clearer what you want to achieve? – Severin Feb 21 '15 at 08:12
  • What I mean is: if you look in the web inspector in your browser, under the resources pane you'll see a list: Fonts, Frames, Images, etc. Some of those files aren't directly linked in the HTML document, so how do I gain access to them? Look at http://facebook.github.io/react/ for example. – BustOutTheRobot Feb 21 '15 at 13:55
  • Sorry, also images on sites like Instagram and others where the image isn't delivered in the normal manner. – BustOutTheRobot Feb 22 '15 at 12:05
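
For what it's worth, web fonts usually don't show up as DOM elements because they are referenced from a stylesheet (typically an `@font-face` rule with `url(...)`), and the base64 strings visible in the inspector are usually `data:` URIs embedded in that same CSS. A minimal Ruby sketch of pulling those references out of CSS text might look like the following. The helper name `css_resources` and the example URLs are made up for illustration, and the regex is a rough approximation rather than a full CSS parser.

```ruby
require "uri"

# Matches url(...) tokens in CSS, with optional single or double quotes.
# Rough approximation: it does not handle escaped quotes or parentheses.
CSS_URL = /url\(\s*(['"]?)(.*?)\1\s*\)/m

# Hypothetical helper: lists every resource a stylesheet refers to.
# css_text       - the stylesheet source as a String
# stylesheet_url - the URL the stylesheet was fetched from, used to
#                  resolve relative references
def css_resources(css_text, stylesheet_url)
  css_text.scan(CSS_URL).map do |_quote, ref|
    if ref.start_with?("data:")
      # Inline resource: "data:<mime>[;base64],<payload>"
      header, payload = ref.split(",", 2)
      mime = header[/\Adata:([^;,]*)/, 1]
      # unpack1("m") decodes base64; other payloads are left as-is here.
      bytes = header.end_with?(";base64") ? payload.unpack1("m") : payload
      { kind: :inline, mime: mime, bytes: bytes }
    else
      # Remote resource: resolve relative to the stylesheet, not the page.
      { kind: :remote, url: URI.join(stylesheet_url, ref).to_s }
    end
  end
end

css = <<~CSS
  @font-face { font-family: "Demo"; src: url("../fonts/demo.woff2") format("woff2"); }
  .icon { background: url(data:text/plain;base64,aGVsbG8=); }
CSS

css_resources(css, "https://example.com/css/site.css").each { |r| p r }
```

The stylesheet text itself could be fetched with something like `Net::HTTP.get(URI(stylesheet_url))` after collecting `<link rel="stylesheet">` hrefs and inline `<style>` blocks from the page. Resources that JavaScript requests at runtime never appear in the static HTML or CSS, so those would likely need a headless browser rather than this approach.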
