Scraping and External Resources

Question

I've just started learning about scraping and have a quick question.

Scraping images and files through the DOM is no problem, but I was curious whether it is possible to scrape external resources linked to a document, such as web fonts (sorry, I couldn't think of another example off the top of my head). Resources like these are used by the page but are not linked from the HTML in the usual way.

Could anyone tell me whether this is possible? I only know Ruby and a bit of JS. Also, if you can give me other examples of resources like web fonts that aren't linked normally, that would be cool too.

Thanks.

I haven't a clue where to start. How do you even gain access to something like a base64 string that you can only view in the inspector? Like I said, I'm really new to this and would appreciate any help, even if it's pointing me somewhere else to find my own answer. — BustOutTheRobot, Feb 20 '15 at 08:53
You need to use a complete scraping tool. Read here: http://stackoverflow.com/questions/15037392/web-page-scraping-gems-tools-available-in-ruby — aberna, Feb 20 '15 at 09:11
If I understand your question correctly, you want to access data or files that are referenced on the page, correct? Since, as just implied, there is a working reference to what you want in the DOM, I can't see what the problem is. Maybe an example would make it clearer what you want to achieve? — Severin, Feb 21 '15 at 08:12
What I mean is: if you look in your browser's web inspector, under the resources pane you'll see a list: Fonts, Frames, Images, etc. Some of those files aren't directly linked in the HTML document, so how do I gain access to them? Take http://facebook.github.io/react/ as an example. — BustOutTheRobot, Feb 21 '15 at 13:55
Sorry, I also mean images on sites like Instagram and others, where the image isn't delivered in the normal manner. — BustOutTheRobot, Feb 22 '15 at 12:05
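
One possible starting point for the web font case: fonts are usually referenced from a stylesheet rather than from the HTML, so a scraper can read the CSS the page links to and collect its url(...) references. The following is a minimal Ruby sketch under some assumptions: the stylesheet text has already been downloaded (for instance with Net::HTTP), and css_resource_urls and decode_data_uri are hypothetical helper names, not part of any library. Base64 strings seen in the inspector are often data: URIs embedded in the CSS, which can be decoded directly instead of fetched.

```ruby
require "uri"

# Matches url(...) references in CSS, with optional single or double quotes.
CSS_URL = /url\(\s*(['"]?)(.*?)\1\s*\)/

# Hypothetical helper: returns absolute URLs for every url(...) reference
# in the given CSS text, resolved against the stylesheet's own URL.
# Inline data: URIs are skipped because there is nothing to download.
def css_resource_urls(css, base_url)
  css.scan(CSS_URL)
     .map { |_quote, ref| ref }
     .reject { |ref| ref.empty? || ref.start_with?("data:") }
     .map { |ref| URI.join(base_url, ref).to_s }
     .uniq
end

# Hypothetical helper: decodes the payload of a base64 data: URI.
# Returns nil if the string is not a base64 data: URI.
def decode_data_uri(data_uri)
  match = data_uri.match(/\Adata:[^,]*;base64,(.*)\z/m)
  # unpack1("m") is Ruby's built-in base64 decoder.
  match && match[1].unpack1("m")
end
```

Usage would look something like css_resource_urls(css_text, "https://example.com/css/site.css"), where the second argument is the URL the stylesheet was fetched from, so that relative paths such as ../fonts/a.woff2 resolve correctly. This only finds resources named in CSS; anything requested by JavaScript at runtime would not show up and would likely need a tool that drives a real browser.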
