I'm using Cheerio (https://github.com/MatthewMueller/cheerio) to scrape websites and collect images for a project I'm working on. Is there an easy way in Node.js (or with another package) to convert the value of $(img).attr('src') into a fully qualified URL? Sometimes I get "image.jpg", other times "../../image.jpg", and other times "//somepath/image.jpg". Perhaps I'm just missing a regex of some sort... Thanks for your time :)

ewindsor
  • We'd need the URL of the scraped site, or an example of a site like that. Either way, I'd recommend writing a separate function to parse these values. – Herman Junge Oct 26 '12 at 03:42
  • Oh, brilliant! I was troubled by the exact same thing and was manually writing out a solution for each of these cases. God bless SO! – vishalv2050 May 31 '14 at 15:28
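As the first comment points out, a relative src can only be resolved if you know the URL of the page it came from. Below is a minimal sketch of such a separate helper, using the WHATWG URL class that is built into Node rather than a hand-written regex. The name toAbsoluteUrl and the example page URL are hypothetical, and the sketch assumes the page has no <base href> element (which would change the base that browsers resolve against).

```javascript
// toAbsoluteUrl is a hypothetical helper, not part of Cheerio or Node.
// Assumes the page has no <base href> element; if it does, pass that href as pageUrl instead.
function toAbsoluteUrl(pageUrl, src) {
  // $(img).attr('src') returns undefined when the attribute is missing;
  // new URL(undefined, base) would silently resolve the string "undefined", so bail out first.
  if (typeof src !== 'string' || src.trim() === '') {
    return null;
  }
  try {
    // new URL(input, base) resolves input against base per the WHATWG URL Standard,
    // covering "image.jpg", "../../image.jpg" and protocol-relative "//host/path" alike.
    return new URL(src.trim(), pageUrl).href;
  } catch (err) {
    // Thrown (TypeError) when pageUrl is not an absolute URL or src cannot be parsed.
    return null;
  }
}

// Hypothetical page the images were scraped from.
const pageUrl = 'https://example.com/gallery/2012/index.html';
const srcs = ['image.jpg', '../../image.jpg', '//somepath/image.jpg'];
const absolute = srcs.map((src) => toAbsoluteUrl(pageUrl, src));
console.log(absolute);
```

With Cheerio you would call it once per image, e.g. toAbsoluteUrl(pageUrl, $(img).attr('src')), keeping track of the URL each page was fetched from.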

Look at Node's url module. Specifically, url.resolve(from, to) should be what you're looking for.
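A minimal sketch of url.resolve applied to the three kinds of src values from the question, assuming a hypothetical page URL (http://example.com/a/b/page.html) as the page the images were scraped from. Note that current Node documentation lists url.resolve() under the legacy URL API and recommends the WHATWG equivalent, new URL(src, pageUrl).href.

```javascript
// Node's built-in url module; url.resolve(from, to) resolves `to` relative to `from`
// the way a browser resolves a link on the page at `from`.
const url = require('url');

// Hypothetical URL of the page the <img> tags were scraped from.
const pageUrl = 'http://example.com/a/b/page.html';

// The three kinds of src values mentioned in the question.
const srcs = ['image.jpg', '../../image.jpg', '//somepath/image.jpg'];

// Relative paths are resolved against the page's directory;
// the protocol-relative one keeps the page's protocol (http:).
const resolved = srcs.map((src) => url.resolve(pageUrl, src));
console.log(resolved);
```

Only call it for images that actually have a src attribute: if $(img).attr('src') returns undefined, url.resolve throws a TypeError.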

Waylon Flinn