0

I am scraping a news website that also provides a link to each full article; however, the href values for those links look like this:

/news-features/8/news-headlines/103818/these-pupils-deserve-better

So in order for the link to work, I need to dynamically prepend:

http://www.oldham-chronicle.co.uk

So the whole link would be:

http://www.oldham-chronicle.co.uk/news-features/8/news-headlines/103818/these-pupils-deserve-better

As you might expect, there is more than one article, but the part of the link I need to add is always the same, so I need to add it to each of them.

At the moment I have:

$("a").each(function(){
    this.href=this.href.replace("http://www.oldham-chronicle.co.uk");
});

however my link looks like this:

href="http://localhost/news-features/8/news-headlines/103818/these-pupils-deserve-better"

This is wrong. How can that be solved?

Przemek
  • that isn't wrong, really.. your links render as absolute links, in this case localhost is your well.. local web server for your site. Your function doesn't actually replace anything.. what you want to do is add a second parameter - currently you're just replacing oldham-chronicle with an empty string.. – treyBake May 31 '17 at 15:00
  • maybe an example please? my brain is a bit slower after 8 hours of work haha – Przemek May 31 '17 at 15:01
  • and I'm presuming the oldham chronicle has given you permission to screen scrape them...?! – Liam May 31 '17 at 15:01
  • yes we are a business and we are allowed to do that ;) don't worry it has been all sorted before starting doing it – Przemek May 31 '17 at 15:02
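
To illustrate the point raised in the comments, here is a minimal sketch of how String.prototype.replace behaves with one and with two arguments. The resolved value is an assumption: it stands in for what element.href would report when the page is served from http://localhost.

```javascript
// Hypothetical resolved href, as the browser would report it via element.href
// when the page is served from http://localhost (an assumption for this sketch).
const resolved = "http://localhost/news-features/8/news-headlines/103818/these-pupils-deserve-better";

// One-argument replace: the pattern does not occur in the string,
// so the string comes back unchanged.
const unchanged = resolved.replace("http://www.oldham-chronicle.co.uk");

// Two-argument replace: swap the local origin for the real one.
const fixed = resolved.replace("http://localhost", "http://www.oldham-chronicle.co.uk");

console.log(unchanged === resolved);
console.log(fixed);
```

The first call explains why the original loop appeared to do nothing; the second is the shape of fix the comment suggests.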

5 Answers

2

Try this instead:

var base = "http://www.oldham-chronicle.co.uk/";
$('a').each(function(index, element) {
  element.href = element.href.replace("http://localhost/", base);
})

Basically, this loops over each a element and replaces the http://localhost/ prefix of its resolved href with the hard-coded base URL. (You can also do this without jQuery if desired.)

Edit: Misunderstood the original question, updated from the comment to replace the url at the beginning (with a simplistic matcher)
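
As an alternative to string matching, the standard URL constructor can resolve a scraped path against a base. This is only a sketch, assuming the scraped href values are root-relative paths like the one in the question; resolveHref is a hypothetical helper name, not part of jQuery.

```javascript
// Assumption: hrefs were scraped as root-relative paths, as in the question.
const BASE = "http://www.oldham-chronicle.co.uk";

// resolveHref is a hypothetical helper for illustration.
function resolveHref(href, base = BASE) {
  // new URL(relative, base) resolves the path against the base;
  // an href that is already absolute is returned as-is.
  return new URL(href, base).href;
}

console.log(resolveHref("/news-features/8/news-headlines/103818/these-pupils-deserve-better"));
```

In the browser this could be applied per link by passing the raw attribute value, for example the result of element.getAttribute("href"), rather than element.href, which is already resolved against the current page.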

AnilRedshift
  • Nearly there, I need to remove that localhost from the link http://www.oldham-chronicle.co.ukhttp://localhost/news-features/8/news-headlines/103818/these-pupils-deserve-better – Przemek May 31 '17 at 15:04
0

This will work for you

$("a").each(function(){
     // .attr('href') returns the raw attribute value, not the resolved URL
     var href = $(this).attr('href');
     if (href != null) {
        var newUrl = href.replace("http://localhost","http://www.oldham-chronicle.co.uk");
        // root-relative paths still lack a host, so prepend it
        if (newUrl.indexOf("http:")==-1) {newUrl  = "http://www.oldham-chronicle.co.uk"+newUrl; }
        $(this).attr('href', newUrl);
     }
});
Victor Leontyev
0

For readers using modern browsers, you can just use the hostname property:

$('a').each((i, e) => e.hostname = 'www.oldham-chronicle.co.uk');

Note that this will not work in Internet Explorer, and some other browsers, as indicated on the linked page. This example also uses arrow functions, which are not supported in some older browsers.
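
The same idea can be sketched outside the DOM with the standard URL object, which exposes the same hostname property as anchor elements. withHostname is a hypothetical helper, and clearing the port is an assumption for the case where the local server runs on a non-default port, since assigning hostname alone leaves the port untouched.

```javascript
// withHostname is a hypothetical helper for illustration.
function withHostname(href, hostname) {
  const url = new URL(href);   // href must be an absolute URL here
  url.hostname = hostname;     // changes only the host name
  url.port = "";               // assumption: drop any local dev port such as :8080
  return url.href;
}

console.log(withHostname("http://localhost:8080/news-features/8/news-headlines/103818/these-pupils-deserve-better", "www.oldham-chronicle.co.uk"));
```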

Heretic Monkey
0

You can avoid using .each by doing something like this:

$('a.link').attr('href', function(i, v){
  console.log("replaced href:");
  console.log(v);
  // include the scheme; without it the result would be treated as a relative URL
  return v.replace('http://localhost', 'http://www.oldham-chronicle.co.uk');
});

// just to show new href
$('a.link').attr('href', function(i, v) {
    console.log("new href:");
    console.log(v);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<a class="link" href="http://localhost/news-features/8/news-headlines/103818/these-pupils-deserve-better">Link1</a>
<a class="link" href="http://localhost/news-features/7/news-headlines/103818/these-pupils-deserve-better">Link2</a>

Thanks to Denys's answer here.

Anthony
-4

Try this:

$("a").each(function(){
 $(this).attr("href", "http://localhost/news-features/8/news-headlines/103818/these-pupils-deserve-better");
});
  • no, this will set all links to be that link - he wants to replace localhost with the domain - not just take every link to that page – treyBake May 31 '17 at 15:00
  • this is wrong as I will be setting a same link to all the articles, and each article has different link – Przemek May 31 '17 at 15:01
  • 1) there's a typo (double double quote) and 2) you're totally missing the second part of the URL – Jeremy Thille May 31 '17 at 15:02