As the title states: is there a way to detect all the URLs inside a PHP page that are relative? The URLs in the page can appear in several different forms:
<link rel="stylesheet" href="/lib/css/hanv2/ie.css" />
<img src="/image.jpg">
<div style="background-image: url(/lib/data/emotion-header-v2/int-algemeen08.jpg)"></div>
So I need to get every relative URL no matter what kind of link it is: CSS link, JS link, image link, or SWF link.
I'm using HtmlAgilityPack for this. Here are the C# snippets I use to extract the links and check whether they are relative:
// Extract the href value of every <link> element
private List<string> ExtractAllAHrefTags(HtmlAgilityPack.HtmlDocument htmlSnippet)
{
    List<string> hrefTags = new List<string>();
    // SelectNodes returns null (not an empty collection) when nothing matches
    HtmlNodeCollection links = htmlSnippet.DocumentNode.SelectNodes("//link[@href]");
    if (links == null) return hrefTags;
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        hrefTags.Add(att.Value);
    }
    return hrefTags;
}
// Extract the src value of every <img> element
private List<string> ExtractAllImgTags(HtmlAgilityPack.HtmlDocument htmlSnippet)
{
    List<string> srcTags = new List<string>();
    // SelectNodes returns null (not an empty collection) when nothing matches
    HtmlNodeCollection images = htmlSnippet.DocumentNode.SelectNodes("//img[@src]");
    if (images == null) return srcTags;
    foreach (HtmlNode img in images)
    {
        HtmlAttribute att = img.Attributes["src"];
        srcTags.Add(att.Value);
    }
    return srcTags;
}
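// Check whether each path is relative
foreach (string s in AllHrefTags)
{
    // Both tests must fail; with || the condition would be true for every string
    if (!s.StartsWith("http://") && !s.StartsWith("https://"))
    {
        // path is relative (no http/https prefix)
    }
}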
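The prefix test above only knows about http and https, so values such as mailto:, data: or //cdn.example.com/x.js would be counted as relative. A minimal sketch of a stricter check follows. `UrlChecks` and `IsRelativeUrl` are hypothetical names, not part of HtmlAgilityPack, and treating protocol-relative `//host/path` values as non-relative is an assumption about what is wanted here.

```csharp
using System;
using System.Text.RegularExpressions;

static class UrlChecks
{
    // Matches a leading URI scheme such as "http:", "https:", "mailto:" or "data:".
    private static readonly Regex SchemePrefix =
        new Regex(@"^[A-Za-z][A-Za-z0-9+.\-]*:", RegexOptions.Compiled);

    // Hypothetical helper: true when the value has no scheme and no "//host" prefix.
    public static bool IsRelativeUrl(string value)
    {
        if (string.IsNullOrWhiteSpace(value)) return false;
        string url = value.Trim();

        // Assumption: "//host/path" points at another host, so it is not counted as relative.
        if (url.StartsWith("//", StringComparison.Ordinal)) return false;

        // Anything that does not start with a scheme is treated as relative.
        return !SchemePrefix.IsMatch(url);
    }
}
```

With this helper, `UrlChecks.IsRelativeUrl("/image.jpg")` is true and `UrlChecks.IsRelativeUrl("http://example.com/a.css")` is false.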
Is there a better or more accurate way to get all the relative paths from a given HTML page, using HtmlAgilityPack or something else, preferably in a concise way?
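For the inline style case shown earlier (background-image: url(...)), the URL is not an attribute value of its own, so an XPath query on href or src will not find it. One possible approach, sketched here under the assumption that the style text has already been read from the node, is a regular expression over that text. `CssUrls` and `ExtractCssUrls` are hypothetical names, and the pattern does not handle escaped characters inside the URL.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CssUrls
{
    // url( ... ) with optional matching single or double quotes around the value.
    private static readonly Regex CssUrl =
        new Regex(@"url\(\s*(['""]?)(.*?)\1\s*\)",
                  RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Hypothetical helper: returns every url(...) argument found in a CSS string.
    public static List<string> ExtractCssUrls(string css)
    {
        var urls = new List<string>();
        foreach (Match m in CssUrl.Matches(css ?? string.Empty))
        {
            // Group 2 holds the URL without the surrounding quotes.
            urls.Add(m.Groups[2].Value);
        }
        return urls;
    }
}
```

Each extracted value could then go through the same relative-path check as the href and src values.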