I am trying to extract the "NAME" and "EMAIL" text from the following HTML file:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<ol>
<li>
<font class="normal">
<b>NAME</b> <a href="/member/mail_compose.aspx?id=name"><img src="/images/mailbox.gif" border="0" alt="Send Mail" /></a> <a href="/photos/member_viewphoto.aspx?id=name"><img src="/images/icons/member_photos.gif" border="0" alt="View Photos" /></a> <br />
ADDRESS<br />
PHONE<br />
<a href="mailto:email@hotmail.com" class="redlink">EMAIL</a><br />
<br />
</font>
</li>
</ol>
</body>
</html>
Here is the code that I am using:
// Load the HTML file as an XML document
XDocument xDoc = XDocument.Load(@"..\..\Directory.html");
// Parse document
var names = xDoc.Root.DescendantsAndSelf()
    .Where(x => x.Name.LocalName == "ol").DescendantsAndSelf()
    .Where(x => x.Name.LocalName == "li").DescendantsAndSelf()
    .Select(x => new
    {
        name = x.Elements().Where(y => y.Name.LocalName == "b").Select(y => y.Value),
        email = x.DescendantsAndSelf()
            .Where(y => y.Name.LocalName == "a" && x.FirstAttribute.Name == "href" && x.Attribute("href").Value.Contains("mailto"))
            .Select(y => y.Value ?? "No Email")
    });
// Print text to console
for (int i = 0; i < names.Count(); i++)
{
    Console.WriteLine("{0}: {1}", names.ElementAt(i).name, names.ElementAt(i).email);
}
Somehow, the above code is printing this:
System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Xml.Linq.XElement,System.String]: System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Xml.Linq.XElement,System.String]
Could someone please tell me why this is happening? Also, if there is a better way of doing this, suggestions would be very welcome.