14

I've got some trouble with the HTML Agility Pack.

I get a NullReferenceException when I use this method on HTML that doesn't contain the specific node. It worked at first, but then it stopped working. This is only a snippet; there are about 10 more foreach loops that select different nodes.

What am I doing wrong?

public string Export(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    // exception gets thrown on below line
    foreach (var repeater in doc.DocumentNode.SelectNodes("//table[@class='mceRepeater']"))
    {
        if (repeater != null)
        {
            repeater.Name = "editor:repeater";
            repeater.Attributes.RemoveAll();
        }
    }

    var sw = new StringWriter();
    doc.Save(sw);
    sw.Flush();

    return sw.ToString();
}
Oleks
tohereknowswhen

5 Answers

31

AFAIK, DocumentNode.SelectNodes returns null if no nodes are found.

This is the default behaviour; see the discussion thread on CodePlex: "Why DocumentNode.SelectNodes returns null".

So the workaround is to check the result for null before the foreach loop:

// SelectNodes returns null, not an empty collection, when nothing matches
var repeaters = doc.DocumentNode.SelectNodes("//table[@class='mceRepeater']");
if (repeaters != null)
{
    foreach (var repeater in repeaters)
    {
        if (repeater != null)
        {
            repeater.Name = "editor:repeater";
            repeater.Attributes.RemoveAll();
        }
    }
}
Oleks
12

This has been updated: you can now prevent SelectNodes from returning null by setting doc.OptionEmptyCollection = true, as detailed in this GitHub issue.

This makes it return an empty collection instead of null when no nodes match the query. (I'm not sure why this wasn't the default behaviour to begin with, though.)
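A minimal sketch of the question's loop with this option turned on, assuming an HtmlAgilityPack release that exposes OptionEmptyCollection. The class and method names (EmptyCollectionExample, RenameRepeaters) are hypothetical; the loop body is the one from the question, with the null check no longer needed.

```csharp
using System.IO;
using HtmlAgilityPack;

public static class EmptyCollectionExample
{
    // Hypothetical helper doing the same work as the question's Export method.
    public static string RenameRepeaters(string html)
    {
        var doc = new HtmlDocument();
        // Ask SelectNodes for an empty collection (instead of null) when nothing matches.
        doc.OptionEmptyCollection = true;
        doc.LoadHtml(html);

        // Safe even when the HTML has no matching table: the loop simply runs zero times.
        foreach (var repeater in doc.DocumentNode.SelectNodes("//table[@class='mceRepeater']"))
        {
            repeater.Name = "editor:repeater";
            repeater.Attributes.RemoveAll();
        }

        using (var sw = new StringWriter())
        {
            doc.Save(sw);
            return sw.ToString();
        }
    }
}
```

Because the option lives on the document, it has to be set on every HtmlDocument you create; it is not a global setting.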

Harry
  • Didn't work for me: System.ArgumentOutOfRangeException HResult=0x80131502 Message=Index was out of range. Must be non-negative and less than the size of the collection. – PastExpiry.com Mar 28 '21 at 15:05
  • That doesn't sound like the kind of error you'd get from this function. Are you sure it was `doc.DocumentNode.SelectNodes` which was throwing the error? What query are you passing in to `SelectNodes`? – Harry Mar 29 '21 at 16:35
  • @PastExpiry.com maybe you're trying to do something like `doc.DocumentNode.SelectNodes(selector)[0]` to get the first node in the list, but the returned list is empty and so that node doesn't exist? – Harry Mar 29 '21 at 16:40
  • Yes... `temp = doc.DocumentNode.SelectNodes("//*[@id='cr_cashflow']/div[2]/div[2]/table/thead/tr/th")[0].InnerText;` – PastExpiry.com Mar 30 '21 at 21:21
  • So it's not the `SelectNodes` that's throwing this error, it's the `[0]`. `SelectNodes` is correctly returning an empty list instead of `null`. You're trying to access the first element of that empty list, which doesn't exist, and so it throws an exception – Harry May 05 '21 at 12:10
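To read the first match without risking either exception from the comments above, here is a hedged sketch of two options: LINQ's FirstOrDefault() returns null for an empty collection, and SelectSingleNode returns null when nothing matches. The class and method names are hypothetical.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class FirstMatchExample
{
    // Hypothetical helper: first match's text, or null when nothing matches.
    public static string FirstTextViaLinq(string html, string xpath)
    {
        var doc = new HtmlDocument();
        // Without this option SelectNodes returns null and FirstOrDefault() would throw.
        doc.OptionEmptyCollection = true;
        doc.LoadHtml(html);
        // [0] throws ArgumentOutOfRangeException on an empty collection;
        // FirstOrDefault() returns null instead, and ?. carries that null through.
        return doc.DocumentNode.SelectNodes(xpath).FirstOrDefault()?.InnerText;
    }

    // Same result without OptionEmptyCollection: SelectSingleNode returns null on no match.
    public static string FirstTextViaSingleNode(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode(xpath)?.InnerText;
    }
}
```

The caller still has to handle a null result, but it gets to decide what "no header" means instead of catching an exception.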
3

Building on Alex's answer, I solved it like this:

public static class HtmlAgilityPackExtensions
{
    public static HtmlAgilityPack.HtmlNodeCollection SafeSelectNodes(this HtmlAgilityPack.HtmlNode node, string selector)
    {
        return (node.SelectNodes(selector) ?? new HtmlAgilityPack.HtmlNodeCollection(node));
    }
}
Alex from Jitbit
quillbreaker
2

You can use the null-conditional operator (?.) in place of every member access (.), for example:

var titleTag = htdoc?.DocumentNode?.Descendants("title")?.FirstOrDefault()?.InnerText;
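One caveat worth spelling out: ?. stops member access on null, but a foreach over a null collection still throws, so for the loop in the question the chain needs a fallback. A sketch combining it with the null-coalescing operator (??) and Enumerable.Empty<HtmlNode>(); NullSafeLoopExample, Title and CountRepeaters are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class NullSafeLoopExample
{
    // Each ?. short-circuits to null instead of throwing.
    public static string Title(HtmlDocument doc) =>
        doc?.DocumentNode?.Descendants("title")?.FirstOrDefault()?.InnerText;

    public static int CountRepeaters(HtmlDocument doc)
    {
        // SelectNodes may return null; ?? swaps in an empty sequence so foreach is safe.
        IEnumerable<HtmlNode> repeaters =
            doc?.DocumentNode?.SelectNodes("//table[@class='mceRepeater']")
            ?? Enumerable.Empty<HtmlNode>();

        int count = 0;
        foreach (var repeater in repeaters)
        {
            count++;
        }
        return count;
    }
}
```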
Uwe Keim
1

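I've created a universal extension method that works with any IEnumerable<T>:

public static class EnumerableExtensions
{
    // Returns an empty list when source is null; otherwise copies source into a new list.
    public static List<TSource> ToListOrEmpty<TSource>(this IEnumerable<TSource> source)
    {
        return source == null ? new List<TSource>() : source.ToList();
    }
}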

Usage:

var opnodes = bodyNode.Descendants("o:p").ToListOrEmpty();
opnodes.ForEach(x => x.Remove());
Uwe Keim
s_tranquil
  • I like the idea behind this solution, but use Enumerable.Empty instead of ToList. That way you won't iterate the whole sequence just to convert it to a List. – brianfeucht Mar 21 '16 at 23:16
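A sketch of the variant suggested in that comment: return Enumerable.Empty<T>() for a null source and otherwise hand back the sequence unchanged, so nothing is copied. OrEmptyExtensions and OrEmpty are hypothetical names, and unlike ToListOrEmpty the result is not a List<T>, so List<T>.ForEach is not available on it.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OrEmptyExtensions
{
    // Hypothetical extension: returns source itself, or an empty sequence when source is null.
    // No copy is made, unlike ToList().
    public static IEnumerable<TSource> OrEmpty<TSource>(this IEnumerable<TSource> source)
    {
        return source ?? Enumerable.Empty<TSource>();
    }
}
```

Because extension methods are static calls, doc.DocumentNode.SelectNodes(xpath).OrEmpty() works even when SelectNodes returns null. When the loop removes nodes, as in the o:p usage above, materializing with ToList() first is still the safer choice, since changing the tree while lazily enumerating Descendants can break the enumeration.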