Sometimes the variables doc or type are null, so I first tried adding if (type == null){....} else {....}

But if it's null, what should I return? I then tried try and catch, but since the result is still null, I get a null exception in another class where I'm using this class.

public static HtmlAgilityPack.HtmlDocument getHtmlDocumentWebClient(string url, bool useProxy, string proxyIp, int proxyPort, string usename, string password)
{
    HtmlAgilityPack.HtmlDocument doc = null;
    using (MyClient clients = new MyClient())
    {
        clients.HeadOnly = true;
        byte[] body = clients.DownloadData(url);
        // note should be 0-length
        string type = clients.ResponseHeaders["content-type"];
        clients.HeadOnly = false;
        // check 'tis not binary... we'll use text/, but could
        // check for text/html
        try
        {
            if (type.StartsWith(@"text/html"))
            {
                string text = clients.DownloadString(url);
                doc = new HtmlAgilityPack.HtmlDocument();
                WebClient client = new WebClient();
                //client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
                client.Credentials = CredentialCache.DefaultCredentials;
                client.Proxy = WebRequest.DefaultWebProxy;
                if (useProxy)
                {
                    //Proxy                
                    if (!string.IsNullOrEmpty(proxyIp))
                    {
                        WebProxy p = new WebProxy(proxyIp, proxyPort);
                        if (!string.IsNullOrEmpty(usename))
                        {
                            if (password == null)
                                password = string.Empty;
                            NetworkCredential nc = new NetworkCredential(usename, password);
                            p.Credentials = nc;
                        }
                    }
                }
                doc.Load(client.OpenRead(url));
            }
        }
        catch
        {
        }
    }
    if (doc == null)
    {
        //MessageBox.Show("Doc is null   " + doc + " The link that did it was    " + url);
    }
    return doc;
}

The function gets a different URL each time, and for some specific URLs the variable type is null. The reason is probably that the site needs a password or something similar.

How should I handle the null?

Nasreddine
user2065612

2 Answers

If type is null, apparently there is no Content-Type header in the response.

string type = clients.ResponseHeaders["content-type"];

Then doc will also be null since the line type.StartsWith will throw a NullReferenceException that is swallowed by your general catch-clause (a very Bad Thing™).

If type is not null but doc is null, apparently the content-type doesn't start with text/html:

 if (type.StartsWith(@"text/html"))
     doc = new HtmlAgilityPack.HtmlDocument();

Since your function is named getHtmlDocumentWebClient, I assume it is used to get an HTML document. When there is no such document (because you couldn't determine the content type, or the content type was something other than text/html), then yes, your method should return null (or throw an exception). Exceptions are best reserved for unexpected situations, and in web development it is not really unexpected to get something other than an HTML document.

Then you handle the possibility of getting a null value wherever getHtmlDocumentWebClient is called. What you do when there is no HTML document depends on your situation.

Note that the Content-Type, if present, may lie. For example, a server may send application/octet-stream for almost anything.
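As a rough illustration of the null check on the header, here is a minimal sketch. The class and method names (ContentTypeCheck, IsHtml) are made up for this example and are not part of the question's code; the idea is only that a missing header is treated the same as "not HTML", so StartsWith is never called on null.

```csharp
using System;

static class ContentTypeCheck
{
    // Hypothetical helper: true only when a Content-Type header value is
    // present and starts with text/html (compared case-insensitively).
    public static bool IsHtml(string contentType)
    {
        return contentType != null
            && contentType.TrimStart().StartsWith("text/html", StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(IsHtml(null));                        // False (header missing)
        Console.WriteLine(IsHtml("text/html; charset=utf-8"));  // True
        Console.WriteLine(IsHtml("application/octet-stream"));  // False
    }
}
```

In the question's method, the condition would then read if (ContentTypeCheck.IsHtml(type)), and doc simply stays null when the check fails, so the empty catch is no longer needed to hide the NullReferenceException.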

Daniel A.A. Pelsmaeker
  • You're absolutely correct. 1) "content-type" may be omitted - or might be misleading. 2) If "content-type" happens to be missing ... then variable "type" (a *very* poor name for a variable, BTW...) will be null ... and "type.StartsWith()" will throw an exception ... *BEFORE* "doc = new HtmlAgilityPack.HtmlDocument()" gets executed. Hence the "doc" null. – paulsm4 Mar 28 '13 at 23:05
  • Could you please show me a complete solution: how should the class look, and how should it handle the null? The problem is that I'm calling this function from another class, from inside a recursive function, so I may need to handle the null in other places too, or return something other than null. – user2065612 Mar 29 '13 at 00:06
  • @user2065612 You should think of `null` as meaning _not present/unknown_. Whenever you encounter something that may possibly be `null` you'd have to ask yourself: is it unexpected and possibly a bug if this were `null`? Then throw an exception. Otherwise, propagate the `null` by letting your method return `null` (HTML document is _not present/unknown_) as well. Since your method may return `null`, you have to ask yourself the same question again. Rinse and repeat. I can't give you a complete code solution. – Daniel A.A. Pelsmaeker Mar 29 '13 at 00:12

If the result is null, then your caller should handle it. If you can reasonably expect a null return value, it's up to the caller to do a null test; alternatively, you might consider throwing an exception if a null result is an error condition. In either case, the caller should catch any potential exception and handle it gracefully.
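A caller-side null test inside a recursive function could look roughly like the sketch below. Everything in it is an assumption made for illustration: FetchHtmlOrNull stands in for getHtmlDocumentWebClient (it returns a string instead of an HtmlDocument so the example runs without HtmlAgilityPack), and ExtractLinks returns fixed example.com links instead of parsing a page.

```csharp
using System;
using System.Collections.Generic;

static class CrawlSketch
{
    // Hypothetical stand-in for getHtmlDocumentWebClient: returns the page
    // text, or null when no HTML document could be obtained.
    static string FetchHtmlOrNull(string url)
    {
        return url.EndsWith(".html", StringComparison.Ordinal) ? "<html></html>" : null;
    }

    // Hypothetical stand-in for link extraction; a real crawler would parse the document.
    static IEnumerable<string> ExtractLinks(string url)
    {
        if (url == "http://example.com/index.html")
            return new[] { "http://example.com/a.html", "http://example.com/file.zip" };
        return new string[0];
    }

    static void Crawl(string url, int depth, List<string> loaded)
    {
        if (depth < 0) return;

        string html = FetchHtmlOrNull(url);
        if (html == null)
        {
            // Null means "no HTML document here": skip this URL and stop descending.
            Console.WriteLine("Skipped: " + url);
            return;
        }

        loaded.Add(url);
        foreach (string link in ExtractLinks(url))
            Crawl(link, depth - 1, loaded);
    }

    static void Main()
    {
        var loaded = new List<string>();
        Crawl("http://example.com/index.html", 2, loaded);
        Console.WriteLine(loaded.Count); // 2 (index.html and a.html; file.zip is skipped)
    }
}
```

The null test sits in one place, right after the call, so the recursion never touches a missing document. If a missing document should count as an error in your program, that same spot is where you would throw instead of returning.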

Charleh