1

Assuming I have a WebCrawler class. There are several errors it can encounter. How should I propagate the errors upward?

Using exceptions:

class WebCrawler
  class UrlBadFormatError < StandardError; end
  class PageNotFoundError < StandardError; end
  class UnauthorizedError < StandardError; end
  def crawl(url)
    if(! url =~ /some_format/)
      raise UrlBadFormatError
    response = get(url)
    if(response.code == 404)
      raise PageNotFoundError
    if(response.code == 403)
      raise UnauthorizedError
    ... 
  end
end

or constants:

class WebCrawler
  URL_BAD_FORMAT = 1
  PAGE_NOT_FOUND = 2
  UNAUTHORZIED = 3 
  def crawl(url)
    if(! url =~ /some_format/)
      return URL_BAD_FORMAT
    response = get(url)
    if(response.code == 404)
      return PAGE_NOT_FOUND
    if(response.code == 403)
      return UNAUTHORZIED
    ... 
  end
end

or symbols:

class WebCrawler
  def crawl(url)
    if(! url =~ /some_format/)
      return :url_bad_format
    response = get(url)
    if(response.code == 404)
      return :page_not_found
    if(response.code == 403)
      return :unauthorized
    ... 
  end
end

which is best? or it depends(on what?)

Lai Yu-Hsuan
  • 27,509
  • 28
  • 97
  • 164

5 Answers5

2

For something which indicates programmer error, such as the wrong type of argument passed to a method, definitely throw an exception. The exception will crash the program, drawing the programmer's attention to the fact that they are using your class incorrectly, so they can fix the problem. In this case, returning an error code wouldn't make sense, because the program will have to include code to check the return value, but after the program is debugged, such errors shouldn't ever happen.

In your WebCrawler class, is it expected that crawl will receive a bad URL as an argument sometimes? I think the answer is probably no. So raising an exception would be appropriate when a bad URL is passed.

When an exception is raised, the flow of execution suddenly "jumps" to the innermost handler. This can be a useful way to structure code when the exception is not expected to happen most of the time, because you can write the "main flow" of your method as simple, straight-line code without including a lot of details about what will happen when some rare error condition occurs. Those details can be separated from the "main flow" code, and put in an exception handler. When an error condition is expected to happen under normal conditions, though, it can be better to put the error handling code inline with the "main flow", to make it clearer what is going on. If the control flow of your program "jumps around" (as is the case when exceptions are used for normal flow control), that means the reader also has to "jump around" in the program text as they are figuring out how it works.

For the other two, I think it is expected that at least sometimes, the HTTP request will return an error code. To determine whether an exception or special return value is the best way to indicate such a condition, I would think about how often those conditions are going to happen under normal usage. Think also about how the client code will read either way. If you use exceptions, they will have to write something like:

urls.map do |url|
  begin
    crawl(url)
  rescue PageNotFoundError
    ""
  rescue UnauthorizedError
    ""
  end
end

(By the way, I think this code example shows something: it might be a good idea if both of your custom exceptions inherit from a common superclass, so you can catch both of them with a single rescue clause if desired.) Or if you use error codes, it would look something like:

urls.map do |url|
  response = crawl(url)
  if [:page_not_found, :unauthorized].include? response
    ""
  else
    response
  end
end

Which do you think reads better? It's really up to you. The one thing which you do not want to do is use integer constants for errors. Why use integers? When you print them in a debug trace, you'll have to go look at the list of constants to see what each one means. And using symbols is just as efficient computationally.

Alex D
  • 29,755
  • 7
  • 80
  • 126
2

Why wouldn't you throw exceptions? They can encapsulate additional information besides just the type, are trivially rescued, and if you're using an IDE, are first-class citizens.

Dave Newton
  • 158,873
  • 26
  • 254
  • 302
  • 5
    I don't think whether you use an IDE or not should ever be considered in *how* or *what* you program. – Andrew Marshall Mar 04 '12 at 20:47
  • 2
    @AndrewMarshall That's one point of view, but it can actually matter. – Dave Newton Mar 04 '12 at 20:52
  • 3
    Other than possibly stylistic/documentation conventions (which don't actually affect the literal code), no, it should not. Exceptions being first-class citizens (I assume by this you mean classes) is true regardless of whether you use an IDE—using an IDE doesn't somehow alter that. – Andrew Marshall Mar 04 '12 at 20:59
  • 2
    @AndrewMarshall Didn't say it did. It does, however, alter what the development environment can do for you *around* the symbols/classes/etc. you're using. Feel free to focus on this one aspect of what I said, despite it being pretty ancillary, but you'll have to do so without me. Carry on! – Dave Newton Mar 04 '12 at 21:04
  • 3
    You're using an IDE as part of your justification for using exceptions over other options, so I don't know how you're *not* saying that. – Andrew Marshall Mar 04 '12 at 21:06
  • 2
    @AndrewMarshall I know you don't, and I'm okay with that. You still seem to think I believe an IDE confers first-class citizenship to classes, even though I explicitly said otherwise, so this is rather pointless. We both have better stuff to do. – Dave Newton Mar 04 '12 at 21:09
0

If it's an exception then by all means raises an exception! All three of those cases are, in my opinion, exceptions. While some may argue that 4xx status codes aren't exception-worthy since you may expect them to happen, they are still client errors.

You may also read about Ruby's throw/catch, which offer exception-like behavior for cases where "don't use exceptions for control flow" applies (though I don't think that's the case here).

Andrew Marshall
  • 95,083
  • 20
  • 220
  • 214
0

You should raise errors. If you encounter a malformed URL, or if the page isn't found, or if you weren't authorized to access the page, it means you cannot continue crawling. Raising an error or exception returns from the method and lets the caller deal with the unusual situation.

It should also include information about the error, such as error codes, the URL which resulted in an error and any other relevant information. It can help in deciding how best to handle the error and can later be formatted into a helpful message for the user.

What you should not do, ever, is return numeric error codes. Ruby is not C. Just use symbols instead.

Matheus Moreira
  • 17,106
  • 3
  • 68
  • 107
-1

I am against the use of exceptions upon encountering 403s, 404s, malformed urls and similar common occurences on the web. Exceptions are meant for "internal" errors. In the World Wild Web, bad URLs are entirely unexceptional. There should be a method(s) for handling each different URL disease. I would personally return special values as symbols, or some "SpecialCase" objects recording what happened. There is also underused catch...throw statement.

Boris Stitnicky
  • 12,444
  • 5
  • 57
  • 74
  • 1
    Just because an error happens a lot, doesn't mean it's not an exception. If a car doesn't start because it's battery is flat, is that normal behaviour, yes, but it's an error condition that requires a special case to remedy. Don't get your bad habits confused with best practice. Comms failures ARE exceptions, no matter how often they occur. Fail != Success. – ocodo Jun 06 '13 at 03:07
  • My reasons are not obvious from my answer, so perhaps it is not so great after all :-) What's worse, I forgot them myself :-) But my habits are not to blame, as I love exceptions -- not only when the battery is flat, but also when the street light is red. There is [one discussion of this on SO](http://stackoverflow.com/questions/2018137), that explicitly mentions Sinatra using catch/throw for HTTP exceptions, and [I started another one](http://stackoverflow.com/questions/16972757). Is catch/throw really bad, or just underused? – Boris Stitnicky Jun 06 '13 at 21:50