4

I'm having a hard time choosing whether I should "enforce" a condition or "assert" a condition in D. (This is language-neutral, though.)

Theoretically, I know that you use assertions to find bugs and enforcements to check for atypical conditions. E.g. you might say assert(count >= 0) for an argument to your method, because a negative count indicates a bug in the caller, whereas you would say enforce(isNetworkConnected), because a dropped connection is not a bug -- it's just something you're assuming that could very well be false in a legitimate situation beyond your control.

Furthermore, assertions can be removed from code as an optimization with no side effects, but enforcements cannot be removed, because their condition code must always execute. Hence if I'm implementing a lazily-filled container that fills itself on the first access to any of its methods, I say enforce(!empty()) instead of assert(!empty()): the check for empty() must always occur, since it lazily executes code inside.
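
To make that concrete, here's a minimal sketch of how I understand the distinction (isNetworkConnected and setCount are made-up names, just for illustration):

import std.exception : enforce;

bool isNetworkConnected() { return false; }   // stand-in for a real check

void setCount(int count)
{
    // A negative count is a bug in the caller; the compiler may strip
    // this check entirely in release builds.
    assert(count >= 0);
}

void upload()
{
    // The connection can legitimately be down at runtime, so this check
    // must always execute; on failure it throws a catchable exception.
    enforce(isNetworkConnected(), "no network connection");
}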

So I think I know what they're supposed to mean. But theory is easier than practice, and I'm having a hard time actually applying the concepts.

Consider the following:

I'm making a range (similar to an iterator) that iterates over two other ranges, and adds the results. (For functional programmers: I'm aware that I can use map!("a + b") instead, but I'm ignoring that for now, since it doesn't illustrate the question.) So I have code that looks like this in pseudocode:

Range add(Range range1, Range range2)
{
    Range result;
    while (!range1.empty)
    {
        assert(!range2.empty);   // Should this be an assertion or an enforcement?
        result ~= range1.front + range2.front;   // append the elementwise sum
        range1.popFront();
        range2.popFront();
    }
    return result;
}

Should that be an assertion or an enforcement? (Is it the caller's fault that the ranges don't empty at the same time? The caller might not have control over where the ranges came from -- they could've come from a user -- but then again, it still looks like a bug, doesn't it?)

Or here's another pseudocode example:

uint getFileSize(string path)
{
    HANDLE hFile = CreateFile(path, ...);
    assert(hFile != INVALID_HANDLE_VALUE);   // Assertion or enforcement?
    scope(exit) CloseHandle(hFile);          // close the handle, obviously
    return GetFileSize(hFile, null);
}
...

Should this be an assertion or an enforcement? The path might come from a user -- so it might not be a bug -- but it's still a precondition of this method that the path be valid. Do I assert or enforce?

Thanks!

user541686

3 Answers

1

I'm not sure it is entirely language-neutral. No language that I use has enforce(), and if I encountered one that did, then I would want to use assert and enforce in the ways they were intended, which might be idiomatic to that language.

For instance assert in C or C++ stops the program when it fails, it doesn't throw an exception, so its usage may not be the same as what you're talking about. You don't use assert in C++ unless you think that either the caller has already made an error so grave that they can't be relied on to clean up (e.g. passing in a negative count), or else some other code elsewhere has made an error so grave that the program should be considered to be in an undefined state (e.g. your data structure appears corrupt). C++ does distinguish between runtime errors and logic errors, though, which may roughly correspond but I think are mostly about avoidable vs. unavoidable errors.

In the case of add, you'd use a logic error if the author's intent is that a program which provides mismatched lists has bugs and needs fixing, or a runtime error if it's just one of those things that might happen. For instance, if your function were to handle arbitrary generators, which don't necessarily have a means of reporting their length short of destructively evaluating the whole sequence, you'd be more likely to consider it an unavoidable error condition.

Calling it a logic error implies that it's the caller's responsibility to check the length before calling add, if they can't ensure it by the exercise of pure reason. So they would not be passing in a list from a user without explicitly checking the length first, and in all honesty should count themselves lucky they even got an exception rather than undefined behavior.

Calling it a runtime error expresses that it's "reasonable" (if abnormal) to pass in lists of different lengths, with the exception indicating that it happened on this occasion. Hence I think an enforcement rather than an assertion.
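
I don't use D, but going by your description of enforce, the runtime-error reading of add might be sketched like this (the message text and the element-appending details are just illustrative):

import std.exception : enforce;

auto add(R1, R2)(R1 range1, R2 range2)
{
    typeof(range1.front + range2.front)[] result;
    while (!range1.empty)
    {
        // A length mismatch is "reasonable" here, so the check always
        // runs and throws a catchable exception rather than asserting.
        enforce(!range2.empty, "ranges have different lengths");
        result ~= range1.front + range2.front;
        range1.popFront();
        range2.popFront();
    }
    enforce(range2.empty, "ranges have different lengths");
    return result;
}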

In the case of filesize: for the existence of a file, you should if possible treat that as a potentially recoverable failure (enforcement), not a bug (assertion). The reason is simply that there is no way for the caller to be certain that a file exists -- there's always someone with more privileges who can come along and remove it, or unmount the entire filesystem, in between a check for existence and a call to filesize. It's therefore not necessarily a logical flaw in the calling code when the file doesn't exist (although the end-user might have shot themselves in the foot). Because of that, it's likely there will be callers who can treat it as just one of those things that happens, an unavoidable error condition. Creating a file handle could also fail for out-of-memory, which is another unavoidable error on most systems, although not necessarily a recoverable one if, for example, over-committing is enabled.
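
Sketching your filesize example the same way (the Win32 declarations come from core.sys.windows.windows; treat the details as illustrative):

import std.exception : enforce;
import std.utf : toUTF16z;
import core.sys.windows.windows;

uint getFileSize(string path)
{
    HANDLE hFile = CreateFileW(path.toUTF16z, GENERIC_READ, FILE_SHARE_READ,
                               null, OPEN_EXISTING, 0, null);
    // The file can vanish between any prior check and this call, so treat
    // failure as a recoverable runtime condition, not a bug:
    enforce(hFile != INVALID_HANDLE_VALUE, "cannot open " ~ path);
    scope(exit) CloseHandle(hFile);
    return GetFileSize(hFile, null);
}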

Another example to consider is operator[] vs. at() for C++'s vector. at() throws out_of_range, a logic error, not because it's inconceivable that a caller might want to recover, or because you have to be some kind of numbskull to make the mistake of accessing an array out of range using at(), but because the error is entirely avoidable if the caller wants it to be -- you can always check size() before access if you have no other way of knowing whether your index is good or not. And so operator[] doesn't guarantee any checks at all, and in the name of efficiency, an out-of-range access has undefined behavior.

Steve Jessop
  • I meant language-neutral for the concept and not necessarily for the particular functions, but yeah, I guess it can differ by language a bit. What I don't really understand is this, though: all ranges passed to `add` must have the same size, in order for the call to be correct. When I'm making a *library* (as opposed to an app), I have *no idea* if this indicates a bug or not for the caller -- all I know is how to perform the task, and that I can't do it correctly if I get an invalid input. So should I `assert` or throw a runtime error? It may or may not be the caller's fault, I don't know. – user541686 Feb 27 '11 at 01:01
  • @Mehrdad: you must choose one, and your choice should be inspired by the intended purpose of the function if you have an intended purpose. If you don't have an intended purpose, you'll probably pick one arbitrarily, and that will in effect *create* the intended purpose, or at least will guide users towards a particular kind of use. If the only practical difference is what exact class of exception is thrown, then of course it's only a very weak kind of guidance, easily worked around. If it's the difference between a catchable exception and an uncatchable shutdown, the choice is more important. – Steve Jessop Feb 27 '11 at 01:47
  • Btw, if you're doing API design and you aren't considering how and why users will call your functions, and how they *should* call your functions, then you're probably missing opportunities to improve your APIs in all sorts of ways, not just this one. Probably you *are* considering such things in general, just not in this case, so I don't think the fact that you're writing a library is an insurmountable obstacle here. – Steve Jessop Feb 27 '11 at 01:49
  • Well, the intended purpose is just the same as that of `map` in functional languages: to refactor common code, so you can focus less on the mechanics and more on the meanings. Other than that, the semantics rely on the caller, so that's why I'm at a loss. But I think your explanation's good nonetheless; thanks. :) – user541686 Feb 27 '11 at 01:53
  • @Mehrdad: but refactor *what* common code? If you intend it to replace common code that checks the lengths of the lists and then zips with addition, then enforce. If you intend it only to replace code that doesn't check because it knows, assert (or perhaps even don't fail - silently discard the excess length on one side, or pad the short size with zeros). If you literally don't have any opinion, do neither and take whatever behavior the language generates naturally from your code (at least until you come to document said behavior and realise that it's complex and has unintuitive edge-cases...) – Steve Jessop Feb 27 '11 at 02:00
  • @Steve: I meant refactoring the code for iterating through the ranges and calling another function for every element... I like your last idea about deciding at the time of documentation. :) – user541686 Feb 27 '11 at 02:05
  • @Mehrdad: certainly as a rough rule of thumb, if something's hard to document there's a real danger that it's also hard to understand the documentation, and/or hard to use correctly. It's not *just* an excuse for slacking off on the boring parts :-) – Steve Jessop Feb 27 '11 at 02:11
  • @Steve: Haha yeah okay, thanks. :) So far, I've tended to write code that, instead of throwing exceptions, just makes sure that it somehow crashes on a poor input (e.g. I make sure that a null parameter segfaults instead of bothering to explicitly throw an exception for a null argument), so so far I've managed to slack off on the boring parts. But thanks for the warning. :] – user541686 Feb 27 '11 at 02:27
  • @Mehrdad: I'd be happy with "null parameter segfaults" as long as you've documented that a null parameter is not a valid input to your function, either explicitly or implicitly because you've said, "the parameter is a pointer to an object with such-and-such properties". Some people would argue for more defensive API design, though, and that since null is an easy pointer to pass to a function, you *should* have defined behavior for it, and preferably sensible behavior. The C standard library disagrees, but then that's a pretty low-level API and definitely old-school. – Steve Jessop Feb 27 '11 at 02:51
  • @Steve: Yeah, I definitely try to give a meaning to null if it's at all possible -- I usually let a nullable parameter mean that it's optional, unless that makes no sense, in which case I just make my code segfault. Either way, I don't let a null parameter result in undefined behavior, so it either gives a default result or crashes. :) – user541686 Feb 27 '11 at 03:38
  • @Mehrdad: ah right, dereferencing a null pointer is *guaranteed* to segfault in D, is it? In C and C++ it's undefined behavior. – Steve Jessop Feb 27 '11 at 03:40
  • @Steve: If you mean a pointer like `int*`, I don't know, and I assume it's undefined. But that's not what I meant by a pointer -- I really meant reference types like `Object`, which are nullable by default. I *think* these are guaranteed to segfault, but honestly, I don't really care -- it segfaults on any "typical" system, and given that the segfault behavior is only meant for debugging (the argument shouldn't be null in correct code anyway), I use it without caring whether dereferencing it is technically defined or not, since on the systems I debug my program on, the result is predictable. – user541686 Feb 27 '11 at 03:49
  • @Steve: or, in other words, I only care about portability if it has practical consequences. In the case of a segfault on a null pointer, I've found that it's safe to assume that it always happens in user-mode code, and that it's unnecessary to worry about it when you're only using it as a debugging aid. – user541686 Feb 27 '11 at 03:53
  • @Mehrdad: ah, OK, there are some really subtle things that can go wrong with dereferencing null pointers in C. For example there's a GCC optimization where if the compiler detects that you dereference a pointer, then it assumes it isn't null. It could for example replace the code `if (p != 0) return true; *p; return false;` with `return true;`. It then *won't* crash even if `p` is null. This led to a linux bug where there was some kernel code playing with memory maps that relied on getting (and handling) a segfault, but gcc removed the code that would have caused that segfault... – Steve Jessop Feb 27 '11 at 03:57
  • @Steve: But nobody optimizes debug mode code, right? ;) (And by the way, the story about kernel-mode code is different -- I've only mainly programmed in user mode, and what I said doesn't really apply to what I'd do in kernel mode. So the worst-case scenario is that the app crashes, and again, as a debugging aid, that's enough. Btw, this follows the same philosophy as the "best-effort" basis mentioned in Java's [ConcurrentModificationException](http://download.oracle.com/javase/1.5.0/docs/api/java/util/ConcurrentModificationException.html).) – user541686 Feb 27 '11 at 03:59
  • @Mehrdad: GCC also emits debug info even with optimization enabled, so actually it's both possible and a good idea to debug with the same optimizations you're planning to release with. Then give up and reduce optimization when so-called "single-step" looks like an ongoing transporter malfunction. The issue in that kernel code was that from their POV, dereferencing a null pointer has *defined* behavior, that they were expecting to save their ass. It's just that the code they wrote didn't actually perform the dereference once GCC was done with it. The same could happen in user-mode. – Steve Jessop Feb 27 '11 at 04:03
  • But I'll certainly grant you, it was a rare and intriguing case, not really something very likely to ever ruin my day or yours. Ultimately you'd hope to find the error (use of a null pointer) eventually anyway because some test somewhere would fail. An immediate segfault in most circumstances is nice-to-have, rather than a key part of the security of your code... – Steve Jessop Feb 27 '11 at 04:07
  • @Steve: Just curious, why would anyone ever say `*p;`? (One reason I hate C/C++ is the fact that statements like this are even allowed in the first place... they encourage cryptic/dangerous code, for no good reason. In D, you'd have to say `cast(void)*p;` if you really want a statement like that.) – user541686 Feb 27 '11 at 04:09
  • @Mehrdad: just an example off the top of my head. The real code in the case of that kernel bug was longer and more complicated. IIRC it actually used the referand of the pointer, and then *later* tested it for null. If it had been null in the earlier use, then the kernel would have handled the segfault and resumed with the code, I forget exactly what was going on. So the later test was optimized away by assuming never null, since as far as GCC was concerned, every code path which reached the test first passed through the use of the pointer. But actually the later test needed to be performed. – Steve Jessop Feb 27 '11 at 04:13
  • Oh, or suddenly I think I remember maybe in the earlier use a null pointer *wouldn't* have segfaulted, because at that point the kernel had mapped in something accessible from address 0 up, so without optimization the code would just use that memory. But the check that the pointer value was non-null was needed later in the routine for security purposes. Sorry, can't remember the details. The principle though was just that the programmer thought they knew what the UB would be, but actually GCC did something crazier than they expected. Kernel mode made the effects of the bug worse. – Steve Jessop Feb 27 '11 at 04:17
  • @Steve: Hm, okay... I really hope the code that caused this wasn't unintuitive like `*p;`. Either way, though, for me personally this strategy has always been enough, and I've never really run across weird bugs like that. But if I go into kernel mode development then I'll try to change my practice. :) – user541686 Feb 27 '11 at 04:18
  • @Steve: o___o the kernel had mapped something accessible from address 0 up? When I wrote my own boot loader (and [nano]kernel) for practice in D, the *first* thing I did after turning on paging was making address zero inaccessible... and, in fact, the next thing I did was to also make addresses beyond the memory limit also inaccessible... – user541686 Feb 27 '11 at 04:20
1

assert should be considered a "run-time checked comment" indicating an assumption that the programmer makes at that moment. The assert is part of the function implementation. A failed assert should always be considered a bug at the point where the wrong assumption is made, i.e. at the code location of the assert. To fix the bug, use a proper means to avoid the situation.

The proper means to avoid bad function inputs are contracts, so the example function should have an input contract that checks that range2 is at least as long as range1. The assertion inside the implementation could then still remain in place. Especially in longer, more complex implementations, such an assert may improve understandability.
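
For example, a sketch using D's in-contracts (this assumes the ranges have a length property):

auto add(R1, R2)(R1 range1, R2 range2)
in
{
    // Contract: the caller is responsible for passing compatible ranges.
    assert(range1.length <= range2.length);
}
do
{
    typeof(range1.front + range2.front)[] result;
    while (!range1.empty)
    {
        assert(!range2.empty);   // restates the assumption inside the body
        result ~= range1.front + range2.front;
        range1.popFront();
        range2.popFront();
    }
    return result;
}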

An enforce is a lazy approach to throwing runtime exceptions. It is nice for quick-and-dirty code, because it is better to have a check in there than to silently ignore the possibility of a bad condition. For production code, it should be replaced by a proper mechanism that throws a more meaningful exception.
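
For instance (a sketch; NetworkException is a made-up class, and enforce here is std.exception's, which accepts the exception type to throw as a template argument):

import std.exception : enforce;

class NetworkException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

bool isNetworkConnected() { return false; }   // stand-in for a real check

void upload()
{
    // Quick and dirty: throws a generic Exception with little context.
    enforce(isNetworkConnected());

    // Production: throw something callers can catch specifically.
    enforce!NetworkException(isNetworkConnected(), "connection lost");
}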

0

I believe you have partly answered your question yourself. Assertions are bound to break the flow: if an assertion fails, you are not willing to continue with anything. If you enforce something, you are making a decision to allow something to happen based on the situation: when you find that the conditions are not met, you can enforce that entry to a particular section is denied.

Vinod R
  • You break the flow either way (an exception is thrown either way). The question is, how do you decide whether to assert() or enforce()? Sometimes satisfying a precondition can be beyond the caller's control, but then again, it's still a precondition, so I don't know what to do. – user541686 Feb 25 '11 at 17:42