4

I came up with this little code but all the professionals said its dangerous and I should not write code like this. Can anyone highlight its vulnerabilities in 'more' details?

int strlen(char *s){ 
    return (*s) ? 1 + strlen(s + 1) : 0; 
}
Bálint Juhász
  • 304
  • 1
  • 16
  • 2
    Why don't you ask all those professionals what they meant? It's probably less than perfectly efficient, but I wouldn't call it dangerous. Especially with a reasonably modern compiler that would likely optimize away tail recursion. – Igor Tandetnik Oct 07 '13 at 21:38
  • Some languages actually don't have loops (ML) – shoosh Oct 07 '13 at 21:38
  • 2
    @IgorTandetnik, it's not tail recursion. – zch Oct 07 '13 at 21:39
  • @IgorTandetnik but it's not tail recursive – john Oct 07 '13 at 21:39
  • Aside from what's already been mentioned, you can't pass a `const char*` to this function. – Collin Dauphinee Oct 07 '13 at 21:39
  • @IgorTandetnik, it's not, it does `1 +` after getting value from recursion. – zch Oct 07 '13 at 21:42
  • @IgorTandetnik: They're technically right, you do an add operation after the recursive call. But practically, the compiler should figure that out. – bitmask Oct 07 '13 at 21:42
  • 2
    @IgorTandetnik It is not tail recursive as written because works remains to be done after the call (adding one). It can be transformed into a tail-recursive function manually or automatically, but I wouldn't expect a C compiler to do it automatically. – Pascal Cuoq Oct 07 '13 at 21:42
  • @IgorTandetnik Sorry but you are wrong, from wikipedia 'In computer science, a tail call is a subroutine call that happens inside another procedure as its final action'. This is not a tail call because of the + 1 after recursively calling strlen. It's just normal recursion, not tail recursion. – john Oct 07 '13 at 21:44
  • On the other hand Kuba Ober's `strlen_impl` function in his answer is tail recursive. – john Oct 07 '13 at 21:48
  • 3
    There is also one little danger nobody mentioned. You should use `size_t`, not `int`. Otherwise you could fall into trouble with big strings on platform with small `int`. – zch Oct 07 '13 at 21:53
  • `all the professionals` Really? – Lightness Races in Orbit Oct 18 '13 at 19:04
  • I like your implementation :-) But don't use it ;-) – Julien Palard Mar 28 '15 at 22:36

4 Answers4

6

It has no vulnerabilities per se, this is perfectly correct code. It is prematurely pessimized, of course. It will run out of stack space for anything but the shortest strings, and its performance will suck due to recursive calls, but otherwise it's OK.

The tail call optimization most likely won't cope with such code. If you want to live dangerously and depend on tail-call optimizations, you should rephrase it to use the tail-call:

// note: size_t is an unsigned integertype

int strlen_impl(const char *s, size_t len) {
    if (*s == 0) return len;
    if (len + 1 < len) return len; // protect from overflows
    return strlen_impl(s+1, len+1);
}        

int strlen(const char *s) {
   return strlen_impl(s, 0);
}
Kuba hasn't forgotten Monica
  • 95,931
  • 16
  • 151
  • 313
  • 5
    Both clang and gcc turn the code in the question into a loop when compiled with -O2. (GCC 4.8.1, Clang 3.3) – John Bartholomew Oct 07 '13 at 21:41
  • 2
    @JohnBartholomew: I'd be leery of depending even on the tail-call optimization, much less the conversion to a loop... Nothing in the standard guarantees any of it. – Kuba hasn't forgotten Monica Oct 07 '13 at 21:44
  • @KubaOber: You can be as leery as you like, but personally I'd rather trust the actual assembly output of the compiler than your gut feeling. – John Bartholomew Oct 07 '13 at 21:45
  • 1
    VS2012 turns it into a loop and inlines the function with optimizations on – shoosh Oct 07 '13 at 21:46
  • 1
    @JohnBartholomew: Well, if your goal in life is to code to a particular compiler at a particular optimization level, go right ahead :) – Kuba hasn't forgotten Monica Oct 07 '13 at 21:48
  • 2
    Basically, the issue is not in what some compilers will do, the issue is whether coding it like this is a good idea at all. Yeah, I'm sure there's some template metaprogramming that may produce such code and the compilers should deal with it, but this answer is at a newbie level, let's not add to the confusion. – Kuba hasn't forgotten Monica Oct 07 '13 at 21:49
  • 1
    @JohnBartholomew: Never mind that with link time code generation, you have to check the assembly output at each place this happens to be called :) – Kuba hasn't forgotten Monica Oct 07 '13 at 21:51
  • @KubaOber: I agree that it is not a good way of writing the function, and that you shouldn't code assuming a particular set of optimisations. However, in my opinion it is just as bad to code based on the assumption that a compiler *won't* optimise a piece of code in a release build. Several people have responded to this question claiming that compilers can't or won't optimise the code in the question: that is false and I think it's worth correcting that error. – John Bartholomew Oct 07 '13 at 21:51
  • @JohnBartholomew: Agreed. – Kuba hasn't forgotten Monica Oct 07 '13 at 21:52
  • 1
    Both the wrong type (`int` in place of `size_t`; `int` can overflow and invokes UB when it does) and the assumption that the compiler will convert the code into a loop make it very dangerous. – R.. GitHub STOP HELPING ICE Oct 07 '13 at 22:17
  • @R..: The assumption is not about any conversion into a loop, but about tail call optimization. – Kuba hasn't forgotten Monica Oct 07 '13 at 23:13
  • @KubaOber: As plenty of others have said, this is NOT a tail call. – R.. GitHub STOP HELPING ICE Oct 08 '13 at 03:49
  • @R.. As plenty of others have said, the compiler is still entitled to optimize this, and all three compilers tested (MSVC, gcc and clang) _do_ so. So-what if it's TCO. It's reality. – sehe Oct 18 '13 at 19:19
  • 1
    @sehe: Relying on an optional compiler optimization to avoid a stack overflow is dangerous, regardless of whether or not several mainstream compilers currently appear to do the optimization in one particular instance. Recursion is a great tool when the input size is bounded or when the algorithm itself reduces the stack requirement to something sub-linear (e.g., a binary search). – Adrian McCarthy Oct 18 '13 at 21:12
  • @AdrianMcCarthy I wasn't discussing the merits of the algorithm. I was trying to stop the useless debate fixating on TCO. Of course using recursion in a language that doesn't guarantee this is not the due course. – sehe Oct 18 '13 at 21:26
5

Dangerous it a bit of a stretch, but it is needlessly recursive and likely to be less efficient than the iterative alternative.

I suppose also that given a very long string there is a danger of a stack overflow.

john
  • 85,011
  • 4
  • 57
  • 81
  • I'd be very confident that a decent compiler would tail-recurse the heck out of this. But this would be subject to experiment. – bitmask Oct 07 '13 at 21:38
  • @bitmask I don't know what a compiler might do with it, but the function is not tail recursive. – john Oct 07 '13 at 21:41
5

There are two serious security bugs in this code:

  1. Use of int instead of size_t for the return type. As written, strings longer than INT_MAX will cause this function to invoke undefined behavior via integer overflow. In practice, this could lead to computing strlen(huge_string) as some small value like 1, malloc'ing the wrong amount of memory, and then performing strcpy into it, causing a buffer overflow.

  2. Unbounded recursion which can overflow the stack, i.e. Stack Overflow. :-) A compiler may choose to optimize the recursion into a loop (in this case, it's possible with current compiler technology), but there is no guarantee that it will. In a best case, stack overflow will simply crash the program. In a worst case (e.g. running on a thread with no guard page) it could clobber unrelated memory, possibly yielding arbitrary code execution.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
3

The problem with killing the stack that have been pointed out, ought to be fixed by a decent compiler, where the apparent recursive call is flattened into a loop. I verified this hypothesis and asked clang to translate your code:

//sl.c
unsigned sl(char const* s) {
  return (*s) ? (1+sl(s+1)) : 0;
}

Compiling and disassembling:

clang -emit-llvm -O1 -c sl.c -o sl.o
#                 ^^ Yes, O1 is already sufficient.
llvm-dis-3.2 sl.o

And this is the relevant part of the llvm result (sl.o.ll)

define i32 @sl(i8* nocapture %s) nounwind uwtable readonly {
  %1 = load i8* %s, align 1, !tbaa !0
  %2 = icmp eq i8 %1, 0
  br i1 %2, label %tailrecurse._crit_edge, label %tailrecurse

tailrecurse:                                      ; preds = %tailrecurse, %0
  %s.tr3 = phi i8* [ %3, %tailrecurse ], [ %s, %0 ]
  %accumulator.tr2 = phi i32 [ %4, %tailrecurse ], [ 0, %0 ]
  %3 = getelementptr inbounds i8* %s.tr3, i64 1
  %4 = add i32 %accumulator.tr2, 1
  %5 = load i8* %3, align 1, !tbaa !0
  %6 = icmp eq i8 %5, 0
  br i1 %6, label %tailrecurse._crit_edge, label %tailrecurse

tailrecurse._crit_edge:                           ; preds = %tailrecurse, %0
  %accumulator.tr.lcssa = phi i32 [ 0, %0 ], [ %4, %tailrecurse ]
  ret i32 %accumulator.tr.lcssa
}

I don't see a recursive call. Indeed clang called the looping label tailrecurse which gives us a pointer as to what clang is doing here.

So, finally (tl;dr) yes, this code is perfectly safe and a decent compiler with a decent flag will iron the recursion out.

bitmask
  • 32,434
  • 14
  • 99
  • 159
  • It works *if* you compile optimized, and *if* you use the compilers that recognize this as close to tail recursion. What about those organizations that prohibit the use of an optimizing compiler? (And yes, they do exist. A multimillion dollar error traced to a bad optimization, even if it happened a decade ago or more in the past, can result in very, very long institutional memories.) – David Hammen Oct 07 '13 at 22:00
  • @DavidHammen: This may sound strange, but I really don't care about organisations. Particularly not those with funny ideas like shipping binaries without the sources. But for your peace of mind, I rerun the test with `-O1` which is a much more careful optimisation and clang still removes the recursive call. – bitmask Oct 07 '13 at 22:07
  • @bitmask: If you were shipping binaries, you wouldn't be worried about it. It's when your users compile from source that you must ensure your source is correct. And yours is not. – R.. GitHub STOP HELPING ICE Oct 07 '13 at 22:18
  • @R..: Fair enough, but the only correctness issue I see is using `unsigned` instead of `size_t` which I did purposefully in order to avoid including additional headers, potentially bloating the resulting file. Btw., I'm crappy at reading actual assembler so I only checked clang (it generates llvm), not gcc or others. If gcc doesn't also figure out the tail recursion it'd be one more reason against it but also support your point. – bitmask Oct 07 '13 at 22:33