
I've been reading the SQLite source code and wondered about this function:

static int strlen30(const char *z){
  const char *z2 = z;
  while( *z2 ){ z2++; }
  return 0x3fffffff & (int)(z2 - z);
}

Why does SQLite use strlen30() instead of strlen() (from string.h)?

phuclv
hority
  • Unfortunately, SQLite sources say only the obvious thing - "Compute a string length that is limited to what can be stored in lower 30 bits of a 32-bit signed integer.". – Rafał Rawicki Jul 27 '11 at 09:40
  • Maybe other parts of sqlite can't cope with strings larger than 1073741823 bytes -- and assuming they're smaller is the solution (I don't buy this). – pmg Jul 27 '11 at 09:42

3 Answers


The commit message that went in with this change states:

[793aaebd8024896c] part of check-in [c872d55493] Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007) (user: drh branch: trunk)

Jeff Foster
  • The commit message is horrible. How could cropping with `&` help with overflowing an integer I wonder? – sharptooth Jul 27 '11 at 09:53
  • @jeff Thank you for your answer! I'd like to read source code once again with your answer in mind. If there are some mistakes in English, I'd like to apologize. – hority Jul 27 '11 at 09:54
  • @hority absolutely zero problems with your English so no need to apologise! – Jeff Foster Jul 27 '11 at 10:00
  • @sharptooth: Because only unsigned integers have well defined overflow behaviour. It isn't well-defined for signed integers. Also, it is not defined whether the difference of two char-pointers has the same byte-size as an ordinary integer. The commit message is clear and concise, but indeed misses a pointer to a detailed feature request or similar. – Sebastian Mach Jul 27 '11 at 10:18
  • @phresnel: Could you please explain this in form of a detailed answer? It's really interesting and something I don't get. – sharptooth Jul 27 '11 at 10:35
  • @sharptooth: I've tried to give an answer in [your new question](http://stackoverflow.com/questions/6842880/why-reimplement-strlen-as-loopsubtraction/6843474#6843474) :) – Sebastian Mach Jul 27 '11 at 11:23

(This is my answer to "Why reimplement strlen as loop+subtraction?", but that question was closed.)


I can't tell you why they had to re-implement it, or why they chose int instead of size_t as the return type. But about the function:

/*
 ** Compute a string length that is limited to what can be stored in
 ** lower 30 bits of a 32-bit signed integer.
 */
static int strlen30(const char *z){
    const char *z2 = z;
    while( *z2 ){ z2++; }
    return 0x3fffffff & (int)(z2 - z);
}



Standard References

The standard says in (ISO/IEC 14882:2003(E)) 3.9.1 Fundamental Types, 4.:

Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer. 41)

...

41): This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type

That part of the standard does not define overflow-behaviour for signed integers. If we look at 5. Expressions, 5.:

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression is a constant expression (5.19), in which case the program is ill-formed. [Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. ]

So much for overflow.
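To make the difference between the two quotations concrete, here is a minimal sketch in C. The helper names (wrap_add, add_would_overflow, safe_add) are hypothetical and are not taken from SQLite; they only illustrate that unsigned addition wraps in a defined way, while a signed addition has to be range-checked before it is performed.

```c
#include <assert.h>
#include <limits.h>

/* Unsigned arithmetic is reduced modulo 2^n, so this addition is always
   well defined, even when the mathematical sum exceeds UINT_MAX. */
static unsigned wrap_add(unsigned a, unsigned b) {
    return a + b;
}

/* Signed overflow is undefined, so the check must happen BEFORE adding.
   Returns 1 if a + b would not be representable as int, 0 otherwise. */
static int add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return 0;
}

/* Adds two ints, asserting that the sum is representable. */
static int safe_add(int a, int b) {
    assert(!add_would_overflow(a, b));
    return a + b;
}
```

For example, wrap_add(UINT_MAX, 1u) yields 0, whereas add_would_overflow(INT_MAX, 1) reports 1, so the signed addition is never attempted.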

As for subtracting two pointers to array elements, 5.7 Additive operators, 6.:

When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as ptrdiff_t in the cstddef header (18.1). [...]

Looking at 18.1:

The contents are the same as the Standard C library header stddef.h

So let's look at the C standard (I only have a copy of C99, though), 7.17 Common Definitions :

  1. The types used for size_t and ptrdiff_t should not have an integer conversion rank greater than that of signed long int unless the implementation supports objects large enough to make this necessary.

No further guarantee made about ptrdiff_t. Then, Annex E (still in ISO/IEC 9899:TC2) gives the minimum magnitude for signed long int, but not a maximum:

#define LONG_MAX +2147483647

Now what is the maximum for int, the return type of SQLite's strlen30()? Let's skip the C++ quotation that forwards us to the C standard once again, and look at C99, Annex E, which gives the minimum value that INT_MAX may have:

#define INT_MAX +32767
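The limits quoted above can be checked on a given platform with a short sketch like the following. The helper names (check_minimum_limits, fits_in_int) are hypothetical; the macros come from the standard headers limits.h and stdint.h.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Checks the minimum magnitudes quoted from Annex E. These hold on every
   conforming implementation; actual values are usually larger. */
static void check_minimum_limits(void) {
    assert(INT_MAX >= 32767);
    assert(LONG_MAX >= 2147483647L);
}

/* Returns 1 if the pointer difference d can be converted to int
   without changing its value, 0 otherwise. */
static int fits_in_int(ptrdiff_t d) {
    return d >= INT_MIN && d <= INT_MAX;
}
```

On common 64-bit platforms, where ptrdiff_t is wider than int, fits_in_int(PTRDIFF_MAX) returns 0, which is exactly the situation in which narrowing a pointer difference to int loses information.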



Summary

  1. Usually, ptrdiff_t is not bigger than signed long, which is not smaller than 32 bits.
  2. int is only guaranteed to be at least 16 bits wide.
  3. Therefore, subtracting two pointers may give a result that does not fit into the int of your platform.
  4. We remember from above that for signed types, a result that does not fit yields undefined behaviour.
  5. strlen30 applies a bitwise AND to the pointer-subtraction result:

          | 32 bit                         |
ptr_diff  |10111101111110011110111110011111| // could be even larger
&         |00111111111111111111111111111111| // == 0x3FFFFFFF
          ----------------------------------
=         |00111101111110011110111110011111| // truncated

That prevents undefined behaviour by truncating the pointer-subtraction result to a maximum value of 0x3FFFFFFF (hexadecimal) = 1073741823 (decimal).
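The masking step from the diagram can be tried out without a gigabyte-sized string. The following is a minimal sketch: mask30 is a hypothetical helper, not part of SQLite, and it takes the difference as an explicit 64-bit value so that large inputs can be simulated. Note that, unlike strlen30(), this sketch masks before narrowing to int, and it assumes an int of at least 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Keeps only the lower 30 bits of a (simulated) pointer difference.
   The mask is applied in the wide type first; the result then lies in
   [0, 0x3fffffff], so the conversion to int cannot change its value
   (assuming int is at least 32 bits wide). */
static int mask30(int64_t diff) {
    assert(diff >= 0);  /* a string length is never negative */
    return (int)(diff & 0x3fffffff);
}
```

With the bit pattern from the diagram, mask30(0xBDF9EF9F) gives 0x3DF9EF9F, and any value up to 0x3FFFFFFF passes through unchanged. The price of the truncation is that a longer string reports a wrong (smaller) length, for instance mask30(0x40000000) is 0.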

I am not sure why they chose exactly that value, because on most machines only the most significant bit indicates the sign. With respect to the standard, it could have made sense to choose the minimum INT_MAX, but 1073741823 is indeed slightly strange without knowing more details (though it of course does exactly what the comment above the function says: truncate to 30 bits and prevent overflow).

Sebastian Mach
  • Thank you for such a detailed answer! I'm glad you answered my trivial question. I thought that "overflow" may be an eternal problem for programming languages... – hority Jul 27 '11 at 16:06
  • I would guess they chose that upper bound to allow certain type of integer math to be performed with the results without incurring the Wrath of UB, but I would regard their logic as somewhat faulty, in that it would be legitimate for a 64-bit machine where individual objects were limited to less than 4 gigs to define `size_t` as `uint32_t` and `ptrdiff_t` as `int32_t`, and do anything it likes when subtracting the pointer near the end of a 3-gig object from one to the beginning. – supercat Jul 06 '15 at 23:21

The CVS commit message says:

Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007)

I couldn't find any further reference to this commit or explanation how they got an overflow in that place. I believe that it was an error reported by some static code analysis tool.

Rafał Rawicki
  • The reason is simple - `size_t` doesn't fit into `int` on 32-bit systems. So they crop the most significant bits. I can't see how this helps with overflow - cropping is cropping whether you call it that or not. – sharptooth Jul 27 '11 at 09:59
  • 2 billion symbols is enough actually, no need for terabytes. – sharptooth Jul 27 '11 at 10:05
  • @Rafał Thank you for your answer! I learned that I have to read the CVS commit message before I post the question... thx!(^o^)/ – hority Jul 27 '11 at 10:08