3

When compiling a 64-bit application, why does strlen() return a 64-bit integer? Am I missing something?

I understand that strlen() returns a size_t type, and by definition this shouldn’t change, but... Why would strlen need to return a 64-bit integer?

The function is designed to be used with strings. With that said:

Do programmers commonly create multi-gigabyte or multi-terabyte strings? If they did, wouldn’t they need a better way to determine the string length than searching for a null character?

I think this is ridiculous. In fact, maybe we need a StrLenAsync() function with a callback just to handle the ultra-long process of searching for the null in the 40TB string. Sound stupid? Yeah, well, strlen() returns a 64-bit integer!

Of course, the proposed StrLenAsync() function is a joke.

Steven Sudit
NTDLS
  • 12
    What makes you think size_t doesn't change depending on platform? – Yacoby Jul 14 '09 at 16:13
  • 6
    Zero-terminated strings are stupid anyway, so why care? ;) – OregonGhost Jul 14 '09 at 16:14
  • 6
    @NTDLS: on a 64-bit platform there is no real overhead in returning a 64-bit integer since it fits in a single register. (Assuming a register is used for the return value which is the case on most platforms I've seen). – Evan Teran Jul 14 '09 at 16:17
  • So strlen() returns size_t. Do you have a problem with size_t being 64-bit on a 64-bit platform or should strlen() return some special funkystringsize_t? – Gleb Jul 14 '09 at 16:21
  • 2
    If this bothers you so much, use std::string. No more searching for NULLs in your 40TB string. – Mark Ransom Jul 14 '09 at 16:58
  • 3
    It's not a problem, since I only allocate my 40TB strings on machines that can execute an infinite loop in under 3 seconds. – Steven Sudit Jul 14 '09 at 23:30
  • Most importantly, strlen returns an _unsigned_ 64 bit integer ;) size_t is going to be the largest unsigned integer type according to the architecture. – Tim Post Mar 08 '10 at 03:32

7 Answers

18

It looks like, when compiling for a 64-bit target, size_t is defined as 64-bit. This makes sense, since size_t is used for sizes of all kinds of objects, not just strings.
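As a rough check, the following C sketch reports the width of size_t, of a data pointer, and of the value strlen() returns, for whatever target it is compiled for. The helper names (size_t_bits, pointer_bits, strlen_result_bits) are made up for illustration. On mainstream 32-bit and 64-bit targets the three values match, although the C standard does not require size_t and pointers to have the same width.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Width of size_t in bits on the target this file is compiled for. */
static unsigned size_t_bits(void) {
    return (unsigned)(sizeof(size_t) * CHAR_BIT);
}

/* Width of a data pointer in bits on the same target. */
static unsigned pointer_bits(void) {
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Width of the value strlen() returns. The operand of sizeof is not
   evaluated, so no string is actually scanned here. */
static unsigned strlen_result_bits(void) {
    return (unsigned)(sizeof(strlen("")) * CHAR_BIT);
}
```

Built for a typical 64-bit target, all three helpers are expected to return 64; built for a typical 32-bit target, 32.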

Steven Sudit
  • 1
    Totally understood, but isn't that a lot of overhead for a function which will likely never see a return value over the max 32-bit unsigned integer? – NTDLS Jul 14 '09 at 16:13
  • 2
    For the difference between two pointers to be exact. And strlen is just that. – Marco van de Voort Jul 14 '09 at 16:13
  • 17
    That's a bit like saying that a 32-bit size_t has 16 bits of overhead because most strings are well under 64k. :-) – Steven Sudit Jul 14 '09 at 16:16
  • That's like saying 'bool' has at least 1 bit(s) of overhead, because strings are often empty. -- sorry, couldn't resist – Aaron Jul 14 '09 at 16:50
  • Well, more like saying bool has 7 bits of overhead because the expense of bitpacking with ands and ors exceeds the space savings. – Steven Sudit Jul 14 '09 at 16:58
  • 9
    size_t is 64 bits on 64-bit machines. On such machines, there is no overhead, because a register is 64 bits, and usually the return value is in a register. It would actually be a waste to define size_t as 32 bits. – Jared Oberhaus Jul 14 '09 at 17:00
  • @NTDLS: Writing code that only behaves well in situations someone thinks are "likely" is a frequent cause of bugs - and worse yet, security flaws. Even after it's been exploited countless times, a lot of people still think it's "unlikely" that someone would type raw SQL into a browser URL and swipe a database full of credit card numbers. – Bob Murphy Oct 15 '09 at 04:08
  • @Jared: Well, there's no overhead in the register, but if it winds up in a variable then it will use twice as much RAM. Is this an issue? On the one hand, it means eating through the cache sooner. On the other, 64-bit processors can handle tons of RAM. I'd call it a wash. – Steven Sudit Oct 15 '09 at 13:50
  • 4
    For the difference between two pointers (which can be negative), `ptrdiff_t` is used. `size_t` is used for non-negative things. – Johannes Schaub - litb Feb 18 '10 at 20:49
8

On a 64-bit app, it's definitely possible to create a 5GB string.

The spec is not intended to keep you from doing stupid things.

Even if it wasn't needed, it wouldn't be worth changing the specification of strlen away from using a size_t just to make the return value 4 instead of 8 bytes.
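To illustrate what a 4-byte return type would cost, here is a small C sketch that narrows a 64-bit length to 32 bits. It works on plain numbers, so no 5GB allocation is needed. Both narrow_length and fits_in_32_bits are hypothetical helpers, not part of any library.

```c
#include <stdint.h>

/* Hypothetical helper: the value a 32-bit return type would report for a
   string whose true length is len. Unsigned narrowing wraps modulo 2^32. */
static uint32_t narrow_length(uint64_t len) {
    return (uint32_t)len;
}

/* Returns 1 when the length survives narrowing unchanged, 0 otherwise. */
static int fits_in_32_bits(uint64_t len) {
    return len <= UINT32_MAX;
}
```

For a string of 5 * 2^30 bytes, narrow_length() reports 2^30, i.e. the length silently shrinks from 5 GiB to 1 GiB.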

Tim Sylvester
  • 2
    It's also possible in a 32 bit app to create a 5GB string. It just can't be mapped into 32 bit address space at once, so strlen would have to be kind of clever about this, which it isn't. See the following interesting article for details: http://blogs.msdn.com/ericlippert/archive/2009/06/08/out-of-memory-does-not-refer-to-physical-memory.aspx – OregonGhost Jul 14 '09 at 16:17
  • 4
    The strlen function operates on a pointer, assuming that the string follows in contiguous memory. A 32-bit pointer cannot represent a string larger than 4G (minus whatever space the O/S reserves) *in memory*. While there are certainly several ways to represent strings larger than the address space, they are irrelevant to strlen because of the assumptions built into its specification. – Tim Sylvester Jul 14 '09 at 16:59
7

Here's a chart which shows the sizes, in bits, of some basic types in the most common data models:

Type       LP32  ILP32  LP64  LLP64  ILP64
char          8      8     8      8      8
short        16     16    16     16     16
int          16     32    32     32     64
long         32     32    64     32     64
long long    64     64    64     64     64
pointer      32     32    64     64     64
size_t       32     32    64     64     64

The 32-bit Windows data model is ILP32 and the 64-bit Windows data model is LLP64. (The Windows 3.1 and Macintosh data models were both LP32.)
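If you want to see which row of the chart your own compiler falls into, a minimal C sketch along these lines may help. The function name data_model is an assumption for illustration; it only compares the widths of int, long, and void * against the columns above and returns "unknown" for anything else.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: names the data model of the current target by
   comparing the widths of int, long, and a data pointer with the chart. */
static const char *data_model(void) {
    const unsigned i = (unsigned)(sizeof(int) * CHAR_BIT);
    const unsigned l = (unsigned)(sizeof(long) * CHAR_BIT);
    const unsigned p = (unsigned)(sizeof(void *) * CHAR_BIT);

    if (i == 16 && l == 32 && p == 32) return "LP32";
    if (i == 32 && l == 32 && p == 32) return "ILP32";
    if (i == 32 && l == 64 && p == 64) return "LP64";
    if (i == 32 && l == 32 && p == 64) return "LLP64";
    if (i == 64 && l == 64 && p == 64) return "ILP64";
    return "unknown";
}
```

Going by the chart, a 64-bit Windows build should report LLP64 and a 32-bit Windows build ILP32.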

Adrian Mole
Nick
  • I hoped it might be helpful. I'm involved in porting a very large C++ codebase to 64-bit at the moment, so I'm living and breathing this stuff right now. – Nick Jul 14 '09 at 17:07
  • Yea, very nice chart. I saved a copy. – NTDLS Jul 15 '09 at 02:21
3

I can think of several applications where a string of 4GB is simply not enough (computational biology, computer forensics are two HUGE ones). Don't assume that because YOU don't do it that nobody else does, either.

San Jacinto
  • Ohh no, I totally understand that. I'm just saying that you wouldn't want to pass that 4GB+ array of characters to a strlen() function. You just *might* be better off keeping track of its length while you're building it. – NTDLS Jul 15 '09 at 02:23
  • 2
    We don't use 4GB strings in computer forensics. That would be silly. – vy32 Feb 18 '10 at 07:38
  • You've never indexed an entire hard drive for later examination? How about when a cell phone is taken from a scene? It's easier to index the contents of the SD card than to keep reading from the card over and over. If you are referring to using strlen() to find the length of a 4GB string, then yes that is silly. Otherwise, I don't think I'm the one being silly here... – San Jacinto Feb 18 '10 at 13:19
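The suggestion in the comments above, keeping track of the length while the string is being built, could look roughly like the C sketch below. The type counted_buf and the function counted_append are hypothetical names invented for this example; the growth policy (doubling from 16 bytes) is likewise just an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that remembers its own length. */
typedef struct {
    char  *data;  /* NUL-terminated whenever it is non-NULL */
    size_t len;   /* number of bytes before the terminator */
    size_t cap;   /* number of bytes allocated */
} counted_buf;

/* Appends n bytes from src. Returns 0 on success, -1 on overflow or
   allocation failure (the buffer is left unchanged in that case). */
static int counted_append(counted_buf *b, const char *src, size_t n) {
    if (n > SIZE_MAX - b->len - 1) return -1;   /* len + n + 1 would wrap */
    size_t need = b->len + n + 1;
    if (need > b->cap) {
        size_t new_cap = b->cap ? b->cap : 16;
        while (new_cap < need) {
            if (new_cap > SIZE_MAX / 2) { new_cap = need; break; }
            new_cap *= 2;
        }
        char *p = realloc(b->data, new_cap);
        if (p == NULL) return -1;
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

Starting from a zero-initialized counted_buf, the length is then available as b.len in constant time, with no scan for the terminator; the caller frees b.data when done.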
2

It's not about whether anybody will actually make a string that size. By convention, ALL return types that indicate the number of bytes something occupies in memory are size_t.
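A small C sketch of why that convention is convenient: the length flows straight into malloc() and memcpy(), which also take size_t, without any conversions. my_strlen and dup_string are illustrative names, and my_strlen is a naive stand-in, not how any particular C library implements strlen().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Naive stand-in for strlen(): walks to the terminator and returns the
   distance travelled as a size_t. */
static size_t my_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0') {
        ++p;
    }
    return (size_t)(p - s);
}

/* Copies a string. The same size_t value is passed to malloc() and
   memcpy(). Returns NULL if the allocation fails. */
static char *dup_string(const char *s) {
    size_t n = my_strlen(s);
    char *copy = malloc(n + 1);
    if (copy != NULL) {
        memcpy(copy, s, n + 1);   /* n + 1 also copies the terminator */
    }
    return copy;
}
```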

Larry Gritz
1

Well, 1) size_t is a typedef and varies with architecture, and 2) wouldn't it make sense for the return type to be large enough for any string that fits in memory? Why 32 bits? Why not 16? It's 64 on your machine because that's what it takes to represent the maximum possible string length.

Tyler
0

strlen() has to use a return type that can represent the size of the largest object in the allocation model.

You could use std::string. Its size_type is equal to the allocator's size_type. So if you write your own allocator, std::string::size() could even use char as its return type.

Thanks for the remark in the comments: std::string is just a specialization of std::basic_string, so you would have to use std::basic_string with a custom allocator.

Kirill V. Lyadvinsky
  • 1
    You can't (in standard C++) change the allocator for std::string: it's a typedef, not a template. You have to use basic_string. – Steve Jessop Jul 14 '09 at 17:22