When mmap()ing a text file, like so:

    struct stat sb;
    int fd = open("file.txt", O_RDWR);
    fstat(fd, &sb);
    char *text = mmap(0, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

the file contents are mapped into memory directly, and `text` will not contain a NUL terminator, so operating on it with normal string functions would not be safe. On Linux (at least) the remaining bytes of the unused page are zero-filled, so effectively you get a NUL terminator in all cases where the file size isn't a multiple of the page size.

But relying on that feels dirty, and other `mmap()` implementations (e.g., in FreeBSD, I think) don't zero-fill partial pages. Files whose size is an exact multiple of the page size will also lack the NUL terminator.
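To make the two cases concrete, here is a minimal sketch that checks whether a file of a given size leaves any slack in its last mapped page. The helper names `has_tail_slack` and `page_size` are made up for illustration, and the check only says whether slack exists, not whether a given implementation zero-fills it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: true when a file of `size` bytes does not end
 * exactly on a page boundary, i.e. the last mapped page has unused bytes. */
bool has_tail_slack(size_t size, size_t page)
{
    assert(page != 0);
    return size % page != 0;
}

/* Hypothetical helper: page size as reported by sysconf(). The 4096
 * fallback is an assumption, used only if the query fails. */
size_t page_size(void)
{
    long p = sysconf(_SC_PAGESIZE);
    return p > 0 ? (size_t)p : 4096;
}
```

With these, `has_tail_slack((size_t)sb.st_size, page_size())` being false marks exactly the files for which the first byte past the end lies outside the mapping.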
Are there reasonable ways to work around this or to add the NUL terminator?
Things I've considered
- Using `strn*()` functions exclusively and tracking distance to the end of the buffer.
  - Pros: No need for NUL terminator
  - Cons: Extra tracking required to know distance to end of file when parsing text; some `str*()` functions don't have a `strn*()` counterpart, like `strstr`.
- As another answer suggested, make an anonymous mapping at a fixed address following the mapping of your text file.
  - Pros: Can use regular C `str*()` functions
  - Cons: Using `MAP_FIXED` is not thread-safe; seems like an awful hack anyway
- `mmap()` an extra byte and make the map writeable, and write the NUL terminator. The OpenGroup's mmap man page says you can make a mapping larger than your object's size but that accessing data outside of the actual mapped object will generate a `SIGBUS`.
  - Pros: Can use regular C `str*()` functions
  - Cons: Requires handling (ignoring?) `SIGBUS`, which could potentially mean something else happened. I'm not actually sure writing the NUL terminator will work?
- Expand files with sizes that are multiples of page size with `ftruncate()` by one byte.
  - Pros: Can use regular C `str*()` functions; `ftruncate()` will write a NUL byte to the newly allocated area for you
  - Cons: Means we have to write to the files, which may not be possible or acceptable in all cases; doesn't solve problem for `mmap()` implementations that don't zero-fill partial pages
- Just `read()` the file into some `malloc()`'d memory and forget about `mmap()`.
  - Pros: Avoids all of these solutions; easy to `malloc()` an extra byte for NUL
  - Cons: Different performance characteristics than `mmap()`
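For comparison, the last option (read into `malloc()`'d memory) could look roughly like the sketch below. The function name `read_all_nul_terminated` is hypothetical, and it uses `pread()` rather than plain `read()` so that the descriptor's current offset doesn't matter. It assumes a regular file and does not retry on `EINTR`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read all of `fd` into a malloc()'d buffer that is
 * always NUL-terminated. Returns NULL on failure; the caller frees it. */
char *read_all_nul_terminated(int fd, size_t *out_len)
{
    assert(out_len != NULL);

    struct stat sb;
    if (fstat(fd, &sb) != 0 || sb.st_size < 0)
        return NULL;

    size_t size = (size_t)sb.st_size;
    char *buf = malloc(size + 1);          /* one extra byte for the NUL */
    if (buf == NULL)
        return NULL;

    size_t got = 0;
    while (got < size) {
        /* pread() takes an explicit offset, so the fd position is irrelevant. */
        ssize_t n = pread(fd, buf + got, size - got, (off_t)got);
        if (n < 0) {
            free(buf);
            return NULL;
        }
        if (n == 0)
            break;                         /* file shrank while reading */
        got += (size_t)n;
    }

    buf[got] = '\0';                       /* terminator, whatever the size */
    *out_len = got;
    return buf;
}
```

The returned buffer can be handed to any `str*()` function, at the cost of copying the whole file up front instead of faulting pages in on demand.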
Solution #1 seems generally the best, and just requires some extra work on the part of the functions reading the text.
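As a rough idea of what that extra work looks like for the missing `strstr` counterpart, here is a sketch of a length-bounded substring search built only on standard `memchr()` and `memcmp()`. The name `mem_find` is made up for this example. glibc and the BSDs also ship `memmem()` as an extension, which does the same job where it is available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: like strstr(), but takes explicit lengths and
 * never reads past hay + hay_len, so no NUL terminator is needed. */
const char *mem_find(const char *hay, size_t hay_len,
                     const char *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);

    if (needle_len == 0)
        return hay;                /* mirror strstr(): empty needle matches */
    if (needle_len > hay_len)
        return NULL;

    const char *p = hay;
    size_t remaining = hay_len;
    while (remaining >= needle_len) {
        /* Only scan start positions where the whole needle still fits. */
        const char *hit = memchr(p, needle[0], remaining - needle_len + 1);
        if (hit == NULL)
            return NULL;
        if (memcmp(hit, needle, needle_len) == 0)
            return hit;
        remaining -= (size_t)(hit - p) + 1;
        p = hit + 1;
    }
    return NULL;
}
```

With the mapping from the question, a call would look like `mem_find(text, (size_t)sb.st_size, "key=", 4)`, and the distance to the end of the buffer is simply `text + sb.st_size - hit`.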
Are there better alternatives, or are these the best solutions? Are there aspects of these solutions I haven't considered that make them more or less attractive?