2

All, I don't know what limit I have hit or whether this is an issue with valgrind, libc, or me, but I need to know if this is reproducible, and if so, where this issue lies. I've boiled the issue down to an MCVE producable on 2 of my AMD boxes. Basically, I am dynamically allocating struct dirent * pointers and then allocating a struct dirent for each successful readdir. valgrind has no complaint for 1017, but then on number 1018, I get an invalid read error (no reallocation involved), e.g.

==9881== Invalid read of size 8
==9881==    at 0x4C2F316: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9881==    by 0x40098E: main (readdir_mcve.c:35)
==9881==  Address 0x51df070 is 0 bytes after a block of size 32,816 alloc'd
==9881==    at 0x4C2ABD0: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9881==    by 0x4EE93E3: __alloc_dir (in /usr/lib/libc-2.23.so)
==9881==    by 0x4EE94D2: opendir_tail (in /usr/lib/libc-2.23.so)
==9881==    by 0x400802: main (readdir_mcve.c:9)

(the block of size 32,816 looks curious, but I haven't found any help breaking it down)

The code takes the directory name to open as the first argument, and then the limit of files to read as the second (default is 1000):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <dirent.h>

int main (int argc, char **argv) {

    DIR *dp = opendir (argc > 1 ? argv[1] : "."); /* open directory (. default) */
    struct dirent *de = NULL, **dlist = NULL;     /* ptr and ptr2ptr to dirent  */
    size_t nptrs = argc > 2 ? (size_t)strtoul (argv[2], NULL, 10) : 1000,
           i = 0, idx = 0;                        /* index, allocation counter  */

    if (!dp) {
        fprintf (stderr, "error: opendir failed.\n");
        return 1;
    }

    /* allocate nptrs dirent pointers  */
    if (!(dlist = calloc (nptrs, sizeof *dlist))) {
        fprintf (stderr, "error: virtual memory exhausted - dlist\n");
        return 1;
    }

    while ((de = readdir (dp))) {

        /* skip dot files */
        if (!strcmp (de->d_name, ".") || !strcmp (de->d_name, ".."))
            continue;

        if (!(dlist[idx] = calloc (1, sizeof **dlist))) { /* alloc dirent */
            fprintf (stderr, "error: dlist memory allocation failed\n");
            return 1;
        }
        memcpy (dlist[idx++], de, sizeof *de);   /* copy de to dlist[idx] */

        if (idx == nptrs)   /* post-check/realloc, insures sentinel NULL */
            break;
    }
    closedir (dp);

    for (i = 0; i < idx; i++) {
        printf (" file[%3zu] : %s\n", i, dlist[i]->d_name);
        free (dlist[i]);
    }
    free (dlist);

    return 0;
}

You can create a simple test directory with:

$ mkdir readdir_tst
$ for i in {1..1024}; do
  printf -v fname "file%04d" "$i"
  touch "readdir_tst/$fname"
  done

Then all is well reading 1014 filenames (1017 allocations):

$ valgrind ./bin/readdir_mcve readdir_tst 1014 > /dev/null
==9880== Memcheck, a memory error detector
==9880== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==9880== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==9880== Command: ./bin/readdir_mcve readdir_tst 1014
==9880==
==9880==
==9880== HEAP SUMMARY:
==9880==     in use at exit: 0 bytes in 0 blocks
==9880==   total heap usage: 1,017 allocs, 1,017 frees, 328,944 bytes allocated
==9880==
==9880== All heap blocks were freed -- no leaks are possible
==9880==
==9880== For counts of detected and suppressed errors, rerun with: -v
==9880== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

But on file 1015 (allocation 1018) I hit an __alloc_dir issue and valgrind throws an Invalid read of size 8 ... is 0 bytes after a block of size 32,816 alloc'd:

$ valgrind ./bin/readdir_mcve readdir_tst 1015 > /dev/null
==9881== Memcheck, a memory error detector
==9881== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==9881== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==9881== Command: ./bin/readdir_mcve readdir_tst 1015
==9881==
==9881== Invalid read of size 8
==9881==    at 0x4C2F316: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9881==    by 0x40098E: main (readdir_mcve.c:35)
==9881==  Address 0x51df070 is 0 bytes after a block of size 32,816 alloc'd
==9881==    at 0x4C2ABD0: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9881==    by 0x4EE93E3: __alloc_dir (in /usr/lib/libc-2.23.so)
==9881==    by 0x4EE94D2: opendir_tail (in /usr/lib/libc-2.23.so)
==9881==    by 0x400802: main (readdir_mcve.c:9)
==9881==
==9881==
==9881== HEAP SUMMARY:
==9881==     in use at exit: 0 bytes in 0 blocks
==9881==   total heap usage: 1,018 allocs, 1,018 frees, 329,232 bytes allocated
==9881==
==9881== All heap blocks were freed -- no leaks are possible
==9881==
==9881== For counts of detected and suppressed errors, rerun with: -v
==9881== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The code continues to read and print all directory entries just fine, but it is the valgrind error that has me baffled. I had it reallocating instead of calling break on reaching the allocation limit and it handles the 4000+ files in /usr/bin without any issue other than the valgrind error. (that was stripped as not relevant to the MCVE). The closest thing I've found on SO is Valgrind malloc leaks, but that wasn't applicable here. Can anyone else reproduce this and if so, is this valgrind, libc or me?

note: I get the same results with libc-2.18.so.


GNU libc dirent.h offers a bit more information

After being pointed in the right direction by the answer and continuing the search a bit, it seems there are a number of ways libc may determine the length of d_name. It will depend on various definitions available to the compiler. In dirent.h explains:

46 /* This file defines `struct dirent'.
47 
48    It defines the macro `_DIRENT_HAVE_D_NAMLEN' iff there is a `d_namlen'
49    member that gives the length of `d_name'.
...
59 */
...
67 /* These macros extract size information from a `struct dirent *'.
68    They may evaluate their argument multiple times, so it must not
69    have side effects.  Each of these may involve a relatively costly
70    call to `strlen' on some systems, so these values should be cached.
71 
72    _D_EXACT_NAMLEN (DP) returns the length of DP->d_name, not including
73    its terminating null character.
74 
75    _D_ALLOC_NAMLEN (DP) returns a size at least (_D_EXACT_NAMLEN (DP) + 1);
76    that is, the allocation size needed to hold the DP->d_name string.
77    Use this macro when you don't need the exact length, just an upper bound.
78    This macro is less likely to require calling `strlen' than _D_EXACT_NAMLEN.
79    */
80 
81 #ifdef _DIRENT_HAVE_D_NAMLEN
82 # define _D_EXACT_NAMLEN(d) ((d)->d_namlen)
83 # define _D_ALLOC_NAMLEN(d) (_D_EXACT_NAMLEN (d) + 1)
84 #else
85 # define _D_EXACT_NAMLEN(d) (strlen ((d)->d_name))
86 # ifdef _DIRENT_HAVE_D_RECLEN
87 #  define _D_ALLOC_NAMLEN(d) (((char *) (d) + (d)->d_reclen) - &(d)->d_name[0])
88 # else
89 #  define _D_ALLOC_NAMLEN(d) (sizeof (d)->d_name > 1 ? sizeof (d)->d_name : \
90                               _D_EXACT_NAMLEN (d) + 1)
91 # endif
92 #endif
...

While there a number of different compile paths this can take based on the various defines, the one that is the most readable occurs if __USE_XOPEN2K8 is set:

221 #ifdef __USE_XOPEN2K8
222 ...
230 # ifdef __USE_MISC
231 #  ifndef MAXNAMLEN
232 /* Get the definitions of the POSIX.1 limits.  */
233 #  include <bits/posix1_lim.h>
234 
235 /* `MAXNAMLEN' is the BSD name for what POSIX calls `NAME_MAX'.  */
236 #   ifdef NAME_MAX
237 #    define MAXNAMLEN   NAME_MAX
238 #   else
239 #    define MAXNAMLEN   255
240 #   endif
241 #  endif
242 # endif

So in this case, d_name is either NAME_MAX or 255 depending on the NAME_MAX definition (and thus set to 256 by the _D_ALLOC_NAMLEN (DP) macro). Thanks to unwind for pointing me in the right direction. I don't know if we can ever know the exact answer of why 1017 struct dirent allocations occur without issue and why valgrind begins to complain on number 1018, but at least we understand now where the source of the issue arises and why copying a struct dirent with memcpy may pose problems.

Community
  • 1
  • 1
David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • What arguments are you passing to the program? – 2501 Jun 30 '16 at 08:17
  • Now that is something I'll have to research. I'm only opening the one directory, but I can see the issue if each successive `readdir` is holding a handle open?? I don't know the answer to that and will have to dig further. – David C. Rankin Jun 30 '16 at 08:18
  • @2501 `./bin/readdir_mcve readdir_tst 1014` (works) `./bin/readdir_mcve readdir_tst 1015` (first `valgrind` error) – David C. Rankin Jun 30 '16 at 08:19
  • 1
    Well no, `closedir` follows `opendir`, will all the `readdir`s in between you need. – David C. Rankin Jun 30 '16 at 08:20
  • What worries me is that it is something weird with the `struct direct` allocations given the **Note:** on `Only the fields d_name and d_ino are specified in POSIX.1-2001` and not knowing how `valgrind` is checking dealing with the differences between implementation that may leave something dangling.., but that is just a SWAG (swinging wild ass guess) – David C. Rankin Jun 30 '16 at 08:22
  • What happens if you assign the structs instead: `*dlist[idx] = *de`? Or copy the struct member by member? – 2501 Jun 30 '16 at 08:26
  • Same thing, I tried `:(` I've been playing with this for several hours, searching the web, pulling my hair out --- with no answers. That's when I know to turn it over to the folks smarter than I here -- collaboration `:)` And it would be `dlist[idx++] = de;` to let the struct assignment op do the copy. That's handy since all the `dirent` members were `static`, but no joy. There is no way anything could overlap, but I tried `memmove` too. – David C. Rankin Jun 30 '16 at 08:27
  • `dlist[idx++] = de` would only copy the pointer, you need to copy the entire struct. – 2501 Jun 30 '16 at 08:32
  • That's what I'll try next. I'll report back if anything changes, but I'm betting against it. It would mean there was something wrong with the assignment operator, memcpy and memmove. – David C. Rankin Jun 30 '16 at 08:33
  • Fwiw: I can reproduce the issue, with libc-2.19 and valgrind 3.10.1. –  Jun 30 '16 at 08:37
  • `$ ulimit unlimited`. @2501, you are correct on the pointer, it's getting late - I tried the assignment with `*dlist[idx++] = *de;` same issue. Evert *thank you!*, I'm not going crazy. – David C. Rankin Jun 30 '16 at 08:40

1 Answers1

5

You can't copy strucft dirent like that, it seems the manual page and the code are not in sync.

Here is the current declaration:

struct dirent
  {
#ifndef __USE_FILE_OFFSET64
    __ino_t d_ino;      /* File serial number.  */
#else
    __ino64_t d_ino;
#endif
    unsigned short int d_reclen; /* Length of the whole `struct dirent'.  */
    unsigned char d_type;   /* File type, possibly unknown.  */
    unsigned char d_namlen; /* Length of the file name.  */

    /* Only this member is in the POSIX standard.  */
    char d_name[1];     /* File name (actually longer).  */
  };

Clearly since d_name is declared as [1], you won't get the proper size using sizeof. You need to do more clever storage, i.e. strdup() the names or something (if you're only interested in the names).

I'm not 100% sure why this causes the breakage, but I bet you're seeing UB of some kind (notice you're dying on a read, if you reached printf() on your copied strings that would trigger UB).

unwind
  • 391,730
  • 64
  • 469
  • 606
  • That's what I was looking for. The `man page` shows `d_name[256]` and I was relying on the `sizeof (struct dirent)` which was returning `280`. Thank you for helping solve this mystery. And yes, I do have `dlist` as `char **` in the other code that was working, but I was going to figure out why `dirent` was throwing the erorr -- or die trying `:)` – David C. Rankin Jun 30 '16 at 08:55
  • 280? That's ... way larger than I would have expected from the declaration. Strange. – unwind Jun 30 '16 at 09:22
  • @DavidC.Rankin What does `sizeof(->d_name)` say? – 2501 Jun 30 '16 at 09:38
  • Maybe somehow OP changed the alignment? – LPs Jun 30 '16 at 10:27
  • [This example on ideone.com](http://ideone.com/BdA8Gw) prints 256 (this comment was re-posted, I failed to include the link originally). I'm confused. – unwind Jun 30 '16 at 10:31
  • @2501 sorry, had depos today, I get `sizeof *de : 280` and `sizeof (de->d_name) : 256`. I haven't torn the source open yet, the the notes warn that `d_name` *may* be statically allocated and overwritten. Gotta love the certainty *may* be instead of *is*. So much for `FILENAME_MAX`... – David C. Rankin Jul 01 '16 at 04:15
  • @DavidC.Rankin I'd love to look at the source. Any public links? – 2501 Jul 01 '16 at 08:57
  • GLibC hosts its code at [**https://sourceware.org/git/?p=glibc.git;a=tree;h=refs/heads/master;hb=refs/heads/master**](https://sourceware.org/git/?p=glibc.git;a=tree;h=refs/heads/master;hb=refs/heads/master)I may have a little time tonight to try and track down where the `d_name[1] -> d_name[256]` magic is happening. I wonder if they are just assigning the address of a static buffer? The bigger issue is still why 1017 copies are OK, but valgrind complains about 1018 -- and why it continues to work fine even with the valgrind complaint. – David C. Rankin Jul 02 '16 at 00:35