
In C, the memmem function is used to locate a particular sequence of bytes in a memory area. It is the counterpart of strstr, which only works on null-terminated strings.

Is there any particular reason for this function to be available only as a GNU extension, and not directly in the standard library? The manual states:

This function was broken in Linux libraries up to and including libc 5.0.9; there the needle and haystack arguments were interchanged, and a pointer to the end of the first occurrence of needle was returned.

Both old and new libc's have the bug that if needle is empty, haystack-1 (instead of haystack) is returned. And glibc 2.0 makes it worse, returning a pointer to the last byte of haystack. This is fixed in glibc 2.1.

I can see it went through several fixes, yet I'd like to know why it was not made as readily available as strstr (if not more so) on some distributions. Does it still raise implementation issues?

Edit (motivations): I wouldn't ask this question if the standard had decided it the other way around: including memmem but not strstr. Indeed, strstr could be something like:

memmem(str, strlen(str), "search", 6);

Slightly trickier, but still a pretty logical one-liner considering that it is very usual in C functions to require both the data chunk and its length.
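To make the one-liner above concrete, here is a minimal sketch of a strstr-like function layered on a length-based search. Both `find_bytes` and `find_str` are hypothetical names; `find_bytes` is only a naive portable stand-in for memmem, written so the example does not depend on the GNU extension, and it is not how glibc implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for memmem: naive search, not the glibc algorithm. */
static void *find_bytes(const void *haystack, size_t haystack_len,
                        const void *needle, size_t needle_len)
{
    const unsigned char *h = haystack;
    const unsigned char *n = needle;

    assert(haystack != NULL && needle != NULL);

    if (needle_len == 0)
        return (void *)h;            /* empty needle matches at the start */
    if (needle_len > haystack_len)
        return NULL;                 /* needle cannot fit */

    /* Last position at which a full needle still fits in the haystack. */
    const unsigned char *last = h + (haystack_len - needle_len);
    for (const unsigned char *p = h; p <= last; p++) {
        /* Jump to the next candidate: next occurrence of the first byte. */
        p = memchr(p, n[0], (size_t)(last - p) + 1);
        if (p == NULL)
            return NULL;
        if (memcmp(p, n, needle_len) == 0)
            return (void *)p;
    }
    return NULL;
}

/* Hypothetical strstr equivalent: the lengths simply come from strlen. */
static char *find_str(const char *haystack, const char *needle)
{
    return find_bytes(haystack, strlen(haystack), needle, strlen(needle));
}
```

With this, `find_str("hello world", "world")` points at the `w`, while `find_bytes` can also search buffers that contain embedded zero bytes, which a strstr-style function cannot do.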

Edit (2): another motivation from comments and answers. Quoting Theolodis:

Not every function is necessary to every single, or at least most of the C developers, so it would actually make the standard libraries unnecessarily huge.

Well, I couldn't agree more, I'm all in when it comes to making the libraries lighter and faster. But then... why both strncpy and memcpy (from keltar's comment)? I could almost ask: why has poor memmem been "black-sheeped"?

John WH Smith
  • 3
    I am tempted to say: "Because it was not added to the standard." – Theolodis Jun 17 '14 at 12:21
  • 1
    I've been through enough books and references about UNIX & Linux to know that things don't get forgotten for just no reason. After all, this function **has** been documented as bugged, and given attention to. I'm curious about *why* it's not *just the usual function*. After all, given the needle/haystack lengths, the NULL termination becomes pointless, yet `strstr` is standard. – John WH Smith Jun 17 '14 at 12:22
  • Well, there are many things missing from the standard. It just cannot include everything. Even glibc, which is quite huge and goes far beyond the mere C standard, lacks some very handy BSD functions like strlcpy, leaving us with the monstrous strncpy instead. – keltar Jun 17 '14 at 13:09
  • @keltar: should I dare ask why we have both `strncpy` and `memcpy` then? (just kidding, don't you dare answer that) – John WH Smith Jun 17 '14 at 13:12
  • @JohnWHSmith either because they were submitted to standard review earlier, or because they had some strong backing behind them (a rationale, or just the people who submitted them). Standards bodies do not grab everything they could; in fact, many people would like to discard as many proposals as they can, because otherwise the process would never stop. At some point you have to stop, or there would be no standard in the next N years (where N is decades, at least). – keltar Jun 17 '14 at 13:17
  • *I've been through enough books and references about UNIX & Linux to know that things don't get forgotten for just no reason.* -- Sure they do. – Keith Thompson Jun 17 '14 at 14:17
  • Note that the "libc 5.0.9" referred to in the man page is apparently an old Linux-specific C library; version 5.0.9 was released in 1995. It's been superseded by glibc, the GNU libc implementation. – Keith Thompson Jun 17 '14 at 14:26

1 Answer


Historically, that is, before the first edition of the Standard, C was shaped by compiler writers.

The case of strstr is a little different because it was introduced by the C Committee. The C89 Rationale document tells us that:

"The strstr function is an invention of the Committee. It is included as a hook for efficient algorithms, or for built-in substring instruction."

The C Committee does not explain why it did not define a more general function that is not limited to strings, so any reasoning can only be speculation. My only guess is that the use case was not considered important enough to justify a generic memmem instead of strstr. Remember that the goals of C include this requirement (in the C99 Rationale): "Keep the language small and simple". Even POSIX did not consider it for inclusion.

In any case, to my knowledge nobody has submitted a Defect Report or a proposal to have memmem included.

ouah
  • 1
    Just to make sure: in this quote from the C99 Rationale, do you read "simple" as "simple for core/language developers" or as "simple for users, i.e. C application developers"? I would find `memmem` slightly easier to implement than `strstr` (though that may be just me), so the latter reading would indeed make sense (`strstr` being more common and intuitive). – John WH Smith Jun 17 '14 at 13:50