For example, if the string is "Only for Geeky People" and I am looking for the whole word "Geek" rather than "Geeky", the search should report that the word is not present.

That is, I want strstr("Only for Geeky People", "Geek") to yield NULL.

How do I address such an issue?

Jonathan Leffler

1 Answer

You have to deal with it by wrapping strstr() in a function, perhaps str_word() (which avoids reserved names), that does extra checking after finding the word. Or, at least, that is probably the most sensible way to deal with it.

Padding the search string with spaces won't work. Leading padding would prevent the code from finding "Geek" or "(Geek is not pejorative)"; trailing padding would prevent it from finding "Ozymandias is a Geek". Etc. If you want to go OTT, you could consider going for a powerful regular expression library like PCRE, but it is overkill for this task (and the POSIX <regex.h> isn't sufficiently powerful — it doesn't recognize word boundaries).

char *str_word(char *haystack, const char *needle)
{
    char *from = haystack;
    size_t length = strlen(needle);
    char *found;
    while ((found = strstr(from, needle)) != NULL)
    {
        if (found > haystack && isalpha((unsigned char)found[-1]))
            from += length;
        else if (isalpha((unsigned char)found[length]))
            from += length;
        else
            return found;
    }
    return NULL;
}

Note that this allows the function to find the Geek in "Ozymandias is such a Geeky Geek".
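One detail worth noticing in the loop above: `from += length` advances from the previous starting point, not from the rejected match, so strstr() can rediscover the same rejected match several times before `from` moves past it. The results are the same, but the work is repeated. A minimal sketch of a variant that resumes just past the rejected match follows; the name str_word_resume() and the empty-needle guard are assumptions for illustration, not part of the code above.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical variant of str_word(): same boundary checks, but after a
 * rejected match the search resumes one character past that match. */
char *str_word_resume(char *haystack, const char *needle)
{
    size_t length = strlen(needle);
    char *from = haystack;
    char *found;

    if (length == 0)
        return NULL;    /* assumption: an empty needle is never a word */

    while ((found = strstr(from, needle)) != NULL)
    {
        /* Is there a letter immediately before or after the match? */
        int letter_before = found > haystack &&
                            isalpha((unsigned char)found[-1]);
        int letter_after = isalpha((unsigned char)found[length]);

        if (!letter_before && !letter_after)
            return found;       /* whole word found */
        from = found + 1;       /* never re-examines the rejected match */
    }
    return NULL;
}
```

Called with "Ozymandias is such a Geeky Geek" and "Geek", this rejects the "Geek" inside "Geeky" once and then returns a pointer to the final word.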

Beware trying to add const-correctness to this. You can use this easily enough:

const char *str_word(const char *haystack, const char *needle);

However, you can't return a non-const char * when passed a const char * without a cast that removes the const-ness somewhere along the line. Returning a const char * punts the process of removing const-ness to the calling code. This matters in a context such as:

char *word = str_word(line, "Geek");

Here, line is a modifiable array containing a line of input; you want to search for the word in that line and get a non-const pointer back.
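One possible way to offer both interfaces is to keep a const-correct search function and add a thin wrapper for modifiable buffers, so the cast lives in exactly one place. This is only a sketch: the names str_word_const() and str_word_mut() are hypothetical, and the loop resumes one character past a rejected match, which is a small departure from the function shown earlier.

```c
#include <ctype.h>
#include <string.h>

/* Const-correct search: the caller gets back a pointer to const. */
const char *str_word_const(const char *haystack, const char *needle)
{
    size_t length = strlen(needle);
    const char *from = haystack;
    const char *found;

    if (length == 0)
        return NULL;    /* assumption: an empty needle is never a word */

    while ((found = strstr(from, needle)) != NULL)
    {
        int letter_before = found > haystack &&
                            isalpha((unsigned char)found[-1]);
        int letter_after = isalpha((unsigned char)found[length]);

        if (!letter_before && !letter_after)
            return found;
        from = found + 1;   /* resume just past the rejected match */
    }
    return NULL;
}

/* Wrapper for modifiable buffers: casting away const is acceptable here
 * because the caller supplied a non-const haystack in the first place. */
char *str_word_mut(char *haystack, const char *needle)
{
    return (char *)str_word_const(haystack, needle);
}
```

With this split, `char *word = str_word_mut(line, "Geek");` compiles without a cast in the calling code, while code that only holds a `const char *` uses str_word_const().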

Test code:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

extern char *str_word(char *haystack, const char *needle);

char *str_word(char *haystack, const char *needle)
{
    char *from = haystack;
    size_t length = strlen(needle);
    char *found;
    while ((found = strstr(from, needle)) != NULL)
    {
        if (found > haystack && isalpha((unsigned char)found[-1]))
            from += length;
        else if (isalpha((unsigned char)found[length]))
            from += length;
        else
            return found;
    }
    return NULL;
}

int main(void)
{
    const char search[] = "Geek";
    char haystacks[][64] =
    {
        "Geek",
        "(Geek is not pejorative)",
        "Ozymandias is a Geek",
        "Ozymandias is such a Geeky Geek",
        "No prizes for Geekiness",
        "Only for Geeky people",
        "Howling 'Geek' gets you nowhere",
        "A Geek is a human",
        "Geeky people run the tech world",
    };
    enum { NUM_HAYSTACKS = sizeof(haystacks) / sizeof(haystacks[0]) };

    for (int i = 0; i < NUM_HAYSTACKS; i++)
    {
        char *word = str_word(haystacks[i], search);
        if (word == NULL)
            printf("Did not find '%s' in [%s]\n", search, haystacks[i]);
        else
            printf("Found '%s' at [%s] in [%s]\n", search, word, haystacks[i]);
    }

    return 0;
}

Test results:

Found 'Geek' at [Geek] in [Geek]
Found 'Geek' at [Geek is not pejorative)] in [(Geek is not pejorative)]
Found 'Geek' at [Geek] in [Ozymandias is a Geek]
Found 'Geek' at [Geek] in [Ozymandias is such a Geeky Geek]
Did not find 'Geek' in [No prizes for Geekiness]
Did not find 'Geek' in [Only for Geeky people]
Found 'Geek' at [Geek' gets you nowhere] in [Howling 'Geek' gets you nowhere]
Found 'Geek' at [Geek is a human] in [A Geek is a human]
Did not find 'Geek' in [Geeky people run the tech world]
Jonathan Leffler
  • It might be sensible to add an assertion such as `assert(length > 0 && isalpha((unsigned char)needle[0]) && isalpha((unsigned char)needle[length - 1]));` to the start of the function. If those conditions aren't met, the result is likely to be, if not erroneous, at least unexpected. You could get inventive by passing a function pointer too; you could then look for a number with `str_match("01 011010 01", "011010", isdigit)` as well as `str_match("Ozymandias is a Geeky Geek", "Geek", isalpha)`. Have fun. – Jonathan Leffler May 11 '20 at 16:11
  • Could you please explain what this piece of code does? `if (found > haystack && isalpha((unsigned char)found[-1])) from += length;` What does `found[-1]` mean? Is it valid in C? I have seen similar syntax in Python, where it refers to characters counted from the end. – Poornima M May 11 '20 at 17:05
  • The argument names assume you're familiar with the phrase "searching for a needle in a haystack". The test `found > haystack` checks that the needle was not found at the beginning of the 'haystack' (that is, the string being searched). This ensures that indexing `found[-1]` is referencing part of the string, and not attempting to reference before the beginning of the string. Assuming that we're not at the beginning of the haystack, `isalpha((unsigned char)found[-1])` checks whether the character before the match is a letter; if it is, then we have something like `"aGeek"`. _[…continued…]_ – Jonathan Leffler May 11 '20 at 17:12
  • _[…continuation…]_ If the character before (the occurrence of) the needle in the haystack is a letter, then this isn't the word being looked for and the code jumps past the mismatch. The `from += length` skips over the entire match so that the "Geek" in "Geeky Geek" can be found next time. If the character before the needle in the haystack is not a letter, the code checks whether the character after the needle is a letter; if it is, then it jumps past the mismatch. If neither the character (if any) before nor the character after the needle is a letter, the word was found. _[…continued 2…]_ – Jonathan Leffler May 11 '20 at 17:17
  • _[…continuation 2…]_ The `(unsigned char)` cast ensures that even if the plain `char` type is a signed type and the character in `found[-1]` or `found[length]` is negative, the correct value is passed to `isalpha()`. All the `isxyz()` functions in [`<ctype.h>`](http://port70.net/~nsz/c/c11/n1570.html#7.4) require an argument that has the correct value (an `unsigned char` converted to `int`, or EOF). @PoornimaM: and, just in case it isn't clear, `found[-1]` means "the character before the one that `found` points to", which is unrelated to Python's notation. In C, `a[x] == *(a + x)`, so `found[-1] == *(found - 1)`. – Jonathan Leffler May 11 '20 at 17:21
  • Thank you for the detailed explanation! – Poornima M May 11 '20 at 21:17
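
The function-pointer idea from the first comment could be sketched roughly as follows. Only the name str_match() and the two example calls come from the comment; the body is an assumption modelled on str_word(), taking a `<ctype.h>`-style classifier and including the suggested assertion. Unlike str_word(), this sketch resumes one character past a rejected match.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical str_match(): like str_word(), but the caller supplies the
 * classifier that decides which characters count as part of a token. */
char *str_match(char *haystack, const char *needle, int (*is_token)(int))
{
    size_t length = strlen(needle);
    char *from = haystack;
    char *found;

    /* The needle must be non-empty and start and end with token characters,
     * otherwise the boundary checks below give surprising results. */
    assert(length > 0 &&
           is_token((unsigned char)needle[0]) &&
           is_token((unsigned char)needle[length - 1]));

    while ((found = strstr(from, needle)) != NULL)
    {
        /* found[-1] is *(found - 1): only examined when it is in the string */
        int before = found > haystack &&
                     is_token((unsigned char)found[-1]);
        int after = is_token((unsigned char)found[length]);

        if (!before && !after)
            return found;
        from = found + 1;   /* resume just past the rejected match */
    }
    return NULL;
}
```

Passing `isdigit` makes it look for a standalone number, and passing `isalpha` reproduces the word search from the answer.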