Function that finds a position of a substring s2 in s1

Question

Here is the code:

#include <stdio.h>

int position(char *s1, char *s2) {
    int i, j;
    for (i = 0; s1[i]; i++) {
        for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);
        if (!s2[j]) return i;
    }
    return -1;
}

int main(void) {
    char word1[101], word2[101];
    int p;
    printf("Type two words: ");
    /* Limit each word to 100 characters so the buffers cannot overflow. */
    if (scanf("%100s %100s", word1, word2) != 2)
        return 1;
    p = position(word1, word2);
    if (p < 0)
        printf("Word '%s' does not exist in the sentence '%s'.\n", word2, word1);
    else
        printf("Position of the word '%s' is %d.\n", word2, p);
    return 0;
}

How does the second for loop work?

Does the function return i if it detects the word? If so, how?

Do you see what is returned in this statement if (!s2[j]) return i;? If you see then why are you asking the question?! — Vlad from Moscow, Apr 29 '22 at 19:14

chqrlie · Answer 1 · 2022-04-30T17:29:42.970

The loop for (j = 0; s2[j] && s2[j] == s1[i + j]; j++); has an empty body: the lone ; after the closing parenthesis. It can also be written:

for (j = 0; s2[j] && s2[j] == s1[i + j]; j++) {
    /* empty */
}

or

for (j = 0; s2[j] && s2[j] == s1[i + j]; j++)
    continue;

It computes the length of the longest initial substring of s2 that matches the characters of s1 starting at offset i. At the end of the loop, j is the number of matching characters, not counting the null terminator.

If this initial substring is the full string s2, which can be tested by comparing s2[j] to the null terminator '\0', we have a match at position i, hence if (!s2[j]) return i;
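To make this concrete, here is a small sketch that isolates the inner loop so its result can be inspected for each offset. The names match_length and trace are made up for this illustration and are not part of the question's code.

```c
#include <stdio.h>

/* Hypothetical helper: the inner loop on its own.
   Returns how many leading characters of s2 match s1 starting at offset i.
   i must not exceed the length of s1. */
int match_length(const char *s1, const char *s2, int i) {
    int j;
    for (j = 0; s2[j] && s2[j] == s1[i + j]; j++)
        continue;
    return j;
}

/* Hypothetical helper: print j for every offset i of s1 and say whether
   the loop stopped at the end of s2 (a full match) or earlier. */
void trace(const char *s1, const char *s2) {
    int i;
    for (i = 0; s1[i]; i++) {
        int j = match_length(s1, s2, i);
        printf("i=%d j=%d %s\n", i, j, s2[j] ? "no match" : "match");
    }
}
```

For s1 = "hello" and s2 = "lo", j is 0, 0, 1, 2, 0 at offsets 0 through 4. Only at i = 3 does the loop stop because s2[j] is the null terminator, so that is the offset position returns.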

Note that this function returns 0 for an empty substring s2, except if s1 is also empty, in which case it returns -1. This is somewhat inconsistent. It should return 0 in both cases:

int position(const char *s1, const char *s2) {
    int i, j;
    for (i = 0;; i++) {
        for (j = 0; s2[j] && s2[j] == s1[i + j]; j++)
            continue;
        if (!s2[j]) return i;
        if (!s1[i]) return -1;
    }
}

Note also that this function may have undefined behavior if s1 is longer than INT_MAX, which is possible on 64-bit systems where int has 32 bits and pointers and object sizes have 64 bits. It would be safer to change the type of the variables and the return type from int to ptrdiff_t, defined in <stddef.h>, although even that is not fully sufficient, since an object can in principle be larger than PTRDIFF_MAX.
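As a minimal sketch of that suggestion, the corrected function could look like this with ptrdiff_t. The name position_pd is hypothetical, chosen only to keep it apart from the int version.

```c
#include <stddef.h>

/* Hypothetical variant of the corrected position() using ptrdiff_t
   instead of int for the indexes and the return value. */
ptrdiff_t position_pd(const char *s1, const char *s2) {
    ptrdiff_t i, j;
    for (i = 0;; i++) {
        for (j = 0; s2[j] && s2[j] == s1[i + j]; j++)
            continue;
        if (!s2[j]) return i;   /* reached the end of s2: match at offset i */
        if (!s1[i]) return -1;  /* reached the end of s1: no match */
    }
}
```

The result can be printed with the %td conversion, for example printf("%td\n", position_pd(word1, word2));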

The standard function strstr does not have these shortcomings as it is defined as returning a pointer to the match:

char *strstr(const char *s1, const char *s2);

Note however that in C, it returns a non-const pointer even if passed a const pointer, potentially breaking const correctness.
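For illustration, an offset like the one position computes can be derived from the pointer strstr returns. This is only a sketch, and position_strstr is a made-up name. Storing the result in a const char * keeps const correctness on the caller's side.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: offset of the first occurrence of s2 in s1,
   or -1 if s2 does not occur in s1. */
ptrdiff_t position_strstr(const char *s1, const char *s2) {
    const char *p = strstr(s1, s2);  /* keep the result const-qualified */
    return p ? p - s1 : -1;
}
```

Because strstr returns s1 itself when s2 is empty, this wrapper returns 0 for an empty s2 whether or not s1 is empty.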

Here is a simplistic implementation using the same algorithm:

#include <string.h>

char *strstr(const char *s1, const char *s2) {
    size_t i, j;
    for (i = 0;; i++) {
        for (j = 0; s2[j] && s2[j] == s1[i + j]; j++)
            continue;
        if (!s2[j]) return (char *)&s1[i];
        if (!s1[i]) return NULL;
    }
}

Good C libraries use more sophisticated algorithms, which I encourage you to search and study.
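As a starting point for such study, here is a sketch of one classic alternative, the Knuth-Morris-Pratt algorithm. It is not meant to show what any particular C library does. The name kmp_search is hypothetical, and falling back to strstr when the allocation fails is a design choice made for this sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical Knuth-Morris-Pratt search: returns a pointer to the first
   occurrence of s2 in s1, or NULL if there is none. */
const char *kmp_search(const char *s1, const char *s2) {
    size_t n = strlen(s2);
    if (n == 0) return s1;                /* empty s2 matches at offset 0 */

    /* fail[i] is the length of the longest proper prefix of s2[0..i]
       that is also a suffix of s2[0..i]. */
    size_t *fail = malloc(n * sizeof *fail);
    if (!fail) return strstr(s1, s2);     /* assumption: fall back on failure */

    fail[0] = 0;
    for (size_t i = 1, k = 0; i < n; i++) {
        while (k > 0 && s2[i] != s2[k]) k = fail[k - 1];
        if (s2[i] == s2[k]) k++;
        fail[i] = k;
    }

    /* Scan s1 once; k is the number of characters of s2 matched so far. */
    const char *result = NULL;
    size_t k = 0;
    for (size_t i = 0; s1[i]; i++) {
        while (k > 0 && s1[i] != s2[k]) k = fail[k - 1];
        if (s1[i] == s2[k]) k++;
        if (k == n) {                     /* all of s2 matched, ending at i */
            result = s1 + i + 1 - n;
            break;
        }
    }
    free(fail);
    return result;
}
```

Unlike the nested loops above, the scan never moves backwards in s1: after a mismatch, the fail table tells it how much of the current partial match can be reused.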

Many compilers give you a warning if the loop body is just a semicolon after the loop head - exactly because the reader and writer both can miss it, completely misunderstanding what the code does. — gnasher729, Apr 30 '22 at 18:31
Intel added a special instruction to their processors that can compare up to 16 x 16 bytes in one instruction. I would love to know if any standard library uses it. — gnasher729, Apr 30 '22 at 18:33
@gnasher729: such specialized instructions are not easy to use for generic problems such as `strstr`, but are useful for specialized tasks such as image and digital signal processing, AI and crypto algorithms. — chqrlie, Apr 30 '22 at 22:44
