
Edit: It turns out that the compiler I'm using (MSVC) doesn't support variable length arrays, so I have no way of achieving the notation I want with it.


I have a function which takes in an array of strings and a query string, and returns the index of the string in the array that matches the query.

int findStringIndex(char query[], int strLength, char* strArray, int numStrings) {
    for (int i = 0; i < numStrings; i++) {
        for (int j = 0; j < strLength; j++) {

            // Skip to next word if there is a mismatch
            if (query[j] != *(strArray+ (i * strLength) + j))
                break;

            if (query[j] == '\0' && *(strArray + (i * strLength) + j) == '\0')
                return i;
        }
    }
    return -1;
}

Notably, both the length of the string and the size of the array vary, since I am using this function in several different places with differently sized strings. Currently, this approach has two problems:

  • Ugly array access notation *(strArray + (i * strLength) + j) rather than something like strArray[i][j]
  • When I call the function and pass the array of strings as the third argument, I get the warning that the argument I pass "differs in levels of indirection" from char*

Is there a way for me to tell the compiler to accept a variable as the size of one of the array's axes so that I can use the notation strArray[i][j]?

Also, how should I define the function so that I don't get the "levels of indirection" warning?

Edit: As a clarification, the string arrays are not ragged. Each array has constant-sized dimensions, but the different arrays that I want to use the function on have different sizes. The code runs fine and achieves the desired behavior in its current state; I just want to make sure I'm writing things the right way.

Here are two examples (different string sizes) of arrays I might use with this function:

char instructionStrings[NUM_INSTRUCTIONS][INST_MAX_CHARS] = {
    "nop", "lit", "litn", "copy", "copyl", "asni", /* etc */
};

char typeStrings[NUM_TYPES][TYPE_MAX_CHARS] = {
    "null", "int8", "int16", "int32", "int", "real32", "real"
};

Here INST_MAX_CHARS and TYPE_MAX_CHARS are different values. For the second example, I would call the function like findStringIndex(userInput, TYPE_MAX_CHARS, typeStrings, NUM_TYPES);

eejakobowski
  • Are you going to compare strings stored in arrays or character arrays of a fixed size? At least show arrays that you are going to process. – Vlad from Moscow Jun 30 '21 at 21:28
  • Why don't you create a macro that returns what you want and call it as something like `STR_ARRAY(size, i, j)`? – Jardel Lucca Jun 30 '21 at 21:34
  • Please show the code with the call site. The mention of "differs in levels of indirection" suggests you are passing something incompatible with `char *` so the code is already wrong – M.M Jun 30 '21 at 21:40
  • An example of a string array I pass is `char instructionStrings[NUM_INSTRUCTIONS][INST_MAX_CHARS] = { "nop", "lit", "litn", "copy", "copyl",` etc – eejakobowski Jun 30 '21 at 21:45
  • @M.M is correct. When you pass `typeStrings` in as your 3rd parameter, the type of the argument is `char (*)[TYPE_MAX_CHARS]`, which is not compatible with `char *` as you have spec'd in your implementation. Most would use `void *` for the function argument type, and convert to `char *` inside the function. – jxh Jun 30 '21 at 22:57
  • Yeah I ended up just going with `void*` to get rid of the warnings. My reasoning for `char*` was to try to somewhat hint at the desired input type, but seeing as it's not possible with the compiler I'm using to show exactly what the argument should be, `void*` will have to do – eejakobowski Jul 01 '21 at 03:19
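
For reference, the `void *` approach discussed in these comments could look roughly like the sketch below. It uses no variable length arrays, so it should also build with compilers that lack them. The table and its sizes are made up for illustration, and taking the parameter as `const void *` is my own choice, not something from the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example table; the sizes are made up for illustration. */
enum { NUM_TYPES = 7, TYPE_MAX_CHARS = 8 };

static char typeStrings[NUM_TYPES][TYPE_MAX_CHARS] = {
    "null", "int8", "int16", "int32", "int", "real32", "real"
};

/* Returns the row whose string equals query, or -1 if there is none.
 * strArray must point to a char[numStrings][strLength] array in which
 * every row holds a null-terminated string. */
int findStringIndex(const char *query, size_t strLength,
                    const void *strArray, size_t numStrings)
{
    /* View the 2D array as one flat run of characters. */
    const char *rows = strArray;

    for (size_t i = 0; i < numStrings; i++) {
        /* Row i starts i * strLength characters into the block. */
        if (strcmp(query, rows + i * strLength) == 0)
            return (int)i;
    }
    return -1;
}
```

A call such as findStringIndex("int16", TYPE_MAX_CHARS, typeStrings, NUM_TYPES) then compiles without the "levels of indirection" warning, because any object pointer converts to `void *` implicitly. The price is that the compiler no longer checks that the argument really is a 2D char array.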

3 Answers


If your compiler supports variable length arrays (VLAs), the function can be declared and defined as shown in the demonstration program below. Note that not all compilers support them (notably MSVC), in which case this approach is not available.

#include <stdio.h>
#include <string.h>

size_t findStringIndex( size_t m, size_t n, char a[m][n], const char *s ) 
{
    size_t i = 0;

    while ( i < m && !( strcmp( a[i], s ) == 0 ) ) ++i;
    
    return i;
}

int main(void) 
{
    enum { M1 = 3, N1 = 10 };
    
    char a1[M1][N1] =
    {
        "Hello", "World", "Everybody"
    };
    
    const char *s = "Hello";
    
    size_t pos = findStringIndex( M1, N1, a1, s );
    
    if ( pos != M1 )
    {
        printf( "\"%s\" is found at position %zu.\n", s, pos );
    }
    else
    {
        printf( "\"%s\" is not found.\n", s );
    }
    
    s = "World";
    
    pos = findStringIndex( M1, N1, a1, s );
    
    if ( pos != M1 )
    {
        printf( "\"%s\" is found at position %zu.\n", s, pos );
    }
    else
    {
        printf( "\"%s\" is not found.\n", s );
    }
    
    s = "Everybody";
    
    pos = findStringIndex( M1, N1, a1, s );
    
    if ( pos != M1 )
    {
        printf( "\"%s\" is found at position %zu.\n", s, pos );
    }
    else
    {
        printf( "\"%s\" is not found.\n", s );
    }
    
    s = "Bye";
    
    pos = findStringIndex( M1, N1, a1, s );
    
    if ( pos != M1 )
    {
        printf( "\"%s\" is found at position %zu.\n", s, pos );
    }
    else
    {
        printf( "\"%s\" is not found.\n", s );
    }
    
    return 0;
}

The program output is

"Hello" is found at position 0.
"World" is found at position 1.
"Everybody" is found at position 2.
"Bye" is not found.
Vlad from Moscow
  • That `findStringIndex` finds the row: it returns the row in which the string is, and each row is one string. I assume that is OK here. – Chef Gladiator Jul 01 '21 at 11:52
  1. Use the correct type for sizes: size_t

  2. You can use "normal" indexes by using pointers to arrays.

int findStringIndex(char query[], size_t strLength, char (*strArray)[strLength], size_t numStrings) {
    for (size_t i = 0; i < numStrings; i++) {
        for (size_t j = 0; j < strLength; j++) {

            // Skip to next word if there is a mismatch
            if (query[j] != strArray[i][j])
                break;
    /* ..... */

I assume that you pass a 2D char array (not an array of pointers).
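
For completeness, here is one way the elided part of that function could be filled in, following the loop from the question. This is only a sketch and assumes a compiler with VLA support (it will not build with MSVC's cl.exe). The name findStringIndexVla and the example table are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example table; the sizes are made up for illustration. */
enum { NUM_WORDS = 3, WORD_MAX_CHARS = 10 };

static char words[NUM_WORDS][WORD_MAX_CHARS] = {
    "Hello", "World", "Everybody"
};

/* strArray is a pointer to rows of strLength chars (a variably modified
 * type), so strArray[i][j] indexes row i, column j directly. */
int findStringIndexVla(const char *query, size_t strLength,
                       char (*strArray)[strLength], size_t numStrings)
{
    for (size_t i = 0; i < numStrings; i++) {
        for (size_t j = 0; j < strLength; j++) {
            /* Skip to the next row if there is a mismatch. */
            if (query[j] != strArray[i][j])
                break;

            /* The characters are equal here, so a terminator means
             * both strings ended at the same position: a match. */
            if (query[j] == '\0')
                return (int)i;
        }
    }
    return -1;
}
```

Note that strLength has to come before strArray in the parameter list, because the array type refers to it.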

0___________
  • At the part where I put `char(*strArray)[strLength]` in the function header, I'm getting an error that the compiler "expected a constant expression". In particular, visual studio puts a red underline under the `strLength` part – eejakobowski Jun 30 '21 at 21:48
  • MS non conforming compiler. You are out of luck. They still did not implement features introduced 25y ago :) – 0___________ Jun 30 '21 at 22:23
  • @eejakobowski download eclipse CDT and enjoy modern gcc compiler – 0___________ Jun 30 '21 at 22:26
  • [VLA has been an optional feature since C.2011.](https://softwareengineering.stackexchange.com/questions/314838/why-were-variable-length-arrays-made-optional-in-c-2011) – jxh Jun 30 '21 at 22:35
  • @0___________ I'm willing to do a lot of things for the sake of my project, but touching eclipse isn't one of them :P I'll see where I go but I guess writing more portable code can never be a bad thing – eejakobowski Jul 01 '21 at 03:24
  • @eejakobowski, I *like* Eclipse, including for C. But you don't need Eclipse to use GCC on Windows. – John Bollinger Jul 01 '21 at 14:39
  • @JohnBollinger of course you do not need it. But OP is a VS IDE user and probably the easiest way for him to get the both IDE and toolchain, is to install the Eclipse CDT. – 0___________ Jul 01 '21 at 14:44
  • @0___________, the OP comment I responded to explicitly said that they refused to consider Eclipse, so that's moot. – John Bollinger Jul 01 '21 at 14:53
  • @eejakobowski , clang 11 is coming with Visual Studio 2019. It is called clang-cl. After you start through vcvars64.bat or whatever is VS shortcut called it will be on the path. cl.exe does not and probably will not support VLA or VMT. And using VS IDE it is even easier to use clang. If you need further help please ask. – Chef Gladiator Jul 01 '21 at 17:30

Other answers have covered how you could get the array access syntax you want using variable length arrays (VLA).

If you are working on systems without VLA support, you probably need to continue to use the implementation close to what you have already shown.

However, there are a few workarounds.

Workaround 1: Make all the string sizes the same

If you typically use this function on small strings, then instead of allowing INST_MAX_CHARS and TYPE_MAX_CHARS to be different values, stipulate that all arrays passed to this function must have the same value for the second dimension. In practice, that value would be the maximum string length over both the instruction strings and the type strings. (You may have to implement your own MAX macro.)

#define X_MAX_CHARS MAX(INST_MAX_CHARS, TYPE_MAX_CHARS)

char instructionStrings[NUM_INSTRUCTIONS][X_MAX_CHARS] = {
    "nop", "lit", "litn", "copy", "copyl", "asni", /* etc */
};

char typeStrings[NUM_TYPES][X_MAX_CHARS] = {
    "null", "int8", "int16", "int32", "int", "real32", "real"
};

Then, your function could look like:

int findStringIndex(char query[], char (* strArray)[X_MAX_CHARS], int numStrings) {
    for (int i = 0; i < numStrings; i++) {
        if (strcmp(query, strArray[i]) == 0) return i;
    }
    return -1;
}
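
Put together, Workaround 1 might look like the following sketch. The numeric sizes are placeholders I picked so the example compiles, and the MAX macro shown is one common way to write it, not something your project necessarily has.

```c
#include <string.h>

/* Hypothetical sizes; the real values come from your project. */
#define NUM_INSTRUCTIONS 6
#define NUM_TYPES        7
#define INST_MAX_CHARS   6
#define TYPE_MAX_CHARS   7

/* Evaluates its arguments twice, so pass only side-effect-free values.
 * With constant arguments the result is itself a constant expression
 * and can be used as an array dimension. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define X_MAX_CHARS MAX(INST_MAX_CHARS, TYPE_MAX_CHARS)

static char instructionStrings[NUM_INSTRUCTIONS][X_MAX_CHARS] = {
    "nop", "lit", "litn", "copy", "copyl", "asni"
};

static char typeStrings[NUM_TYPES][X_MAX_CHARS] = {
    "null", "int8", "int16", "int32", "int", "real32", "real"
};

/* Every table shares the row size X_MAX_CHARS, so one function with an
 * ordinary pointer-to-array parameter serves all of them. */
int findStringIndex(const char *query, char (*strArray)[X_MAX_CHARS],
                    int numStrings)
{
    for (int i = 0; i < numStrings; i++) {
        if (strcmp(query, strArray[i]) == 0) return i;
    }
    return -1;
}
```

The trade-off is some wasted space in the tables whose strings are shorter than the shared row size.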

Workaround 2: Use _Generic

Since C 2011, C has defined a type selection mechanism called _Generic. Clang and GCC have long supported it, and so do recent versions of MSVC. If you are using anything older than Visual Studio 2019 version 16.8 Preview 3, you will not have this feature.

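Using _Generic, you can detect the size of the second dimension and call a distinct function for it. The association types must match the unqualified pointer type that the array decays to (char (*)[N], not const char (*)[N]). Because every association is still type-checked even when it is not selected, the array is passed through a void * cast.

#define findStringIndex(Q, A, N) \
        _Generic((A), \
            char (*)[INST_MAX_CHARS] : findStringIndex_I(Q, (void *)(A), N), \
            char (*)[TYPE_MAX_CHARS] : findStringIndex_T(Q, (void *)(A), N), \
            default                  : -1)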

Here, findStringIndex_I and findStringIndex_T are each defined to handle the one array type they support. If you need many such functions, you could create a macro to generate them.

#define DEFINE_FIND_STRING_INDEX(SUFFIX, STRING_SZ) \
        int findStringIndex_ ## SUFFIX ( \
                char query[], \
                char (* strArray)[STRING_SZ], \
                int numStrings) { \
            for (int i = 0; i < numStrings; i++) { \
                if (strcmp(query, strArray[i]) == 0) return i; \
            } \
            return -1; \
        }

DEFINE_FIND_STRING_INDEX(I, INST_MAX_CHARS)
DEFINE_FIND_STRING_INDEX(T, TYPE_MAX_CHARS)
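
As a self-contained illustration, here is a variant of the same idea in which _Generic selects the function first and the call happens afterwards, so only the chosen function ever receives the array. This is a sketch under my own assumptions: the sizes and tables are placeholders, the controlling expression is written as &(A)[0] to make the pointer type explicit, and there is no default association, so an unsupported array type is rejected at compile time instead of returning -1.

```c
#include <string.h>

/* Hypothetical sizes and tables, chosen only for this sketch. */
#define INST_MAX_CHARS 6
#define TYPE_MAX_CHARS 7

static char instructionStrings[][INST_MAX_CHARS] = {
    "nop", "lit", "litn", "copy", "copyl", "asni"
};

static char typeStrings[][TYPE_MAX_CHARS] = {
    "null", "int8", "int16", "int32", "int", "real32", "real"
};

/* Generates findStringIndex_<SUFFIX> for rows of STRING_SZ characters. */
#define DEFINE_FIND_STRING_INDEX(SUFFIX, STRING_SZ) \
    int findStringIndex_ ## SUFFIX(const char *query, \
                                   char (*strArray)[STRING_SZ], \
                                   int numStrings) { \
        for (int i = 0; i < numStrings; i++) { \
            if (strcmp(query, strArray[i]) == 0) return i; \
        } \
        return -1; \
    }

DEFINE_FIND_STRING_INDEX(I, INST_MAX_CHARS)
DEFINE_FIND_STRING_INDEX(T, TYPE_MAX_CHARS)

/* &(A)[0] has type "pointer to one row"; _Generic maps that type to the
 * matching function, which is then called with the original arguments. */
#define findStringIndex(Q, A, N) \
    _Generic(&(A)[0], \
        char (*)[INST_MAX_CHARS]: findStringIndex_I, \
        char (*)[TYPE_MAX_CHARS]: findStringIndex_T)(Q, A, N)
```

Keep in mind that this only works while INST_MAX_CHARS and TYPE_MAX_CHARS differ: two associations with compatible types in one _Generic selection are not allowed.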

Workaround 3: Use conditional expressions

While not as generic as _Generic, since you are only dealing with the size of strings, you could accomplish the same thing using conditional expressions. By selecting on the size of the first element of the provided array, which is the size of the second dimension, you can determine the appropriate function to call. Both branches of a conditional expression are type-checked, so the array is passed through a void * cast to keep the branch that is not taken from triggering an incompatible pointer diagnostic.

#define findStringIndex(Q, A, N) \
        ((sizeof((A)[0]) == INST_MAX_CHARS) ? findStringIndex_I(Q, (void *)(A), N) : \
        ((sizeof((A)[0]) == TYPE_MAX_CHARS) ? findStringIndex_T(Q, (void *)(A), N) : \
        -1))

As with _Generic, the individual functions to be called are implemented separately.

jxh