
Related: flexible array member in a nested struct

I am trying to parse some data into a struct. The data contains information organized as follows:

struct unit {

    struct unit_A {
        // 28 bytes each

        // dependency r6scA 1
        char dr6scA1_tagclass[4];
        uint32_t dr6scA1_tagnamepointer;
        uint32_t dr6scA1_tagnamestringlength;
        uint32_t dr6scA1_tagid;

        // 12 bytes of 0x00

    }A;

    // A strings

    struct unit_B {
        // 48 bytes each

        // dependency r6scB 1
        char dr6scB1_tagclass[4];
        uint32_t dr6scB1_tagnamepointer;
        uint32_t dr6scB1_tagnamestringlength;
        uint32_t dr6scB1_tagid;

        // 32 bytes of 0x00

    }B;

    // B strings

    // unit strings

}unit_container;

You can ignore the weird nomenclature.

My line comments // A strings, // B strings and // unit strings each stand for a run of null-terminated C strings, the number of which matches however many unit_A, unit_B, and unit struct entries there are in the data. So if there are 5 entries of A in unit_container, there would be 5 C strings in the location where it says // A strings.

Since I cannot use flexible array members at these locations, how should I interpret what are essentially an unknown number of variable-length C strings at these locations in the data?

For example, the data at these locations could be:

"The first entry is here.\0Second entry\0Another!\0Fourth.\0This 5th entry is the bestest entry evah by any reasonable standards.\0"

...which I expect I should interpret as:

char unit_A_strings[]

...but this is not possible. What are my options?

Thank you for your consideration.

EDIT:

I think the most attractive option so far is:

char** unit_A_strings; to point to an array of char strings.

If I do: char unit_A_strings[1]; to define a char array of fixed size of 1 char, then I must abandon sizeof(unit) and such, or hassle with memory allocation sizes, even though it is most accurate to the kind of data present. The same situation occurs if I do char * unit_A_strings[1];.

Another question: What would be the difference between using char *unit_A_strings; and char** unit_A_strings;?

Conclusion:

The main problem is that structs are intended for fixed-size information, and what I need is a variable-sized memory region. So I can't legitimately store the data in the struct, at least not as part of the struct itself. This means that any other interpretation would be all right, and it seems to me that char** is the best available option for this struct situation.

silent
  • I added a second question, because it looks like `char**` or an illegitimate `char*` are my best options. – silent Jun 12 '16 at 06:55
  • What would be the difference between using char *unit_A_strings; and char** unit_A_strings;? The former is a pointer to (an array of?) `char`s. The latter is a pointer to (an array of?) `char *`s. They're fundamentally different. Whether it is an array or not is up to you - and the normal way of ending the array is to have a final value of 0 / NULL – John Burger Jun 12 '16 at 07:32
  • I appreciate all your help! I would up-vote both of your answers if I could. – silent Jun 12 '16 at 07:39

2 Answers

I think you can use char** instead (or write a structure to wrap it). For example, you can write a helper function to decode your stream.

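char** decodeMyStream(uint8_t* stream, unsigned int* numberOfCString)
{
    // decodeNumberOfCString() and calculateIthStringLength() are placeholders
    // you must supply; the latter is assumed to return the length of the
    // string starting at `start`, including its terminating '\0'.
    *numberOfCString = decodeNumberOfCString(stream);
    char** cstrings = malloc((*numberOfCString) * sizeof(char*));
    unsigned int start = 0;
    for (unsigned int i = 0; i < *numberOfCString; ++i)
    {
        unsigned int len = calculateIthStringLength(stream, start);
        cstrings[i] = malloc(len * sizeof(char));
        memcpy(cstrings[i], stream + start, len);  // copies the '\0' too
        start += len;
    }
    return cstrings;
}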

This is just rough example code written without much thought; you can probably come up with a better algorithm.
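A more self-contained sketch of the same idea, assuming the number of strings is already known from the record count (as the question describes) and that the strings sit back to back in the buffer. The function name `split_cstrings` and its parameters are hypothetical, not part of the original format. Instead of copying each string, it stores pointers into the original buffer, so only the pointer array needs to be freed:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds an array of `count` pointers into `data`,
 * one per NUL-terminated string. Returns NULL on allocation failure or
 * if the strings would run past `size` bytes. */
char **split_cstrings(char *data, size_t size, size_t count)
{
    /* Allocate at least one slot so count == 0 does not request 0 bytes. */
    char **out = malloc((count ? count : 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    size_t pos = 0;
    for (size_t i = 0; i < count; ++i) {
        /* Find the terminator without reading beyond the buffer. */
        char *end = (pos < size) ? memchr(data + pos, '\0', size - pos) : NULL;
        if (end == NULL) {
            free(out);
            return NULL;
        }
        out[i] = data + pos;
        pos = (size_t)(end - data) + 1;  /* step past the NUL */
    }
    return out;
}
```

The returned pointers are only valid for as long as the underlying buffer is, which is the trade-off for avoiding one allocation per string.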

jstar
  • What do you think about using a `char *unit_A_strings` to point to the address of the first char and then just using a function to identify the rest of the `char *` strings after that? Is using `char **unit_A_strings` more conducive to using an array of strings? – silent Jun 12 '16 at 07:04
  • I think that also works, and if you wrap the function carefully, it can even be an elegant way to do it. – jstar Jun 12 '16 at 09:08
  • For example: char* s = "aaaa\0bbbb\0"; then you can write code like the following: unsigned int len = strlen(s); char* firstStr = s; char* secondStr = s + len + 1; (the + 1 skips the first string's '\0'). I think you should think carefully about a data structure to hold these strings, and a loop to decode them. – jstar Jun 12 '16 at 09:14

I think the closest you're going to get is by providing an array of strings:

char *AStrings[] = { "The first entry is here.",
                     "Second entry",
                     "Another!",
                     "Fourth.",
                     "This 5th entry is the bestest entry evah by any reasonable standards.",
                     NULL
                   };

Note two things:

  1. AStrings is an array of pointers-to-strings - it will be 6 (see 2. below) consecutive pointers that point to the actual strings, NOT the 'compound' string you used in your example.
  2. I ended AStrings with a NULL pointer, to resolve the "when do I finish?" question.

So you can "fall off the end" of A and start looking at locations as pointers - but be careful! The compiler may put in all sorts of padding between one variable and the next, mucking up any assumptions about where they are relative to each other in memory - including reordering them!
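The NULL sentinel from note 2 lets a consumer walk the array without knowing its length in advance. A minimal sketch, with `count_strings` as a hypothetical name chosen for illustration:

```c
#include <stddef.h>

/* Counts the entries in a NULL-terminated array of string pointers.
 * The sentinel itself is not counted. */
size_t count_strings(char *const *strings)
{
    size_t n = 0;
    while (strings[n] != NULL)
        ++n;
    return n;
}
```

For the AStrings array above, this would visit the five string pointers and stop at the sixth, NULL, entry.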

Edit: Oh! I just had a thought. Another data representation that may help is essentially what you did. I've 'prettied' it up a bit:

char AString[] = "The first entry is here.\0"
                 "Second entry\0"
                 "Another!\0"
                 "Fourth.\0"
                 "This 5th entry is the bestest entry evah by any reasonable standards.\0";
  • The C compiler will automatically concatenate two 'adjacent' strings as though they were one string - with no NUL character between them. I put them in specifically above.
  • The C compiler will automatically put a '\0' at the end of any string - at the semicolon (;) in the above example. That means that the string actually ends with two NUL characters, not one.

You can use that fact to keep track of where you are while parsing the string 'array' - assuming that every desired value has a (sub)string of more than zero length! As soon as you encounter a zero-length (sub)string, you know you've reached the end of the string 'array'.

I call these kinds of strings ASCIIZZ strings (ASCIIZ strings with a second NUL at the end of all of them).
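Walking such a buffer could look roughly like this. It is only a sketch: `for_each_asciizz` and its callback parameter are hypothetical names, and it assumes the buffer really does end with an extra NUL and contains no empty substrings, which the data in the question may not guarantee.

```c
#include <stddef.h>
#include <string.h>

/* Calls `visit` (if non-NULL) on each substring of a double-NUL-terminated
 * buffer and returns how many substrings were found. */
size_t for_each_asciizz(const char *s, void (*visit)(const char *))
{
    size_t n = 0;
    while (*s != '\0') {        /* an empty substring marks the end */
        if (visit != NULL)
            visit(s);
        s += strlen(s) + 1;     /* jump past this substring and its NUL */
        ++n;
    }
    return n;
}
```

Passing NULL as the callback simply counts the substrings, which is handy for sizing a pointer array before a second pass.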

John Burger
  • I am wondering what to put in the struct at the line comments `A strings`, `B strings` and `unit strings`. So what you are saying is that I could use char[] at these locations. The data is already a given: multiple null-terminated strings one after the other. I don't get to define the strings myself, rather, I am simply trying to parse them into a struct. – silent Jun 12 '16 at 06:25
  • And the problem is that I cannot use char[] as a flexible array member, I would have to do something like `char AString[1];`, but that is not true to the size of the data. – silent Jun 12 '16 at 06:31
  • It's a pity that you cannot use FAMs - but also a relief. They're messy! Yes, I am saying that you should be able to put the `char[]` directly after the `struct` instance, and hope that the compiler will treat it 'normally'. However you'll have to do the parsing yourself at runtime - or use the earlier array-of-char[] I gave – John Burger Jun 12 '16 at 06:34
  • What about something like `char *AString;` and then just pointing to the first string? That way, if there are no strings at all, I don't have a difficulty. – silent Jun 12 '16 at 06:59
  • Now I'm getting confused between the two possible solutions! You could have a single member in your struct that either pointed to the ASCIIZZ string with a `char *`, or pointed to the string array with a `char * *` - both would work, and the latter would be easier. – John Burger Jun 12 '16 at 07:27