
When parsing a file I need to detect whether an item with minimum and maximum occurrence of 1 has been processed already. Later on in validation I need to detect if it was not processed at all.

I could do this with a counter that is incremented each time the item is seen, but that is cumbersome and inelegant. A boolean flag is another option. In general I would use some form of sentinel value, such as NULL for a pointer or "" for a statically allocated character array, or memset() to zero for a group of items.

The problem is if the full range of the datatype is potentially valid input it gets very sticky trying to make a Sentinel.

If the type is signed and only non-negative numbers are used, the sentinel can be any negative number. If the type is unsigned but values with the top bit set are never used, then any such value (for example the type's maximum) can serve as the sentinel.
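A minimal sketch of the signed case, assuming a hypothetical `struct person` whose `age` is never negative. The names `AGE_UNSET`, `person_init`, `person_set_age` and `person_has_age` are made up for illustration:

```c
#include <stdbool.h>

/* Assumption: valid ages are never negative, so -1 is free to mean "not set". */
#define AGE_UNSET (-1)

struct person {
    int age;   /* AGE_UNSET until a value has been parsed */
};

/* Put the record into its "nothing parsed yet" state. */
void person_init(struct person *p)
{
    p->age = AGE_UNSET;
}

/* Parsing: store an age. Returns false if the value is out of range
   or if an age was already stored (a duplicate entry). */
bool person_set_age(struct person *p, int age)
{
    if (age < 0 || p->age != AGE_UNSET)
        return false;
    p->age = age;
    return true;
}

/* Validation: was the mandatory field seen at all? */
bool person_has_age(const struct person *p)
{
    return p->age != AGE_UNSET;
}
```

The same function that stores the value also catches the "already processed" case, and `person_has_age` covers the later "never processed" check.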

If a larger data type can be used to store the value, the extra range can be used for the sentinel value (SV), although this may cause problems with type compatibility, truncation, and promotion.
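This is the same trick `getchar()` uses: it returns an `int` so that `EOF` cannot collide with any `unsigned char` value. A rough sketch for a one-byte field, where `struct header`, `BYTE_UNSET` and the `header_*` functions are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Every uint8_t value (0..255) is valid input, so the field is kept in a
   wider int and -1, which lies outside that range, marks "not read yet". */
#define BYTE_UNSET (-1)

struct header {
    int flags;   /* BYTE_UNSET, or a value in 0..255 */
};

void header_init(struct header *h)
{
    h->flags = BYTE_UNSET;
}

/* Parsing: returns false if the field was already stored. */
bool header_set_flags(struct header *h, uint8_t value)
{
    if (h->flags != BYTE_UNSET)
        return false;
    h->flags = value;
    return true;
}

/* Copy the value out in its original narrow type.
   Returns false if nothing has been stored yet. */
bool header_get_flags(const struct header *h, uint8_t *out)
{
    if (h->flags == BYTE_UNSET)
        return false;
    *out = (uint8_t)h->flags;
    return true;
}
```

Routing all access through the setter and getter keeps the narrowing conversion in one place, which limits the truncation and promotion concerns to those two functions.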

In an enum I can add an extra entry that serves as the SV.
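For example, assuming a hypothetical set of log levels (`enum level`, `struct entry` and `entry_set_level` are illustrative names only):

```c
#include <stdbool.h>

/* LEVEL_UNSET is the extra entry. Placing it first gives it the value 0,
   so a zero-initialized struct starts out "unset". */
enum level {
    LEVEL_UNSET = 0,
    LEVEL_DEBUG,
    LEVEL_INFO,
    LEVEL_ERROR
};

struct entry {
    enum level level;
};

/* Parsing: returns false for a duplicate, or if the caller
   tries to store the sentinel itself. */
bool entry_set_level(struct entry *e, enum level lv)
{
    if (lv == LEVEL_UNSET || e->level != LEVEL_UNSET)
        return false;
    e->level = lv;
    return true;
}
```

Making the sentinel the zero entry means `memset()` to zero or `= {0}` initializes the field to "unset" without further code.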

It gets difficult to keep track of all the different ways of showing, for each member of a structure, whether it was initialized or not.

I almost forgot: an easy and universal way could be to make every variable, even integers, dynamically allocated and initialized to NULL. Though a bit strange and perhaps slightly wasteful of memory, this would be highly consistent and would also let the variable itself be tested in conditional statements, e.g.:

if (age) printf("Age is a valid variable with value: %d\n", *age);
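A slightly fuller sketch of that idea, assuming a hypothetical `struct record` with a single optional `age` member (`record_set_age` and `record_free` are made-up helper names):

```c
#include <stdbool.h>
#include <stdlib.h>

struct record {
    int *age;   /* NULL means "not parsed yet" */
};

/* Parsing: allocate storage on first assignment.
   Returns false on a duplicate or on allocation failure. */
bool record_set_age(struct record *r, int age)
{
    if (r->age != NULL)
        return false;
    r->age = malloc(sizeof *r->age);
    if (r->age == NULL)
        return false;
    *r->age = age;
    return true;
}

/* Release the storage and return the member to its "unset" state. */
void record_free(struct record *r)
{
    free(r->age);
    r->age = NULL;
}
```

The cost is one allocation per member and a matching `free`, so every structure needs a cleanup function like `record_free`.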

Edit to clarify the question (no changes above):

I am parsing logs from another application (there is no documentation on the format). The log entries include data structures/objects, and the files also contain occasional corrupt entries because another thread sometimes writes to them without synchronizing access.

The structures have members of any base type (e.g. integer, string, sub-structure) in different quantities (e.g. exactly 1, 0-1, 1-N). It gets more complicated once you add the rules on valid combinations and valid sequences.

It might be easiest for me to define everything as an array with an associated counter variable.
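One way this could look, as a sketch only: each field carries its values, a counter, and its allowed occurrence range. `struct int_field`, `MAX_VALUES` and the two functions are hypothetical names, and the fixed capacity is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VALUES 8   /* assumed upper bound on stored values */

/* A field that may occur min..max times: values plus a counter. */
struct int_field {
    int    values[MAX_VALUES];
    size_t count;   /* how many times the item has been seen */
    size_t min;     /* minimum allowed occurrences */
    size_t max;     /* maximum allowed occurrences, at most MAX_VALUES */
};

/* Parsing: record one occurrence. Returns false if the item
   has already been seen the maximum number of times. */
bool int_field_add(struct int_field *f, int value)
{
    if (f->count >= f->max || f->count >= MAX_VALUES)
        return false;
    f->values[f->count++] = value;
    return true;
}

/* Validation: was the item seen an allowed number of times? */
bool int_field_valid(const struct int_field *f)
{
    return f->count >= f->min && f->count <= f->max;
}
```

A mandatory single item is then `{ .min = 1, .max = 1 }`, an optional one is `{ .min = 0, .max = 1 }`, and the same two functions handle both the "already processed" and the "never processed" checks.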

I was motivated to ask about this because managing the initialization and checking if a variable has been read in already starts to get overwhelming.

The next stage - input validation - is even more difficult.

user10530562
  • Is this actually a question? It sounds like you're just thinking out loud about various ways to accomplish the task. Assess the trade-offs of the various approaches and decide which best fits your use case. – Jason White Jun 20 '19 at 15:15
  • Yeah, the question is what's the Best way to do it. There may not be a better way or the best way might depend entirely on circumstances. I'm willing to bet someone with more experience and greater skill has dealt with the issue many times and Might have some insights on it beyond what I've described. – user10530562 Jun 20 '19 at 16:11
  • Are you asking how to determine when to stop an input sequence with a sentinel value, like in "Enter some positive numbers, -1 to stop", or are you asking how to input an unknown number of values, somehow stop the sequence, and only later check whether a particular value is present in the obtained sequence exactly once? – Bob__ Jun 20 '19 at 18:47
  • @Bob__ Basically, as a matter of good programming practice and simple design, how would you check if a variable you are writing to has already been initialized? You can use a Sentinel value if you can reserve an entry in the number space of the variable. Otherwise another variable would have to track it. – user10530562 Jun 20 '19 at 19:12
  • Still too broad IMHO, but AFAIK it's considered good practice to *always* initialize a variable before its use. When you need to extract a value from a stream, you usually pass it as a pointer and *check* the return value (an extra variable) of the library function used to see if the read was successful and that value can be used. If you want to reserve some memory for an object, you use the pointer returned by, say, malloc and *check* if it's NULL or it's valid (so, here NULL is the sentinel value). You can also imagine a function which returns a struct with a value and a bool, and so on. – Bob__ Jun 20 '19 at 19:38
  • If you're saving the items for later processing, either you keep them in a dynamically allocated array, in which case you must know how many there are, or you keep them in a linked list, in which case the `next` pointer acts as a sentinel. If that doesn't apply to your question, I'm not understanding what you're looking for. – rici Jun 20 '19 at 22:20

1 Answer


> The problem is if the full range of the datatype is potentially valid input it gets very sticky trying to make a Sentinel.

I would say that if that is the situation, there is no way to make a sentinel. You might get lucky if the data type in question has a trap representation (which essentially means that there are some bit patterns that you can store in the data type, but which are not interpretable as a value in the data type), which you could (ab)use.

Other than that, I think you need to resort to some secondary way (variable) to achieve your goal.
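A minimal sketch of such a secondary variable, pairing the value with a flag. The type and function names (`struct opt_i32`, `opt_i32_set`) are only illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

/* Every int32_t value is valid input, so "has a value"
   lives in a separate flag next to the value. */
struct opt_i32 {
    bool    present;
    int32_t value;
};

/* Store a value once. Returns false if one was already stored. */
bool opt_i32_set(struct opt_i32 *o, int32_t v)
{
    if (o->present)
        return false;
    o->value = v;
    o->present = true;
    return true;
}
```

Because the flag is independent of the value, even `INT32_MIN` or `0` can be stored without being mistaken for "unset", and zero-initializing the struct leaves it in the unset state.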

As a side note: Sometimes it is practical (but not safe) to reason about what values might be valid, but extremely unlikely input. You might use such a "special" value as a sentinel, but would have to provide some functionality to determine if, when encountering such a "special" value, it truly is a sentinel or a valid input.

Think of an array of doubles: you could use the value of PI to the full precision of a double (roughly 15 to 17 significant digits) if it is highly unlikely that you would ever encounter that number as valid input, let's say in accounting software. But you would still need some handler for the sentinel value to determine whether it truly is a sentinel or is, in fact, valid but improbable input.

GermanNerd