How to terminate variable length array when all values are valid?

Question

I'm passing an array of single-precision floating point values to a function in C. The function has no knowledge of the size of the array and I'd like to keep it that way, primarily because while the underlying array is of course fixed-length I won't always be filling it completely so I'd need to be able to find the end anyway. With a string you use a null-terminator, but with this implementation all possible values are potentially valid. Is the best I can do like a "code word" to mark the end using multiple values in order, something like ASCII 'STOP'? That leaves open the possibility of coincidentally having that code word in the array of valid data...

1. Is `NaN` a valid value? 2. why not add the length as the first value in the array itself? — Myst, Sep 23 '19 at 18:27
@Myst Length is usually integer... Why to save (what?) on clarity? — Eugene Sh., Sep 23 '19 at 18:27
FYI, a value that marks the end of an array or other sequence is called a *sentinel*. If any value of the element type could appear in the desired data, then you cannot have an “in-band” sentinel, so you must either use a different data type (such as a struct where one field indicates whether it is a data element or a sentinel, or a bigger type that has more values so you can use one as a sentinel) or indicate the length to the function in some way. — Eric Postpischil, Sep 23 '19 at 18:29
@EugeneSh. - I don't see the issue in using a float for the length. Albeit, the float number must not really have a fractional part. Besides, all practical integer values that make sense won't experience rounding errors. — Myst, Sep 23 '19 at 18:29
@CHollman82 Whatever you do, I'd say definitely *don't* try the multi-value "code word" idea. Besides the (very small, but nonzero) possibility of coincidence, it's also unnecessarily inefficient and confusing. I'd say bite the bullet and pass a separate `int` or `size_t` argument containing the length: that's what virtually all C programs do in this situation. — Steve Summit, Sep 23 '19 at 18:32
@Myst Using a floating-point number to represent an integer quantity is asking for trouble. What if it has 0.000000000000123155 entries instead of 0.0 precisely? This is asking for a whole world of hurt caused by floating point irregularities. Use a `size_t` for "size" things. — tadman, Sep 23 '19 at 18:47
@tadman It's good to be wary of floating-point inaccuracies, and it's usually good to avoid comparing floating-point numbers for exact equality, but don't take it too far, don't be excessively paranoid. If you know you've stored an integer value between 0 and 2^23, even a single-precision `float` is guaranteed to be able to reproduce it with perfect accuracy. It's not going to come out as 0.000000000000123155 or anything. — Steve Summit, Sep 23 '19 at 18:51
@SteveSummit 2^23 used to be a big number, but it's not today. I'm more concerned about someone cutting the array into pieces and instead of getting a clean integer division, ending up with a fraction, and from there the error trouble emerges. It's better to be a bit paranoid than to get burned by floating-point issues that crop up in a variety of cases you might not necessarily expect them to. — tadman, Sep 23 '19 at 18:54
@tadman - just because you think I don't know how floating point numbers are constructed or what type punning is, it doesn't mean I can't store the length of the array in the same memory (bytes) used for the first member of the array ;-) — Myst, Sep 23 '19 at 19:00
@Myst Crack out the `union` if it comes to that, sure. I'm just concerned about too-clever-by-half solutions that satisfy the quirky requirements of the question instead of steering it towards a more traditional and reliable solution. — tadman, Sep 23 '19 at 19:04
@Myst Thanks, but what I mean is it's worth adding a solution involving `union` if you want to explore that angle. — tadman, Sep 23 '19 at 19:09
@tadman - I'd much rather not to push less experienced developers into deeper water (such as `union` / type punning approaches)... and the question obviously marks OP as a less experienced developer (for now). — Myst, Sep 23 '19 at 19:11

score 4 · Answer 1 · edited Sep 23 '19 at 19:53

You'll see array/size pairs passed around a lot in C; it's really the only way to do this reliably. Even C strings, which are NUL-terminated, are often passed with a length parameter so that you don't inadvertently walk off the end of the array and into other memory.

This approach also lets you use substrings, or subsets of the array, instead of committing to the whole thing, which is basically the problem you're trying to solve. Having a terminator is both a blessing and a curse, as anyone who's ever battled a pernicious buffer-overflow bug can attest.

In your case, the function signature should look like:

void process(float* v, size_t n)

Where v is the array of floating-point values to process and n is how many of them to use. n should be less than or equal to however many valid entries are in the v array.
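
As a rough sketch of that convention, assuming a hypothetical `sum_floats` function standing in for `process`: because the count travels with the pointer, the caller can pass a partially filled buffer or a slice from the middle of one.

```c
#include <stddef.h>

/* Hypothetical stand-in for process(): sums the first n values of v. */
float sum_floats(const float *v, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Example caller: an 8-slot buffer with only 4 slots filled. */
float sum_filled_part(void)
{
    float buf[8] = {1.0f, 2.0f, 3.0f, 4.0f};
    return sum_floats(buf, 4);      /* reads buf[0..3] only */
}

/* Example caller: a slice from the middle of the same data. */
float sum_middle_slice(void)
{
    float buf[8] = {1.0f, 2.0f, 3.0f, 4.0f};
    return sum_floats(buf + 1, 2);  /* reads buf[1] and buf[2] */
}
```

The function never learns that the buffer has 8 slots; it only sees the range it was asked to use.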

If you're passing this kind of thing around a lot you may even encapsulate it in a simple struct that defines the data and size. You can then wrap around that some simple allocator/populator tools.

For example:

struct float_array {
  float* values;
  size_t size;
};

Where you can then define something like:

struct float_array* make_float_array(size_t n);
void free_float_array(struct float_array* f);
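
One possible way to fill in those two declarations is sketched below. The bodies are an assumption, not part of the original answer: the values are zero-filled with `calloc` (all-bits-zero, which reads as 0.0f on IEEE 754 systems) and `NULL` is returned on allocation failure.

```c
#include <stdlib.h>

struct float_array {
    float *values;
    size_t size;
};

/* Allocates a zero-filled array of n floats; returns NULL on failure. */
struct float_array *make_float_array(size_t n)
{
    struct float_array *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    /* Request at least one element so a NULL result always means failure. */
    f->values = calloc(n ? n : 1, sizeof *f->values);
    if (f->values == NULL) {
        free(f);
        return NULL;
    }
    f->size = n;
    return f;
}

/* Releases both the values and the struct; accepts NULL like free(). */
void free_float_array(struct float_array *f)
{
    if (f == NULL)
        return;
    free(f->values);
    free(f);
}
```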

score 2 · Answer 2 · answered Sep 23 '19 at 18:29

You don't need to pass the array maximum length, just the length currently being used for this call along with the pointer.
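
A small sketch of that distinction, with hypothetical names (`CAPACITY`, `sample_buffer`, `sample_push`, `largest`): the owner of the buffer tracks both the fixed capacity and the number of slots in use, but the function being called only ever receives the in-use count.

```c
#include <stddef.h>

#define CAPACITY 16  /* fixed size of the underlying buffer (assumed) */

struct sample_buffer {
    float data[CAPACITY];
    size_t count;    /* slots currently in use, always <= CAPACITY */
};

/* Appends one value; returns 0 on success, -1 if the buffer is full. */
int sample_push(struct sample_buffer *b, float value)
{
    if (b->count >= CAPACITY)
        return -1;
    b->data[b->count++] = value;
    return 0;
}

/* Receives the pointer and the in-use count only, never CAPACITY.
 * The caller must pass n >= 1. */
float largest(const float *v, size_t n)
{
    float best = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] > best)
            best = v[i];
    return best;
}
```

A call then looks like `largest(b.data, b.count)`.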

Machinegon


klutt · Answer 3 · 2019-09-23T19:15:32.110

You can use NAN this way, assuming that's not a valid value for your dataset:

#include <math.h>
#include <stddef.h>

float average(float *array)
{
    float sum = 0.0; // Declare this as double for better precision
    size_t index = 0;

    // x == NAN is false for every x, including NAN itself, so we need
    // isnan() to detect the terminator
    while(! isnan(array[index])) 
        sum += array[index++];
    return sum/index; // An empty array gives 0.0/0, i.e. NaN
}

Since you probably want to do this in many functions, I recommend writing a function for calculating the length:

size_t farray_length(float *array)
{
    size_t len = 0;
    while(! isnan(array[len])) len++;
    return len;
}
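
The NaN terminator has to be written by whoever fills the array, and the buffer needs one spare slot for it. A minimal sketch of both halves, using the hypothetical names `terminate_with_nan` and `count_until_nan` (the latter mirrors `farray_length`):

```c
#include <math.h>
#include <stddef.h>

/* Writes the sentinel after the last valid element.
 * The buffer must have room for used + 1 floats. */
void terminate_with_nan(float *array, size_t used)
{
    array[used] = NAN;
}

/* Counts the elements that come before the NaN sentinel. */
size_t count_until_nan(const float *array)
{
    size_t len = 0;
    while (!isnan(array[len]))
        len++;
    return len;
}
```

If the terminator is ever forgotten, the loop reads past the end of the buffer, which is the same hazard as an unterminated C string.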

But the usual way of solving these problems in C is to send the size as a separate parameter.

float average(float *array, size_t size) 
{
    float sum = 0.0;
    for(size_t i=0; i<size; i++)
        sum += array[i];
    return sum/size;
}

A third way, which can be useful for instance if you're coding a library with objects you don't want the user to mess with directly, is to declare a struct.

struct float_array {
    float *array;
    size_t size;
};

float average(struct float_array array) {
    ...

+1 I have thought a similar solution, while passing the length along with array seems me a more logical solution — Amadeus, Sep 23 '19 at 18:54

Myst · Answer 4 · 2019-09-23T19:28:33.627

With a string you use a null-terminator, but with this implementation all possible values are potentially valid.

If all values are valid, a sentinel value cannot be implemented. It's as simple as that (which is why EOF is an int value outside the range of valid characters, rather than a char).

The function has no knowledge of the size of the array and I'd like to keep it that way...

Assuming NaN is an invalid value, you could use the isnan() macro to test for a sentinel value.

However, if NaN is a valid value...

I'd need to be able to find the end anyway.

The only option left is to actually pass the array length along with the array.

If you can't add the array length as a separate argument, you could (probably) store the length of the array as the first member - either using a struct (recommended) or using type punning (don't try this at home unless you know what you're doing).

i.e.

typedef struct float_array_s {
  unsigned int len;
  float f[]; /* flexible array member, must be last */
} float_array_s;

static unsigned int float_array_len(float_array_s * arr) { return arr->len; }
static float float_array_index(float_array_s * arr, unsigned int index) { return arr->f[index]; }
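
A struct with a flexible array member cannot be declared with its elements on the stack in the usual way; it is normally allocated with room for the header plus the elements. A rough sketch, where `float_array_new` is a hypothetical helper that is not part of the answer:

```c
#include <stdlib.h>

typedef struct float_array_s {
    unsigned int len;
    float f[];              /* flexible array member (C99) */
} float_array_s;

/* Allocates the header and n floats in one block; NULL on failure.
 * Release the result with a single free(). */
float_array_s *float_array_new(unsigned int n)
{
    float_array_s *arr = malloc(sizeof *arr + n * sizeof arr->f[0]);
    if (arr == NULL)
        return NULL;
    arr->len = n;
    for (unsigned int i = 0; i < n; i++)
        arr->f[i] = 0.0f;
    return arr;
}
```

The length and the data then travel together behind one pointer, which is the property the question was after.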

There's really no reason to spend computation cycles searching for the end if you can simply pass the valid length along with the array.

Edit (type punning)

I highly recommend avoiding this approach, since mismatched type sizes can cause hard-to-detect bugs. However...

It's possible to store the length of the array in the first float member, by using the same bytes (memory) to store an integer.

Note that this might crash (or worse, silently fail) if unsigned int is longer than float (which it might be, even though they usually have the same size in bytes).

i.e.

#include <math.h>  /* nan() */
#include <stdio.h> /* fprintf() */

/* Returns the member at `index`. */
static float float_array_index_get(float *arr, unsigned int index) {
  return arr[index + 1];
}
/* Sets the member at `index` to `val`. */
static void float_array_index_set(float *arr, unsigned int index, float val) {
  arr[index + 1] = val;
}
/* Returns the array's length. */
static unsigned int float_array_length_get(float *arr) {
  if (sizeof(unsigned int) > sizeof(float)) {
    fprintf(
        stderr,
        "ERROR: (%s:%d) type size overflow, code won't work on this system\n",
        __FILE__, __LINE__);
  }
  union {
    float f;
    unsigned int i;
  } pn;
  pn.f = arr[0];
  return pn.i;
}
/* Sets the array's length. */
static void float_array_length_set(float *arr, unsigned int len) {
  if (sizeof(unsigned int) > sizeof(float)) {
    fprintf(
        stderr,
        "ERROR: (%s:%d) type size overflow, code won't work on this system\n",
        __FILE__, __LINE__);
  }
  union {
    float f;
    unsigned int i;
  } pn;
  pn.i = len;
  arr[0] = pn.f;
}
/* Pushes a member to the array, increasing its length. */
static void float_array_index_push(float *arr, float val) {
  unsigned int len = float_array_length_get(arr);
  float_array_index_set(arr, len, val);
  float_array_length_set(arr, len + 1);
}
/* Pops the last member from the array...
 * ... returning nan if the member was nan or if the array is empty.
 */
static float float_array_index_pop(float *arr) {
  unsigned int len = float_array_length_get(arr);
  if (!len)
    return nan("");
  /* The last member lives at index len - 1; shrink, then return it. */
  float_array_length_set(arr, len - 1);
  return float_array_index_get(arr, len - 1);
}
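
As a variation on the length accessors above, the size check can be moved to compile time and the bytes copied with `memcpy` instead of a union. This is only a sketch under the assumption of a C11 compiler (for `_Static_assert`); `length_store` and `length_load` are hypothetical names.

```c
#include <string.h>

/* Refuse to compile where the length would not fit in one float slot. */
_Static_assert(sizeof(unsigned int) <= sizeof(float),
               "unsigned int does not fit in a float slot");

/* Stores len in the bytes of arr[0] without converting it to a float. */
void length_store(float *arr, unsigned int len)
{
    memcpy(&arr[0], &len, sizeof len);
}

/* Reads back the length previously written by length_store(). */
unsigned int length_load(const float *arr)
{
    unsigned int len;
    memcpy(&len, &arr[0], sizeof len);
    return len;
}
```

A failed check now stops the build instead of printing a message at run time and carrying on.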

P.S.

I hope you'll stick to the simple func(float * arr, size_t len) now that you see how much extra code you need just to avoid passing the length of the array.

Another possibility not mentioned: there are multiple bit patterns for NaN. You could choose to use one particular representation to mean "NaN", and a different one for the sentinel. — Lee Daniel Crocker, Sep 23 '19 at 19:50
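
A rough sketch of that idea, assuming a 32-bit IEEE 754 `float` and a hypothetical payload value (`SENTINEL_BITS`). The sentinel is written and tested through its bit pattern, since `isnan()` cannot tell one NaN from another. C does not guarantee that NaN payloads survive arithmetic or conversions, so this only holds up if the sentinel is stored and compared as raw bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinel: a quiet NaN with a distinctive payload. */
#define SENTINEL_BITS UINT32_C(0x7FC0DEAD)

_Static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Writes the sentinel bit pattern into *slot. */
void put_sentinel(float *slot)
{
    uint32_t bits = SENTINEL_BITS;
    memcpy(slot, &bits, sizeof bits);
}

/* True only for the exact sentinel pattern, not for other NaNs. */
int is_sentinel(const float *slot)
{
    uint32_t bits;
    memcpy(&bits, slot, sizeof bits);
    return bits == SENTINEL_BITS;
}

/* Counts elements before the sentinel; ordinary NaNs count as data. */
size_t length_until_sentinel(const float *array)
{
    size_t len = 0;
    while (!is_sentinel(&array[len]))
        len++;
    return len;
}
```

The data could still contain that exact bit pattern by coincidence, so this narrows the problem without removing it.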
