3

I am given this C code (the details of the code, including possible bugs, are not very relevant):

int read_leb128(char **ptr, char *end) {
  int r = 0;
  int s = 0;
  char b;
  do {
    if ((intptr_t)*ptr >= (intptr_t)end) (exit(1));
    b = *(*ptr)++;
    r += (b & (char)0x7f) << s;
    s += 7;
  } while (b & (char)0x80);
  return r;
}

and I want to throw some formal methods at it to rule out dangerous bugs.

In particular, I would like a assurance that this function does not modify any value besides *ptr and only reads memory from *ptr to end (not inclusive).

It looked like Frama-C is a good framework for such verification, so I started to add annotations:

/*@
  requires \valid(ptr);
  requires \valid_read((*ptr) + (0 .. (end-*ptr)));
  assigns *ptr;
 */

It seems that the Frama-C plugin that checks for invalid memory access is Eva, but running it on these files still prints:

[eva:alarm] foo.c:33: Warning: 
  out of bounds read. assert \valid_read(tmp);
                      (tmp from *ptr++)

Am I just expecting too much of the tool, or is there a way for Frama-C to verify this?

This is Frama-C 19.0 (Potassium).

Joachim Breitner
  • 25,395
  • 6
  • 78
  • 139

1 Answers1

2

You're on the good track, but an ACSL contract is often not the best way to explain to Eva what the initial state of the analysis should be. Usually, you use a wrapper function for that (see section 6.3 of the Eva manual. In your case, you could for instance go with the following code:

#include <stdint.h>
#include <stdlib.h>

/*@
  requires \valid(ptr);
  requires \valid_read((*ptr) + (0 .. (end-*ptr)));
  assigns *ptr;
 */
int read_leb128(char **ptr, char *end) {
  int r = 0;
  int s = 0;
  char b;
  do {
    if ((intptr_t)*ptr >= (intptr_t)end) (exit(1));
    b = *(*ptr)++;
    r += (b & (char)0x7f) << s;
    s += 7;
  } while (b & (char)0x80);
  return r;
}

#define N 4

char test[N];

int main() {
char* beg = &test[0];
char* end = &test[0] + (N-1);
read_leb128(&beg, end);
}

Now, as you seem to be interested by the outputs (locations that are assigned) and the inputs (locations whose initial value is read), you need to activate some option from the Inout plugin (see chapter 7 of the Eva manual):

frama-c -eva -eva-slevel 20 res.c -lib-entry -out-external -input

will give you:

...
[inout] Out (external) for function read_leb128:
    beg
[inout] Inputs for function read_leb128:
    test[0..2]; beg

which indeed indicates that only beg (whose address is passed to read_leb128) is modified, and that it gets its values from test[0 .. 2], the content of the array, and itself (since you increment it, its final value obviously depend on its initial value).

Virgile
  • 9,724
  • 18
  • 42
  • Thanks, great! Can I somehow encode the expected analysis result in the file, so that I can easily integrate the check with my test suite? (I.e. so that I don't have to grep through the textual output of the report?) – Joachim Breitner Jul 20 '19 at 07:22
  • Unfortunately, nothing is readily available, though a small OCaml script should be able to convince Frama-C to give the information in a more suitable form. This is the route that was taken by Airbus and Atos as described in this article: https://www.di.ens.fr/~delmas/papers/erts12.pdf – Virgile Jul 20 '19 at 07:57
  • BTW, the way this is set up (with `#define N 4`) the check would not alert me if the the code would access, say, the second byte, without checking if that is past the buffer. Is there a way to have a buffer of unspecified length and content? – Joachim Breitner Jul 23 '19 at 07:44
  • I'd say that this warrants a whole new question, but the short answer is, for all practical purposes, no. Basically, if `N` is a variable whose abstract value does not reduce to a singleton, there aren't any domains in Eva that keep track of the relation between `N` and the length of the `test` array. Hence, any access will be potentially invalid, and you will end up with a haystack of (hopefully) false alarms among which might (or not) lie hidden a needle (i.e. true alarm). You can abstract over content, though (in fact this is what is done there through `-lib-entry`). – Virgile Jul 23 '19 at 08:36
  • You are absolutely right about a new question for this: https://stackoverflow.com/q/57168439/946226. Maybe you can elaborate there? – Joachim Breitner Jul 23 '19 at 16:22