
During the development stages of my compiler I ran into a pretty complex problem: how to store weakly typed variables in my language.

Since I allow variables to be declared without an explicit type, and allow functions to return either type (e.g. a function can return a scalar OR an array), I now have to decide what form to store these variables in.

Here are the possibilities I've considered, all of which have significant overhead:

  • Regard all variables as lists of doubles (List<double>) and have the first element specify whether it's a scalar or array (0 or 1 for instance).
  • Regard all variables as object instances.
  • Regard all variables as a TVar (custom class), which can be either a double or List<double>.
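For reference, the third option is essentially a tagged union: every value carries a flag saying which payload is live. A minimal Python sketch, where the class name `TVar` comes from the question but the fields `is_array`, `scalar`, `items` and the helper `pick` are hypothetical illustrations, not part of any existing design:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TVar:
    """Hypothetical tagged value: either a single double or a list of doubles."""
    is_array: bool
    scalar: float = 0.0
    items: List[float] = field(default_factory=list)

    @staticmethod
    def of_scalar(x: float) -> "TVar":
        return TVar(is_array=False, scalar=float(x))

    @staticmethod
    def of_array(xs: List[float]) -> "TVar":
        return TVar(is_array=True, items=[float(x) for x in xs])

    def length(self) -> int:
        # Callers must check the tag; asking a scalar for its length is a type error.
        if not self.is_array:
            raise TypeError("scalar has no length")
        return len(self.items)


def pick(flag: bool) -> TVar:
    # A function whose result kind is only known at run time, as in the question.
    return TVar.of_array([1, 2, 3]) if flag else TVar.of_scalar(42)


v = pick(True)
print(v.is_array, v.length())  # True 3
w = pick(False)
print(w.is_array, w.scalar)    # False 42.0
```

The overhead mentioned above shows up in two places here: each value stores a tag plus room for both payloads, and every use has to test `is_array` before touching the data.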

To keep in mind:

  • The only two types of variables I intend to have are doubles and double arrays, since all others can be derived from these (e.g. a char is a special case of a double, a string is an array of chars, etc.)
  • I am using ILAsm, which is a higher-level flavour of assembly (basically .NET intermediate language)
Samuel Allan
  • The list-of-doubles approach won't allow you to represent lists of lists. – sepp2k Jul 10 '16 at 20:32
  • @sepp2k A list of lists can be thought of as a multi-dimensional list, which I am planning to support with values of the first index larger than 2 (e.g. 3 -> 3-dimensional array/list), etc. – Samuel Allan Jul 10 '16 at 20:46
  • Floats are not precise for all integers. JavaScript has this problem. All numbers there are floats. – usr Jul 10 '16 at 21:54
  • @usr true, I am thinking about how to maybe minimize this impact by 'optimizing' obvious integers (such as for loop counters) to be native `int` types – Samuel Allan Jul 11 '16 at 13:50
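To make the precision point concrete: a double has a 53-bit significand, so every integer is exact only up to 2^53 in magnitude. A short check in Python, whose `float` is an IEEE 754 double (the same format as .NET's `System.Double`):

```python
# Python's float is an IEEE 754 binary64 double, the same format as .NET's System.Double.
LIMIT = 2 ** 53  # 9007199254740992

# Integers up to 2**53 in magnitude survive a round trip through a double...
print(int(float(LIMIT - 1)) == LIMIT - 1)  # True
print(int(float(LIMIT)) == LIMIT)          # True

# ...but 2**53 + 1 has no exact double representation and rounds to 2**53.
print(float(LIMIT + 1) == float(LIMIT))    # True
print(int(float(LIMIT + 1)))               # 9007199254740992
```

Every 32-bit `int` value lies well inside that range, so for typical loop counters the `int` optimization mainly buys speed rather than correctness.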



This obviously depends a lot on your language. If you don't fix variable types at compile time, then you need to wrap all values with type information. (This is sometimes referred to as "boxing" the variable, although it's not the only thing that "boxing" can mean.)
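In practice, the cost of boxing is that every primitive operation must branch on the type tags before it can do any arithmetic. A minimal sketch, assuming a hypothetical `(tag, payload)` pair as the boxed representation and a broadcasting rule for mixed scalar/array addition that the question does not actually specify:

```python
from typing import List, Tuple

# Hypothetical boxed representation: a (tag, payload) pair.
SCALAR, ARRAY = "scalar", "array"
Boxed = Tuple[str, object]


def box_scalar(x: float) -> Boxed:
    return (SCALAR, float(x))


def box_array(xs: List[float]) -> Boxed:
    return (ARRAY, [float(x) for x in xs])


def add(a: Boxed, b: Boxed) -> Boxed:
    """Every operation on boxed values inspects both tags at run time."""
    ta, va = a
    tb, vb = b
    if ta == SCALAR and tb == SCALAR:
        return box_scalar(va + vb)
    if ta == ARRAY and tb == ARRAY:
        if len(va) != len(vb):
            raise ValueError("array length mismatch")
        return box_array([x + y for x, y in zip(va, vb)])
    # Mixed case: broadcast the scalar over the array (a hypothetical language rule).
    s, arr = (va, vb) if ta == SCALAR else (vb, va)
    return box_array([s + y for y in arr])


print(add(box_scalar(1), box_scalar(2)))      # ('scalar', 3.0)
print(add(box_scalar(1), box_array([1, 2])))  # ('array', [2.0, 3.0])
```

When the compiler can prove both operands are scalars, all of that dispatch collapses to a single native addition, which is the payoff of the compile-time deduction described next.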

On the other hand, you might be able to deduce the variable type at compile time. For example, awk (which, despite its complete lack of declaration syntax, is sometimes implemented with a compiler to some kind of virtual machine) allows both scalar and array variables, but it is quite possible to figure out the type of each awk variable:

  1. Aside from being passed as function arguments, an array variable cannot be used without a subscript, because awk does not allow array assignment. So any variable used with subscripts must be an array, and any variable used without subscripts, except in the call to a function, must be a scalar.

  2. Functions don't have prototypes either, but all useful parameters must be either used in the function body or passed to another function. So it is possible to create a prototype for every function, identifying each variable as scalar/array/unknown.

  3. Repeatedly scanning the function calls until nothing changes (i.e. computing a least fixed point) will then provide precise information about every useful variable. If a variable is used both as a scalar and as an array, an error can be thrown. If a variable is not used at all (except possibly being passed to functions which don't use the corresponding parameter), it can simply be eliminated, or compiled as an (unused) scalar.
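The three steps above amount to a small dataflow analysis over a lattice where "unknown" sits at the bottom, "scalar" and "array" are incomparable, and a "conflict" element at the top marks the error case. A minimal Python sketch, assuming a hypothetical pre-digested program representation (`Function`, with its direct subscripted/unsubscripted `uses` and its `calls`) rather than a real awk parser; for brevity every variable is treated as local to its function, so awk globals are not modeled:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Lattice: UNKNOWN is the bottom, SCALAR and ARRAY are incomparable, CONFLICT is the top.
UNKNOWN, SCALAR, ARRAY, CONFLICT = "unknown", "scalar", "array", "conflict"


def join(a: str, b: str) -> str:
    """Least upper bound of two kinds."""
    if a == b or b == UNKNOWN:
        return a
    if a == UNKNOWN:
        return b
    return CONFLICT


@dataclass
class Function:
    params: List[str]
    # Direct uses in the body: (variable, SCALAR) if unsubscripted, (variable, ARRAY) if subscripted.
    uses: List[Tuple[str, str]] = field(default_factory=list)
    # Calls made by the body: (callee name, argument variable names in positional order).
    calls: List[Tuple[str, List[str]]] = field(default_factory=list)


def infer(funcs: Dict[str, Function]) -> Dict[Tuple[str, str], str]:
    kinds: Dict[Tuple[str, str], str] = {}
    # Step 1: seed each (function, variable) pair from its direct uses.
    for fname, f in funcs.items():
        for var, kind in f.uses:
            kinds[(fname, var)] = join(kinds.get((fname, var), UNKNOWN), kind)
    # Steps 2-3: a variable passed to a parameter takes on that parameter's kind.
    # Kinds only move up a finite lattice, so repeating until nothing changes terminates.
    changed = True
    while changed:
        changed = False
        for fname, f in funcs.items():
            for callee, args in f.calls:
                # zip() ignores surplus parameters, since awk allows calls with fewer arguments.
                for var, param in zip(args, funcs[callee].params):
                    old = kinds.get((fname, var), UNKNOWN)
                    new = join(old, kinds.get((callee, param), UNKNOWN))
                    if new != old:
                        kinds[(fname, var)] = new
                        changed = True
    return kinds


# Roughly: function fill(a, n) { for (i = 1; i <= n; i++) a[i] = i }
#          function wrapper(b, m) { fill(b, m) }
#          main body: count = 10; wrapper(xs, count); bad = 1; fill(bad, count)
program = {
    "fill": Function(params=["a", "n"], uses=[("a", ARRAY), ("n", SCALAR), ("i", SCALAR)]),
    "wrapper": Function(params=["b", "m"], calls=[("fill", ["b", "m"])]),
    "main": Function(
        params=[],
        uses=[("count", SCALAR), ("bad", SCALAR)],
        calls=[("wrapper", ["xs", "count"]), ("fill", ["bad", "count"])],
    ),
}

kinds = infer(program)
print(kinds[("main", "xs")])    # array
print(kinds[("wrapper", "m")])  # scalar
print(kinds[("main", "bad")])   # conflict
```

A pair that never appears in the result is still "unknown", which corresponds to the unused-variable case in step 3. In a language with only one scalar type, a final "scalar" or "array" is already a complete type.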

That's not enough to fully type awk variables, as there are three scalar types, so boxing is still needed in most cases. In some cases, it is probably possible to deduce scalar types as well, although it will be trickier because of automatic coercions. However, your language only has a single scalar type, so a strategy similar to the above might be workable.

rici