Scheme: statically-defined data structures and dynamically-defined data structures

Question

I am trying to understand the difference in Scheme between statically-defined data structures and dynamically-defined data structures.

I know that statically-defined data structures are created at compile time and dynamically-defined data structures at runtime, but what does this mean in practice?

thanks!

score 4 · Answer 1 · edited May 23 '17 at 11:56

Your question is a bit vague and imprecise.

Answering your question requires a clear notion of what is meant by "statically-defined data-structure" versus "dynamically-defined data-structure." After all, in practice most data structure instances are created at runtime (regardless of whether they have been assigned static structural types or are a mishmash of dynamically defined records). So this would seem to contradict your understanding about when such structures are created, at least according to the last statement in your question.

After attempting to infer what you meant, I have decided to provide a comparison between Scheme and certain other well-known languages, since that seems like the most likely area where we might find common understanding of what "statically-defined" is supposed to mean.

Statically Defined Data Structures (with an example in C)

Scheme (as in R5RS) is not particularly renowned for static definitions of data structures, at least when compared to other languages like Pascal, C, and C++. (That said, as a strict adherent to lexical scoping and static name resolution, it is in many ways more static than other languages like Perl or Python.)

In languages like Pascal or C, one usually writes out explicit structure or class definitions in isolated declarations. Such explicit definitions statically (i.e., at compile-time) define:

how many bytes of memory are allocated to represent each instance of a structure, and
how members/fields of each instance of a structure are intended to be interpreted when extracted from the instance.

So, in C, a declaration like:

typedef struct coordinate coord_t;
struct coordinate {
   intptr_t x;
   intptr_t y;
   coord_t *next;
};

is a way to tell the compiler that you want the type struct coordinate to denote a block of memory that occupies 3 words: one machine word for each of x and y, and a third machine word to hold a pointer to another coordinate (we'll assume we're going to link these records together in a list). The declaration also indicates that x and y denote signed integers (within the range allowed by the word size), while next denotes a pointer to a coordinate.

(A pointer, i.e. memory address, is quite different from an arbitrary integer, despite puns in the syntax and implementations of some languages that conflate pointers and integers in some contexts.)
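
To see that this layout really is fixed by the compiler, one can ask for it with sizeof and offsetof, both of which are compile-time constants. The sketch below repeats the declaration from above; the helper names coord_size_in_words and coord_next_offset are made up for illustration, and the "3 words" result assumes a typical platform where intptr_t and a pointer have the same size and no padding is inserted.

```c
#include <stddef.h>   /* size_t, offsetof */
#include <stdint.h>   /* intptr_t */

typedef struct coordinate coord_t;
struct coordinate {
    intptr_t x;
    intptr_t y;
    coord_t *next;
};

/* Hypothetical helper: size of one coordinate, in units of intptr_t. */
size_t coord_size_in_words(void)
{
    return sizeof(coord_t) / sizeof(intptr_t);
}

/* Hypothetical helper: byte offset of the link field within the struct. */
size_t coord_next_offset(void)
{
    return offsetof(coord_t, next);
}
```

On a common 64-bit system one would expect coord_size_in_words() to yield 3 and coord_next_offset() to yield 16, though the C standard itself promises neither number.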

Another important detail is that each identifier is manifestly constrained in the type of value it is allowed to hold. This is true for the structure fields (x, y, and next above), and also for parameters and local variables, as illustrated in the function definition sum_coords below.

Interpreting each coordinate as a vector in a two-dimensional plane, one might add all the coordinates in a list of coordinates like this:

coord_t sum_coords(coord_t *coords)
{
    coord_t c;
    c.x = 0;
    c.y = 0;
    c.next = NULL;

    while (coords != NULL) {
        c.x += coords->x;
        c.y += coords->y;
        coords = coords->next;
    }

    return c;
}

Note that coords is manifestly constrained to hold only a coord_t* and c is manifestly constrained to hold only a coord_t (aka struct coordinate). If I attempt to violate that constraint without explicitly using a typecast, the compiler is likely to refuse to compile my program, as in the following:

coord_t sum_coords(coord_t *coords)
{
    coord_t c;
    c = coords; // <-- compiler error: incompatible types in assignment
    ...
}
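
For completeness, here is one way the (compiling) version of sum_coords might be exercised. This is only a sketch: the helper sum_example is hypothetical, and the four points are the same ones used in the Scheme REPL session later in this answer.

```c
#include <stddef.h>   /* NULL, size_t */
#include <stdint.h>   /* intptr_t */

typedef struct coordinate coord_t;
struct coordinate {
    intptr_t x;
    intptr_t y;
    coord_t *next;
};

/* Same function as above: adds up every coordinate in the linked list. */
coord_t sum_coords(coord_t *coords)
{
    coord_t c;
    c.x = 0;
    c.y = 0;
    c.next = NULL;

    while (coords != NULL) {
        c.x += coords->x;
        c.y += coords->y;
        coords = coords->next;
    }

    return c;
}

/* Hypothetical helper: link (1,4) -> (2,3) -> (3,2) -> (4,1) and sum the list. */
coord_t sum_example(void)
{
    coord_t pts[4] = {
        { 1, 4, NULL },
        { 2, 3, NULL },
        { 3, 2, NULL },
        { 4, 1, NULL },
    };

    /* Chain each element to its successor; the last keeps next == NULL. */
    for (size_t i = 0; i + 1 < 4; i++) {
        pts[i].next = &pts[i + 1];
    }

    /* The result is returned by value, so no pointer into pts escapes. */
    return sum_coords(&pts[0]);
}
```

Every step here was checked against the declared types before the program ever ran: pts can only hold coord_t values, and sum_coords can only receive a coord_t*.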

So far, so good. But of course, your question was about Scheme.

Dynamically Defined Data Structures (with an example in Scheme)

Typical R5RS Scheme code does not work like the above. In R5RS Scheme, one does not write out separate declarations to be interpreted by the Scheme compiler/interpreter describing how much memory to associate with each instance of a datatype, or how the bytes in the block of memory for an instance should be interpreted.

Instead of using structure definitions like the above, in R5RS Scheme, to create compound data, one instead allocates pairs, vectors, strings, or procedure objects. (Lists in Scheme are composed from pairs.) In many Scheme implementations, each pair occupies exactly two words, while vectors and strings each have an individual size that is provided at the point they are allocated. (The memory required for an instance of a procedure object is highly dependent on the Scheme runtime and how it represents closures and lexical environments.) The point is, you don't typically even think about the number of machine words being used by a data structure instance; you instead focus first on solving the problem at hand, and put off worrying about byte counts until after you've determined that such effort is necessary.

In short, in R5RS Scheme, one need not tell the compiler/interpreter how a coordinate structure is represented; at least, not in an isolated runtime-parsed declaration added solely for that purpose.

Instead, one might write something like this:

;; A Coordinate is a (list Number Number)

;; A CoordinateList is one of:
;; - '()
;; - (cons c cl), where c is a Coordinate and cl is a CoordinateList

;; sum-coords: CoordinateList -> Coordinate
(define (sum-coords l)
  (cond ((null? l) (list 0 0))
        (else (let ((c (car l))
                    (sum-of-rest (sum-coords (cdr l))))
                (list (+ (list-ref c 0) (list-ref sum-of-rest 0))
                      (+ (list-ref c 1) (list-ref sum-of-rest 1)))))))


;; Below is in the REPL:

> (sum-coords '((1 4) (2 3) (3 2) (4 1)))
(10 10)

Some differences:

Here, the description of how the data is laid out is in a Scheme comment, not code. Such comments are good Scheme style, since even though the Scheme compiler/interpreter does not need such comments, other human beings reading your code almost certainly will need them. (For more on this topic, see the text How to Design Programs.)
One does not need to explicitly declare that the elements of a list representing a Coordinate are numbers; the Scheme runtime will instead carry that information around wherever necessary (so that you can use predicates like number? to see if the value being carried is a number).

Type-safe implementations of Scheme will, when necessary, check in operations like + that the arguments provided are numbers (and not references to pairs, or references to vectors, etc); however, the R5RS report itself says "It is an error for an operation to be presented with an argument that it is not specified to handle"; according to the conventions of the report, such errors are not necessarily detected, and the subsequent behavior is unspecified.

Likewise, the variable c is not manifestly constrained to only hold two-element lists. That is, this procedure does not inherently dictate that the value denoted by c must always look like a Coordinate matching our data definition. It is only by programming convention that we will attempt to ensure that such a constraint holds; the programming language has nothing to do with it.
In R5RS Scheme, one does not associate a type with the subcomponents of a pair or vector; any value at all can be referenced by the car or cdr of a pair, or by any of the elements of a vector. Compare this with the C code, where we could only put intptr_t values into the x and y fields of struct coordinate, and could only put struct coordinate* (i.e. pointers to coordinates) in the next field.

That last point is one crucial difference distinguishing dynamic from static: In Scheme programs like the above, you get significant freedom when working with lists (pairs) and vectors, since you can put any kind of value into them. This freedom can provide much power and flexibility. But you also give up something: You lose the assistance of a compiler telling you when you put the wrong kind of value into a pair or vector, and thus break the assumptions of other code you are relying on.
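
One way to picture how a runtime can "carry that information around" with each value is a tagged union, sketched here in C so it can be compared with struct coordinate. All names (value_t, tag_t, is_number, checked_add) are hypothetical, and this is not a claim about how any particular Scheme implementation represents its values; it only illustrates the idea that the type lives in the value rather than in the variable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical runtime tag stored alongside every value. */
typedef enum { TAG_NUMBER, TAG_PAIR } tag_t;

typedef struct value value_t;
struct value {
    tag_t tag;
    union {
        long number;
        struct {
            const value_t *car;   /* may refer to ANY kind of value */
            const value_t *cdr;   /* likewise unconstrained */
        } pair;
    } as;
};

/* Analogue of Scheme's number? predicate: inspects the tag at runtime. */
bool is_number(const value_t *v)
{
    return v->tag == TAG_NUMBER;
}

/* Analogue of pair? */
bool is_pair(const value_t *v)
{
    return v->tag == TAG_PAIR;
}

/*
 * Analogue of a checked +: refuses (returns false) when either argument
 * is not a number, instead of blindly adding the underlying bits.
 */
bool checked_add(const value_t *a, const value_t *b, value_t *out)
{
    if (!is_number(a) || !is_number(b)) {
        return false;
    }
    out->tag = TAG_NUMBER;
    out->as.number = a->as.number + b->as.number;
    return true;
}
```

Nothing in the pair's declaration says what car and cdr refer to, so a mistake such as handing a pair to checked_add can only be noticed when the call actually happens.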

Consider e.g. this variant of sum-coords:

(define (sum-coords l)
  (cond ((null? l) (list 0 0))
        (else (let* ((c (car l))
                     (s (sum-coords (cdr c))))
                (list (+ (car (car l)) (car c))
                      (+ (car (cdr c)) (car (cdr s))))))))

I have deliberately obfuscated this version. There is a bug; when I run it, I get an error:

> (sum-coords '((1 4) (2 3) (3 2) (4 1)))
Error: cdr: 4 is not a pair.

The bug is a bit hard to find (though if one had followed the design recipe one would never have written the code above). But one could claim that this mistake is easy to make in Scheme, because the language encourages the use of lists and pairs to represent everything, and so there are not always safeguards stopping you from passing a Coordinate (one kind of list) into a spot where a CoordinateList (another kind of list) is expected. (Some dialects of Scheme, such as Typed Racket, provide ways for the user to define datatypes statically so that the compiler can again provide such assistance.)

Of course, in languages like C, the type system is primarily in place in order to tell the compiler how to lay out objects in memory. It is quite simple in C to use a typecast to enable passing a struct coordinate* where an intptr_t is expected, and you are unlikely to get a message as nice as

> (sum-coords '((1 4) (2 3) (3 2) (4 1)))
Error: +: (4 1) is not a number.

So you should take this so-called "distinction" between the two with a large grain of salt. There is a big difference between "Type Safety" and "Statically Typed". (In short, neither R5RS Scheme nor C is type safe, though several implementations of R5RS Scheme are type safe, or attempt to be.)
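
The typecast escape hatch mentioned above can be sketched as follows. The functions is_even and address_is_even are hypothetical, made up purely to show that a single cast is enough to make the compiler accept a pointer where an integer is expected; the integer you get from such a cast is implementation-defined.

```c
#include <stdint.h>   /* intptr_t */

typedef struct coordinate coord_t;
struct coordinate {
    intptr_t x;
    intptr_t y;
    coord_t *next;
};

/* Hypothetical consumer that expects a plain integer. */
int is_even(intptr_t n)
{
    return n % 2 == 0;
}

/*
 * Hypothetical misuse: the cast silences the compiler, so a pointer to a
 * coordinate is treated as if it were a number. No error is reported,
 * either at compile time or at runtime.
 */
int address_is_even(coord_t *c)
{
    return is_even((intptr_t)c);
}
```

The call quietly returns an answer about the address itself, which is almost certainly not what the caller of is_even had in mind.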

Another spot where one might distinguish between dynamic and static: I might have chosen to write the data definition like this:

;; A Coordinate is a (list Number Number Any ...)

;; interpretation: A Coordinate (list x y payload ...) is a point at location <x,y> on a 2D plane, with a metadata payload also associated with the point.

Now, the original implementation for sum-coords still "works" with this definition. The user is free to attach whatever extra information they like (maybe colors, or labels on the points) as extra elements of the list, and we can still sum up the X and Y components. This is related to the above point that c is not manifestly constrained to only hold two-element lists of numbers.

(Of course, one can accomplish much the same goal in languages like C++ via subclassing, and also in C by defining another structure with the same layout in its first three words as the layout of struct coordinate, and then type casting when building up the linked list. At that point, in any of these languages, you are effectively moving back towards a more dynamic style of data structure definition; it is just a question of how far along you go on the spectrum between the extremes.)
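
A minimal sketch of the C variant of that trick follows. Rather than repeating the three fields by hand, it embeds a struct coordinate as the first member of the larger structure, a closely related variant chosen here because C guarantees that a pointer to a structure, suitably converted, points to its first member. The names labeled_coord_t, label_of, and sum_labeled_example are hypothetical.

```c
#include <stddef.h>   /* NULL */
#include <stdint.h>   /* intptr_t */

typedef struct coordinate coord_t;
struct coordinate {
    intptr_t x;
    intptr_t y;
    coord_t *next;
};

/* Same summing function as earlier in this answer. */
coord_t sum_coords(coord_t *coords)
{
    coord_t c;
    c.x = 0;
    c.y = 0;
    c.next = NULL;

    while (coords != NULL) {
        c.x += coords->x;
        c.y += coords->y;
        coords = coords->next;
    }

    return c;
}

/* Hypothetical extended record: a coordinate followed by a payload. */
typedef struct labeled_coordinate {
    coord_t base;        /* must stay the first member */
    const char *label;   /* extra metadata that sum_coords never sees */
} labeled_coord_t;

/*
 * Recover the payload from a plain coord_t*. This is only valid if c
 * really points at the base of a labeled_coord_t; nothing checks that.
 */
const char *label_of(coord_t *c)
{
    return ((labeled_coord_t *)c)->label;
}

/* Build a two-element list of labeled points and sum it as plain coords. */
coord_t sum_labeled_example(void)
{
    labeled_coord_t b = { { 3, 4, NULL }, "b" };
    labeled_coord_t a = { { 1, 2, (coord_t *)&b }, "a" };
    return sum_coords((coord_t *)&a);
}
```

Note that label_of relies purely on convention, just like the Scheme comment "A Coordinate is a (list Number Number Any ...)": the compiler can no longer tell whether a given coord_t* actually carries a label.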

A note on the Scheme code and structural abstraction

Even though one does not need to make a separate structure declaration, it can be good style to define a small set of procedures for working with your data type, and use them as an abstract interface that all other operations are intended to utilize. In this case, we might attempt to do this like so:

;; coord : Number Number -> Coordinate
(define (coord x y) (list x y))

;; coord-x : Coordinate -> Number
(define (coord-x c) (list-ref c 0))

;; coord-y : Coordinate -> Number
(define (coord-y c) (list-ref c 1))

This is not perfect (it is a leaky abstraction), but it is a start; of course, one would also be expected to revise sum-coords to use these procedures instead of directly accessing the list representation of Coordinate.

A note on transliteration (or lack thereof) from C to Scheme

Note that the Scheme code is not meant to be an exact word-for-word analogue of the C code.

For example, each coordinate in the C code was made of three machine words; the third word in a coordinate struct carried a link field for the next element. One way to write that more directly in Scheme would be a data definition like

;; A Coordset is one of '(), or (vector Number Number Coordset)

which would lead to a more faithful word-by-word transliteration of the C code written above, but would be less faithful to the spirit of Scheme, where it is more idiomatic to decouple the coordinate structure and the linked structure.

Structure Definitions via Syntax Extension, and in other Scheme Dialects

Even though R5RS Scheme does not have a special form for declaring structures, R5RS does define a macro system with which one might define such record syntax.

Also, several Scheme dialects such as Racket and Chez provide linguistic extensions for defining structured data known as records. Even in these two dialects, records remain a potentially dynamic entity: one can create new record types on the fly, rather than having to provide all desired record types up-front at compile time. But just because something can be done dynamically does not mean it should be, as is explained in the Chez manual:

The procedural interface is more flexible than the syntactic interface, but this flexibility can lead to less readable programs and compromises the compiler's ability to generate efficient code. Programmers should use the syntactic interface whenever it suffices.

Conclusion

There are many differences between how one defines a data-structure in a language like C versus a language like Scheme. In C, you need to say upfront how many entries appear in your data structure, and what their types are; likewise for parameters and local variables. In Scheme, you say as much or as little as you like in your comments, and you write your code accordingly.