I was debugging an error and found out that undefined had been appended to a list, which caused a crash later on.

I expected that appending something other than a list with the ++ operator would cause a crash. But this is not true for undefined. Here is an example:

1> [1,2,3] ++ undefined.
[1,2,3|undefined]

Although it does not crash, the list is not fully functional anymore:

1> L = [1,2,3] ++ undefined.
[1,2,3|undefined]
2> L ++ [4].
** exception error: bad argument
     in operator  ++/2
        called as [1,2,3|undefined] ++ [4]

Why does this happen? Is it related to the underlying implementation of lists in Erlang?

Isac
  • So I discussed this with a few fellows in #erlang@freenode. This is an example of an "improper list". It has a few benefits. Consider a tuple: `{key, value}`. Inside the Erlang VM, this is three terms. One term is a two-element tuple, the other two are the terms `key` and `value`. When accessing a tuple, you must first dereference the tuple term to access the two terms it contains. Suppose, however, that we implemented it as an improper list: `[key|value]`. With this format you don't have to dereference a term and you save a word of memory (since you don't have to store a tuple term as well). – Soup d'Campbells Feb 19 '13 at 20:03
  • This works because Erlang lists are CONS lists, meaning each cell of the list is a two-word pair, equivalent to `struct list_cell {Eterm hd; Eterm tl;}`. `[a|b]` creates a list_cell with hd set to the atom `a` and tl set to the atom `b`. It should also be noted that iolists could potentially (and legally/sanely) be improper lists. – Soup d'Campbells Feb 19 '13 at 20:12
  • Sounds correct. Here are some links that explain improper lists and their uses: http://stackoverflow.com/questions/1919097/functional-programming-what-is-an-improper-list http://stackoverflow.com/questions/5088575/practical-use-of-improper-lists-in-erlang-perhaps-all-functional-languages – Isac Feb 19 '13 at 20:14

2 Answers

In Erlang, all terms are represented by a compact pointer-like value called Eterm. It appears that the list manipulation functions are implemented in a type-agnostic way.

Consider it from this perspective: inside the Erlang VM all Eterms are equal. The head and tail list manipulation operations are billed as being very fast. Since it takes several operations to inspect the opaque Eterm type and determine whether or not it is a list, why bother?

The expected outcome in such a situation is an error, and you do get one. Eventually.

There's something to be said for trusting the programmer. When a check would add several cycles to an operation that is used frequently, the savings from skipping it add up, and the only penalty is a strange error later on.

Soup d'Campbells
  • I know this. My question is: why does `[1,2,3] ++ undefined` NOT crash? – Isac Feb 19 '13 at 16:54
  • Yeah, sorry, misread your question. Updated my answer with my best guess, but I don't know the exact answer. – Soup d'Campbells Feb 19 '13 at 16:56
  • @Isac: Want to know something funny? It appears to happen with any datatype. I tried it with a different atom, an integer, and a float. None complained about the operation, all failed to read the list later. – Soup d'Campbells Feb 19 '13 at 17:07
  • It's odd. I updated my answer to my best guess; I think it's reasonably likely. – Soup d'Campbells Feb 19 '13 at 18:47
  • I've changed the accepted answer to rvirding's since it is more complete and didactic. Sorry Soup. And thanks for all the help! – Isac Feb 20 '13 at 13:59

The reason is that ++ appends its second argument to the end of its first argument, which must be a list. It does not do any processing of its second argument; it just appends it as is. So:

1> [1,2,3] ++ undefined.
[1,2,3|undefined]
2> [1,2,3] ++ [undefined].
[1,2,3,undefined]

The reason you can do this as well as:

3> [a|b].
[a|b]
4> [a|[b]].
[a,b]

is that a list is a sequence of list cells, a singly linked list, not a single data structure as such. If the right-hand side, called the tail, of each cell is another list cell or [] then you get a proper list. The left-hand side of each cell is called the head and usually contains the elements of the list. This is what we have in 2 and 4 above. Most, if not all, library functions assume that lists are proper lists and will generate an error if they are not. Note that you have to actually step down the whole list to the end to see if it is proper or not.
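To make "step down the whole list" concrete, here is a minimal sketch of such a check. The module name `improper_demo` and the function `is_proper/1` are made up for illustration; they are not part of OTP.

```erlang
-module(improper_demo).
-export([is_proper/1]).

%% Hypothetical helper, not an OTP function.
%% Follow the tail of each list cell until something other than a cell shows up.
is_proper([]) -> true;                      % ended in [], so the list is proper
is_proper([_ | Tail]) -> is_proper(Tail);   % another cell, keep walking
is_proper(_) -> false.                      % ended in some other term
```

The cost is linear in the length of the list, which is one reason to assume such a check is not done on every operation.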

Each list cell is written as [Head|Tail] and the syntax [a,b,c] is just syntactic sugar for [a|[b|[c|[]]]]. Note that the tail of each list cell is a list or [] so this is a proper list.
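A small sketch to confirm that the notations build the same term. The module name `cons_sugar` and the function `check/0` are hypothetical, chosen only for this example.

```erlang
-module(cons_sugar).
-export([check/0]).

%% Hypothetical helper. Each match succeeds only if both sides are the
%% same term; a mismatch would raise a badmatch error instead.
check() ->
    [a, b, c] = [a | [b | [c | []]]],   % fully spelled-out cells
    [a, b, c] = [a, b | [c]],           % sugar mixed with an explicit tail
    [a, b, c] = [a, b, c | []],         % explicit [] as the final tail
    ok.
```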

There are no restrictions on the types of the head and tail of a list cell. The system never checks; it just does it. This is what we have in 1 and 3 above, where the tail of the last list cell (the only list cell in 3) is not a list or [].
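As a rough illustration of the last point, the sketch below returns whatever term ends a list, and shows where the error finally appears. The module `improper_tail` and both function names are assumptions made for this example. `length/1` is the standard BIF, which raises `badarg` when its argument is not a proper list.

```erlang
-module(improper_tail).
-export([last_tail/1, safe_length/1]).

%% Hypothetical helper: follow the tails to the end and return the term that
%% terminates the list. That is [] for a proper list, anything else otherwise.
last_tail([_ | Tail]) -> last_tail(Tail);
last_tail(End) -> End.

%% Hypothetical helper: length/1 has to walk every cell, so this is where an
%% improper list finally produces an error.
safe_length(L) ->
    try length(L) of
        N -> {ok, N}
    catch
        error:badarg -> {error, not_a_proper_list}
    end.
```

With these, `improper_tail:last_tail([1,2,3] ++ 42)` gives back `42`, and `improper_tail:safe_length([1,2,3|undefined])` reports the problem instead of crashing.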

Sorry getting a bit over-didactic here.

EDIT: I see that I have already described this here: Functional Programming: what is an "improper list"?

rvirding