How does one avoid creating an ad-hoc type system in dynamically typed languages?

Question

In every project I've started in languages without type systems, I eventually begin to invent a runtime type system. Maybe the term "type system" is too strong; at the very least, I create a set of type/value-range validators when I'm working with complex data types, and then I feel the need to be paranoid about where data types can be created and modified.

I hadn't thought twice about it until now. As an independent developer, my methods have been working in practice on a number of small projects, and there's no reason they'd stop working now.

Nonetheless, this must be wrong. I feel as if I'm not using dynamically-typed languages "correctly". If I must invent a type system and enforce it myself, I may as well use a language that has types to begin with.

So, my questions are:

Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?

Here is a concrete example for you to consider. I'm working with datetimes and timezones in Erlang (a dynamically and strongly typed language). This is a common datatype I work with:

{{Y,M,D},{tztime, {time, HH,MM,SS}, Flag}}

... where {Y,M,D} is a tuple representing a valid date (all entries are integers), tztime and time are atoms, HH,MM,SS are integers representing a sane 24-hr time, and Flag is one of the atoms u,d,z,s,w.

This datatype is commonly parsed from input, so to ensure valid input and a correct parser, the values need to be checked for type correctness and for valid ranges. Later on, instances of this datatype are compared to each other, making the type of their values all the more important, since all terms compare. From the Erlang reference manual:

number < atom < reference < fun < port < pid < tuple < list < bit string
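To make the kind of check described above concrete, here is a minimal sketch of a validator for this tuple shape. The module name `tzdt` and both function names are hypothetical, and the 0..59 range for seconds is an assumption (leap seconds are ignored); `calendar:valid_date/3` is from the standard library. The second function shows why validation matters before comparison: because all terms compare, a malformed value raises no error and is simply ordered by the rule quoted above.

```erlang
-module(tzdt).
-export([is_valid/1, compare_pitfall/0]).

%% Returns true only for well-formed
%% {{Y,M,D},{tztime,{time,HH,MM,SS},Flag}} terms.
is_valid({{Y, M, D}, {tztime, {time, HH, MM, SS}, Flag}})
  when is_integer(Y), is_integer(M), is_integer(D),
       is_integer(HH), HH >= 0, HH =< 23,
       is_integer(MM), MM >= 0, MM =< 59,
       is_integer(SS), SS >= 0, SS =< 59 ->
    %% calendar:valid_date/3 rejects dates such as February 31st.
    calendar:valid_date(Y, M, D) andalso lists:member(Flag, [u, d, z, s, w]);
is_valid(_) ->
    false.

%% The hour of Bad is the string "02" instead of the integer 2.
%% The comparison still succeeds, because a number sorts before a list.
compare_pitfall() ->
    Good = {{2010, 12, 16}, {tztime, {time, 2, 29, 0}, u}},
    Bad  = {{2010, 12, 16}, {tztime, {time, "02", 29, 0}, u}},
    Good < Bad.
```

Calling `tzdt:is_valid/1` once at the point where the value is parsed is enough to rule out the silent mis-ordering shown in `compare_pitfall/0`.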

having gone from a lot of java, to a lot of groovy, you solve the problem with unit tests, and just accept the fact that you don't know until runtime the true type of an object. In fact the true type of an object doesn't matter if you're duck typing. — dstarh, Dec 16 '10 at 02:29
You seem to be conflating dynamically typed and weakly typed. There is a distinction between strongly typed versus weakly typed and statically typed versus dynamically typed. — Laurence Gonsalves, Dec 16 '10 at 02:35
I'd be interested in seeing an example of the sort of code that led to this question. — Laurence Gonsalves, Dec 16 '10 at 02:36
I've added an example, and I've looked up a few definitions. I'm not sure where there's a loss of distinction between "dynamic typing" and "weak typing". Please help clarify my question if it is not clear. — drfloob, Dec 16 '10 at 02:53
@Laurence Gonsalves: Indeed. If "the need to be paranoid about where data types can be created and modified" is referring to rules and assumptions about what data in the program should and should not be, then the "ad-hoc type system" is a *very good* thing. Making assumptions about what values can be and what functions can do when using them, and ensuring these assumptions are correct, usually leads to a lot of correct code. — Joey Adams, Dec 16 '10 at 03:03
@dstarh: In erlang, you don't have objects or complex/user-defined datatypes (records are little more than syntactical sugar). I believe it's the same in the lisps, and other similar languages. I do use a lot of unit tests, but I still need some mechanism (outside unit tests) to ensure the types and ranges of complex datatypes at runtime (at the very least, so my code can fail rather than just hum along in an invalid state). Groovy gives you that out of the box, right? — drfloob, Dec 16 '10 at 03:05
@Joey Adams: It sounds as if you support the idea of creating a type system in a language that touts its lack of type system. If that's the best argument (it is the only one I've found), then I really see no reason to use a language that has no type system. There must be better alternatives out there. — drfloob, Dec 16 '10 at 03:14
@Joey Adams: `Making assumptions about what values can be and what functions can do when using them, and ensuring these assumptions are correct, usually leads to a lot of correct code.` Yes, and you make much more assumptions in Erlang than you do in Haskell. Though unfortunately it would be extremely difficult to introduce static typing in Erlang. :-( — YasirA, Dec 16 '10 at 04:20
I'd argue that it would be so difficult to introduce static typing to an Erlang-like language that it would be effectively impossible. As a single, off-the-cuff example of a problem "difficult" to solve -- how would you go about statically typing message sends and receives? Across nodes? — JUST MY correct OPINION, Dec 16 '10 at 05:09
@drfloob: you should have a look at http://www.idris-lang.org — Erik Kaplun, Jan 25 '15 at 14:19

Peer Stritzinger · Answer 1 · 2010-12-16T09:02:42.177

Aside from the confusion of static vs. dynamic and strong vs. weak typing:

What you want to implement in your example isn't really solved by most existing static type systems. Range checks, complications like February 31st, and especially parsed input are usually checked at runtime no matter what type system you have.

Your example being in Erlang I have a few recommendations:

- Use records. Besides being useful and helpful for a whole bunch of reasons, they give you easy runtime type checking without a lot of effort, e.g.:
  ```
  is_same_day(#datetime{year=Y1, month=M1, day=D1},
              #datetime{year=Y2, month=M2, day=D2}) -> ...
  ```
  This effortlessly matches only two datetime records. You could even add guards to check for ranges if the source is untrusted. And it conforms to Erlang's "let it crash" method of error handling: if no clause matches you get a function_clause error (or a badmatch from a failed `=` match), and can handle this at the level where it is appropriate (usually the supervisor level).
- Generally, write your code so that it crashes when its assumptions are not valid.
- If this doesn't feel statically checked enough: use typer and dialyzer to find the kind of errors that can be found statically; whatever remains will be checked at runtime.
- Don't be too restrictive about what "types" your functions accept. Sometimes the added functionality of doing something useful even for different inputs is worth more than checking the types and ranges in every function. If you check where it matters, you will usually catch the error early enough for it to be easily fixable. This is especially true for a functional language, where you always know where every value comes from.
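The record, guard, and dialyzer suggestions can be combined roughly as follows. This is only a sketch: the `datetime` record fields follow the snippet above, while the module name `dt_records` and the constructor `new/3` are hypothetical names chosen for illustration.

```erlang
-module(dt_records).
-export([new/3, is_same_day/2]).

-record(datetime, {year, month, day}).

%% Hypothetical constructor for untrusted input. The guard checks the
%% types; matching on `true` crashes with {badmatch, false} for
%% impossible dates such as February 31st.
-spec new(integer(), integer(), integer()) -> #datetime{}.
new(Y, M, D) when is_integer(Y), is_integer(M), is_integer(D) ->
    true = calendar:valid_date(Y, M, D),
    #datetime{year = Y, month = M, day = D}.

%% Only two datetime records match; any other argument raises
%% function_clause, which a supervisor can deal with.
-spec is_same_day(#datetime{}, #datetime{}) -> boolean().
is_same_day(#datetime{year = Y, month = M, day = D},
            #datetime{year = Y, month = M, day = D}) ->
    true;
is_same_day(#datetime{}, #datetime{}) ->
    false.
```

The `-spec` lines are what typer and dialyzer work with; the guards and record patterns are what still runs at runtime.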

Thanks, this is all good advice. My question still stands on your second bullet point, though. Often, my assumptions don't fit nicely in guard expressions, so I end up writing type-/range-checking functions and manually ensuring they are called at appropriate times. My point is, I feel it's necessary to invent these type/validation checking constructs and enforce them myself in languages that don't have them. My question is whether this is the best/only way, or if there are other patterns and paradigms that solve this set of problems in more "natural" ways for erlang and similar languages. — drfloob, Dec 17 '10 at 09:41

score 3 · Answer 2 · answered Dec 16 '10 at 13:29

A lot of good answers, let me add:

Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?

The most important paradigm, especially in Erlang, is this: Assume the type is right, otherwise let it crash. Don't write excessively checking paranoid code, but assume that the input you get is of the right type or the right pattern. Don't write (there are exceptions to this rule, but in general)

foo({tag, ...}) -> do_something(..);
foo({tag2, ...}) -> do_something_else(..);
foo(Otherwise)  ->
    report_error(Otherwise),
    %% try to fix the problem here...

Kill the last clause and have it crash right away. Let a supervisor and other processes do the cleanup (janitorial processes can use monitors, via erlang:monitor/2, to learn when a crash has occurred).
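As a rough illustration of the crash-and-monitor idea, the sketch below drops the catch-all clause and lets a separate process observe the failure. The module name `crash_demo`, the return atoms, and the one-second timeout are all assumptions made for the example; `spawn_monitor/1` is the built-in that spawns a process and monitors it in one step.

```erlang
-module(crash_demo).
-export([foo/1, run/0]).

%% No catch-all clause: unexpected input crashes with function_clause.
foo({tag, X})  -> {ok, X};
foo({tag2, X}) -> {other, X}.

%% A "janitor" spawns and monitors a worker, then learns about the crash
%% from the 'DOWN' message rather than from defensive code in the worker.
run() ->
    {Pid, Ref} = spawn_monitor(fun() -> foo(unexpected) end),
    receive
        {'DOWN', Ref, process, Pid, {function_clause, _Stack}} ->
            crashed_as_expected;
        {'DOWN', Ref, process, Pid, Other} ->
            {unexpected_exit, Other}
    after 1000 ->
        timeout
    end.
```

In a real system a supervisor would normally play the janitor's role; the explicit monitor is only there to make the mechanism visible.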

Do be precise however. Write

bar(N) when is_integer(N) -> ...

baz([]) -> ...
baz(L) when is_list(L) -> ...

if the function is known to work only with integers or lists respectively. Yes, it is a runtime check, but the goal is to convey information to the programmer. Also, HiPE tends to use the hint for optimization and eliminates the type check where possible. Hence, the price may be less than you think.

You chose an untyped/dynamically-typed language, so the price you have to pay is that type checking and errors from clashes will happen at runtime. As other posts hint, a statically typed language is not exempt from doing some checks either: the type system is (usually) an approximation of a proof of correctness. In most static languages you often get input which you can't trust. This input is transformed at the "border" of the application and then converted to an internal format. The conversion serves to mark trust: from now on, the thing has been validated and we can assume certain things about it. The power and correctness of this assumption is directly tied to its type signature and to how good the programmer is at juggling the static types of the language.
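The same "border" idea carries over to Erlang. In the sketch below, untrusted text is checked once and converted into a tagged internal term, and interior functions match only on that tag. The module name `border`, the `date` tag, the "YYYY-MM-DD" input format, and the error atoms are all assumptions for illustration; `string:split/3` requires OTP 20 or later.

```erlang
-module(border).
-export([parse_date/1, day_of_week/1]).

%% Border: turn an untrusted "YYYY-MM-DD" string into a tagged internal
%% term, or return an error. The `date` tag marks the value as validated.
parse_date(Str) when is_list(Str) ->
    case string:split(Str, "-", all) of
        [YS, MS, DS] ->
            try {list_to_integer(YS), list_to_integer(MS), list_to_integer(DS)} of
                {Y, M, D} ->
                    case calendar:valid_date(Y, M, D) of
                        true  -> {ok, {date, Y, M, D}};
                        false -> {error, invalid_date}
                    end
            catch
                error:badarg -> {error, not_a_number}
            end;
        _ ->
            {error, bad_format}
    end.

%% Interior: trusts the tag and does no further checking. Anything that
%% did not come through parse_date/1 crashes with function_clause.
day_of_week({date, Y, M, D}) ->
    calendar:day_of_the_week(Y, M, D).
```

Nothing stops other code from building a `{date, ...}` tuple by hand, so the trust marked by the tag is a convention rather than a guarantee.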

Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?

Erlang has the dialyzer, which can be used to statically analyze and infer the types of your programs. It will not come up with as many type errors as a type checker in, e.g., OCaml, but it won't "cry wolf" either: an error from the dialyzer is provably an error in the program. And it won't reject a program which may be working fine. A simple example is:

%% `and` is a reserved word in Erlang, so the function name must be quoted.
'and'(true, true) -> true;
'and'(true, _)    -> false;
'and'(false, _)   -> false.

The invocation 'and'(true, greatmistake) will return false, yet a static type system will reject the program because it will infer from the first line that the type signature takes a boolean() value as the 2nd parameter. The dialyzer, in contrast, will accept this function and give it the signature (boolean(), term()) -> boolean(). It can do this because there is no need to protect a priori against an error. If there is a mistake, the runtime system has a type check that will catch it.
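For reference, a signature like the one above can also be written down explicitly as a `-spec`, which dialyzer then checks against the code and its callers. In this sketch the function is named `both/2` (a hypothetical name, chosen because `and` is a reserved word in Erlang) and the module name `bool_demo` is likewise an assumption.

```erlang
-module(bool_demo).
-export([both/2]).

%% The spec states the signature described above: the second argument may
%% be any term, because only the atom `true` is ever matched against it.
-spec both(boolean(), term()) -> boolean().
both(true, true) -> true;
both(true, _)    -> false;
both(false, _)   -> false.
```

Tightening the spec to `both(boolean(), boolean())` would be a way to opt in to the stricter contract: dialyzer could then report a call such as `both(true, greatmistake)` that it can see statically.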

To clarify, it's not that we "assume" types in erlang, we must explicitly assert them. The trouble is that your examples are all very simple; it isn't nearly as clean or easy to assert the types of "complex" data as in the example given. — drfloob, Dec 22 '10 at 22:22

score 2 · Answer 3 · answered Dec 16 '10 at 04:47

In order for a statically-typed language to match the flexibility of a dynamically-typed one, I think it would need a lot, perhaps infinitely many, features.

In the Haskell world, one hears a lot of sophisticated, sometimes to the point of being scary, terminology. Type classes. Parametric polymorphism. Generalized algebraic data types. Type families. Functional dependencies. The Ωmega programming language takes it even further, with the website listing "type-level functions" and "level polymorphism", among others.

What are all these? Features added to static typing to make it more flexible. These features can be really cool, and tend to be elegant and mind-blowing, but are often difficult to understand. Learning curve aside, type systems often fail to model real-world problems elegantly. A particularly good example of this is interacting with other languages (a major motivation for C# 4's dynamic feature).

Dynamically-typed languages give you the flexibility to implement your own framework of rules and assumptions about data, rather than be constrained by the ever-limited static type system. However, "your own framework" won't be machine-checked, meaning the onus is on you to ensure your "type system" is safe and your code is well-"typed".

One thing I've found from learning Haskell is that I can carry lessons learned about strong typing and sound reasoning over to weaker-typed languages, such as C and even assembly, and do the "type checking" myself. Namely, I can prove that sections of code are correct in and of themselves, by bearing in mind the rules my functions and values are supposed to follow, and the assumptions I am allowed to make about other functions and values. When debugging, I go through and check things again, and think through whether or not my approach is sound.

The bottom line: dynamic typing puts more flexibility at your fingertips. On the other hand, statically-typed languages tend to be more efficient (by orders of magnitude), and good static type systems drastically cut down on debugging time by letting the computer do much of it for you. If you want the benefits of both, install a static type checker in your brain by learning decent, strongly-typed languages.

I agree with the second point, but not the first. Type classes, GADTs, FunDeps all produce something *more expressive* than a typical dynamically typed language. In essence, they let you manipulate class contexts *independent* of individual typed values. Not only can you not do that with standard dynamically typed languages, but it barely makes sense to think about it. — sclv, Dec 16 '10 at 16:32

score 1 · Answer 4 · answered Dec 16 '10 at 03:00

Sometimes data need validation. Validating any data received from the network is almost always a good idea — especially data from a public network. Being paranoid here is only good. If something resembling a static type system helps this in the least painful way, so be it. There's a reason why Erlang allows type annotations. Even pattern matching can be seen as just a kind of dynamic type checking; nevertheless, it's a central feature of the language. The very structure of data is its 'type' in Erlang.

The good thing is that you can custom-tailor your 'type system' to your needs, making it flexible and smart, while the type systems of OO languages typically have fixed features. When the data structures you use are immutable, once you've validated such a structure you're safe to assume it conforms to your restrictions, just like with static typing.

There's no point in being ready to process any kind of data at any point of a program, dynamically-typed or not. A 'dynamic type' is essentially a union of all possible types; limiting it to a useful subset is a valid way to program.

score 1 · Answer 5 · edited May 23 '17 at 12:09

A statically typed language detects type errors at compile time. A dynamically typed language detects them at runtime. There are some modest restrictions on what one can write in a statically typed language such that all type errors can be caught at compile time.

But yes, you still have types even in a dynamically typed language, and that's a good thing. The problem is you wander into lots of runtime checks to ensure that you have the types you think you do, since the compiler hasn't taken care of that for you.

Erlang has a very nice tool for specifying and statically verifying lots of types: dialyzer (see the Erlang type system documentation for reference).

So don't reinvent types, use the typing tools that Erlang already provides, to handle the types that already exist in your program (but which you haven't yet specified).

And this on its own won't eliminate range checks, unfortunately. Without lots of special sauce you really have to enforce this on your own by convention (and smart constructors, etc. to help), or fall back to runtime checks, or both.
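A smart constructor in this sense might look like the sketch below: the range check lives in one place, and an opaque type lets dialyzer flag code outside the module that depends on the representation. The module name `hour`, the `{hour, H}` representation, and the error atom are all hypothetical.

```erlang
-module(hour).
-export([new/1, to_integer/1]).
-export_type([hour/0]).

%% Opaque: dialyzer warns when other modules inspect the representation.
-opaque hour() :: {hour, 0..23}.

%% Smart constructor: the only sanctioned way to obtain an hour().
-spec new(integer()) -> {ok, hour()} | {error, out_of_range}.
new(H) when is_integer(H), H >= 0, H =< 23 -> {ok, {hour, H}};
new(H) when is_integer(H)                  -> {error, out_of_range}.

%% Accessor: can rely on the range without checking it again.
-spec to_integer(hour()) -> 0..23.
to_integer({hour, H}) -> H.
```

The opacity is enforced only when dialyzer is run; at runtime nothing prevents a caller from writing `{hour, 99}` directly, which is the "by convention" part.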

If I'm going to need to invent constructors and mutators, and enforce standard conventions around their usage myself (this has been the case for me a few times now), I can't justify using a language that doesn't have these already. I love working in erlang, but what benefit is there if large chunks of time and code go into reinventing and enforcing what many other languages give you for free? — drfloob, Dec 17 '10 at 10:23
@drfloop -- If dialyzer isn't sufficient for your needs, then yes, I absolutely agree! On to Haskell! :-) — sclv, Dec 17 '10 at 15:10
