16

The two languages where I have used symbols are Ruby and Erlang and I've always found them to be extremely useful.

Haskell does have algebraic datatypes, but I still think symbols would be mighty convenient. An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integral or a string "primary key".

The syntactic sugar for atoms can be minor: :something or <something> is an atom. All atoms are instances of a type called Atom, which derives Show and Eq. You could then use them for more descriptive error codes, for example:

type ErrorCode = Atom
type Message = String
data Error = Error ErrorCode Message
loginError = Error :redirect "Please login first"

In this case :redirect is more efficient than using a string ("redirect") and easier to understand than an integer (404).

The benefit may seem minor, but I say it is worth adding atoms as a language feature (or at least a GHC extension).

So why have symbols not been added to the language? Or am I thinking about this the wrong way?

Anupam Jain
  • 7,851
  • 2
  • 39
  • 74
  • 1
    Wouldn't error codes and such be a situation where you'd want a predefined set of values, rather than allowing arbitrary stuff that might be nonsense, though? Presumably there'd be code elsewhere handling the errors, and you'd want to make sure you only give it things it knows how to deal with. – C. A. McCann May 30 '11 at 04:13
  • Not necessarily. I might want to use the error codes as I come up with them, without having to define the entire set of errors as a data type first. Handler code can simply handle the cases it wants to handle, while lumping the rest in a default handler. – Anupam Jain May 30 '11 at 04:19
  • That doesn't seem terribly idiomatic for Haskell. But even so, I'd think the library sclv mentioned would suffice, so I guess I'm still not seeing why it would make much difference. – C. A. McCann May 30 '11 at 04:38

5 Answers

20

I agree with camccann's answer that it's probably missing mainly because it would have to be baked quite deeply into the implementation and is of too little use to justify that level of complication. In Erlang (and Prolog and Lisp) symbols (or atoms) usually serve as special markers and play much the same role as a constructor. In Lisp, the dynamic environment includes the compiler, so it's partly also a (useful) compiler concept leaking into the runtime.

The problem is the following: symbol interning is impure (it modifies the symbol table). It is still referentially transparent, because we never modify an existing entry, but if implemented naïvely it can lead to space leaks in the runtime. In fact, as currently implemented in Erlang, you can actually crash the VM by interning too many symbols/atoms (the current limit is 2^20, I think), because they can never be garbage collected. It's also difficult to implement in a concurrent setting without a huge lock around the symbol table.

Both problems can be (and have been) solved, however. For example, see Erlang EEP 20. I use this technique in the simple-atom package. It uses unsafePerformIO under the hood, but only in (hopefully) rare cases. It could still use some help from the GC to perform an optimisation similar to indirection shortening. It also uses quite a few IORefs internally which isn't too great for performance and memory usage.
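
To make the problems concrete, here is a minimal sketch of the interning technique (the names Atom, atomTable and intern are made up for illustration; this is not the actual simple-atom code):

import Data.IORef (IORef, newIORef, atomicModifyIORef')
import qualified Data.Map.Strict as Map
import System.IO.Unsafe (unsafePerformIO)

-- An atom is just a unique Int, so equality is a constant-time comparison.
newtype Atom = Atom Int deriving (Eq, Ord)

-- The shared symbol table: name -> id, plus the next free id.
-- NOINLINE is essential so that there is exactly one table.
{-# NOINLINE atomTable #-}
atomTable :: IORef (Map.Map String Int, Int)
atomTable = unsafePerformIO (newIORef (Map.empty, 0))

-- Interning looks pure from the outside (the same name always yields the
-- same Atom) but mutates the shared table, hence unsafePerformIO.
-- Note the table only ever grows -- exactly the space-leak/GC problem
-- described above.
{-# NOINLINE intern #-}
intern :: String -> Atom
intern name = unsafePerformIO $
  atomicModifyIORef' atomTable $ \(m, next) ->
    case Map.lookup name m of
      Just i  -> ((m, next), Atom i)
      Nothing -> ((Map.insert name next m, next + 1), Atom next)

Here atomicModifyIORef' plays the part of the lock around the symbol table: every intern serialises on the one IORef, and nothing lets the garbage collector reclaim atoms that are no longer referenced.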

In summary, it can be done but implementing it properly is non-trivial. Compiler writers always weigh the power of a feature against its implementation and maintenance efforts, and it seems like first-class symbols lose out on this one.

nominolo
  • 5,085
  • 2
  • 25
  • 31
  • +1 for providing more information on the cost of implementing atoms. I am beginning to agree that atoms are perhaps not terribly useful in Haskell. – Anupam Jain May 30 '11 at 15:53
14

I think the simplest answer is that, of the things Lisp-style symbols (which is where both Ruby and Erlang got the idea, I believe) are used for, in Haskell most are either:

  • Already done in some other fashion--e.g. a data type with a bunch of nullary constructors, which also behave as "convenient names for integers" (see the sketch after this list).

  • Awkward to fit in--things that exist at the level of language syntax instead of being regular data usually have more type information associated with them, but symbols would have to either be distinct types from each other (nearly useless without some sort of lightweight ad-hoc sum type) or all the same type (in which case they're barely different from just using strings).
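
For instance, the OP's error-code example can be written today with a plain sum type (a sketch; the extra constructors besides Redirect are invented for illustration):

-- Nullary constructors already act as "convenient names for integers";
-- deriving Enum gives the integer correspondence for free.
data ErrorCode = Redirect | NotFound | Forbidden
  deriving (Show, Eq, Enum)

data Error = Error ErrorCode String

loginError :: Error
loginError = Error Redirect "Please login first"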

Also, keep in mind that Haskell itself is actually a very, very small language. Very little is "baked in", and of the things that are most are just syntactic sugar for other primitives. This is a bit less true if you include a bunch of GHC extensions, but GHC with -XAndTheKitchenSinkToo is not the same language as Haskell proper.

Also, Haskell is very amenable to pseudo-syntax and metaprogramming, so there's a lot you can do even without having it built in. Particularly if you get into TH and scary type metaprogramming and whatever else.

So what it mostly comes down to is that most of the practical utility of symbols is already available from other features, and the stuff that isn't available would be more difficult to add than it's worth.

C. A. McCann
  • 76,893
  • 19
  • 209
  • 302
  • 1
    I'm pretty sure Erlang got the idea from Prolog, which might have been influenced by Lisp but equally likely by other research languages of the 1960s. – Fred Foo May 28 '11 at 21:24
  • @larsmans: Oh, good point. I have no idea why I forgot about Prolog's influence on Erlang. That does indeed make more sense. – C. A. McCann May 28 '11 at 21:39
  • 3
    Erlang definitely got it from Prolog, and Prolog from Lisp. – augustss May 28 '11 at 22:11
  • Adding atoms seems like a very easy-to-use and easy-to-implement change to Haskell. Also, I can imagine it leading to concise and beautiful Haskell code, which I suppose matters a lot to many Haskellers. If we have to use "scary" TH/metaprogramming then that beauty and conciseness is lost. – Anupam Jain May 30 '11 at 02:50
  • @Anupam Jain: What sort of special syntax do you have in mind, and how would you expect the feature to be used? The benefit seems pretty minor to me, honestly. – C. A. McCann May 30 '11 at 03:35
  • I have edited the question to provide an example of the syntactic sugar. – Anupam Jain May 30 '11 at 03:57
9

Atoms aren't provided by the language, but can be implemented reasonably as a library:

http://hackage.haskell.org/package/simple-atom

There are a few other libs on hackage, but this one looks the most recent and well-maintained.

sclv
  • 38,665
  • 7
  • 99
  • 204
  • Thanks, that looks interesting! But as I said in the comment above, if we don't have built-in syntactic sugar for atoms then the beauty and conciseness are lost, which is the main motivating factor for using them. – Anupam Jain May 30 '11 at 02:52
3

Haskell uses type constructors* instead of symbols so that the set of symbols a function can take is closed, and can be reasoned about by the type system. You could add symbols to the language, but it would put you in the same place that using strings would - you'd have to check all possible symbols against the few with known meanings at runtime, add error handling all over the place, etc. It'd be a big workaround for all the compile-time checking.

The main difference between strings and symbols is interning - symbols are atomic and can be compared in constant time. Both are types with an essentially infinite number of distinct values, though, which goes against the grain of Haskell's habit of specifying arguments and results with finite types.
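
To illustrate the point (a hypothetical sketch, not code from the answer): with strings the set of meaningful values is open, so unknown values can only be caught at runtime, whereas a closed data type lets the compiler check coverage.

-- String "symbols": a catch-all clause and a runtime error are unavoidable.
describeStr :: String -> String
describeStr "redirect" = "Please login first"
describeStr "notFound" = "No such page"
describeStr _          = error "unknown error code"  -- only fails at runtime

-- A closed data type: -Wincomplete-patterns reports a missing case at
-- compile time, and no catch-all is needed.
data Code = Redirect | NotFound

describeCode :: Code -> String
describeCode Redirect = "Please login first"
describeCode NotFound = "No such page"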

  • I'm more familiar with OCaml than Haskell, so "type constructor" may not be the right term. Things like None or Just 3.
silentbicycle
  • 1,164
  • 6
  • 9
  • 2
    For the sake of precision: "type constructors" construct types, and "data constructors" construct data, i.e. values. So `Nothing :: Maybe a` and `Just :: a -> Maybe a` are data constructors, whereas `Maybe :: * -> *` is a type constructor. No doubt you feel much more enlightened by that important detail. ;) – C. A. McCann May 29 '11 at 05:46
  • The difference between strings and atoms is that you can construct new strings at runtime but atoms cannot be manipulated in any way. They should only have predefined instances of Show and Eq (and probably Ord too). This should allow an efficient compilation strategy. – Anupam Jain May 30 '11 at 03:08
1

An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integral or a string "primary key".

Use Enum instead.

data FileType = GZipped | BZipped | Plain
  deriving Enum

descr ft  =  ["compressed with gzip",
              "compressed with bzip2",
              "uncompressed"] !! fromEnum ft
Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • The problem with enums/ADTs is that they only work when the values are knowable at compile-time. Atoms/interned strings are particularly handy for cases like compilers when the values are determined at run-time. – sclv May 28 '11 at 21:28
  • @sclv: then I must have misunderstood the OP, since symbols that are not knowable at compile-time are not isomorphic to (fixed-range) integers in any non-trivial way. – Fred Foo May 28 '11 at 21:30
  • Well, they are until you have too many and then that's an error :-) – sclv May 28 '11 at 22:18
  • Well I was thinking of atoms known at compile time. But defining a data structure / enum every time you need to use one is kind of a pain. Also, you need to make the ADT available to every bit of code that uses it, whereas a symbol is automatically understood. – Anupam Jain May 30 '11 at 04:03
  • @Anupam I think its a good sort of pain, though. Using an ADT instead forces you to group your "atoms" together into (hopefully) meaningful types. `:plain :: Atom` could mean a lot of things; `Plain :: FileType` is clearer in its meaning and allows the compiler to help you use it only where it makes sense. – Dan Burton Jun 04 '11 at 05:55