
Can someone suggest articles that explain the concept of homoiconicity, especially using Clojure? Why is it that Clojure is homoiconic, but it's hard to achieve that in other languages such as Java?

Brian Carper
Arun R
  • Homoiconicity is a trait that a language may or may not have, but it's not something you can add (without creating a new language). It's not that it's hard to do in Java; Java is simply not homoiconic (it's heteroiconic). It's the same as with cars: you can modify a Subaru Impreza so it becomes a monster truck, but the Impreza isn't a monster truck, and the resulting car isn't a Subaru Impreza anymore. – MatthewRock Jul 06 '16 at 14:56

7 Answers


Before I get to the things I wanted to add another answer for, here's one more reference: the part related to homoiconicity is fairly short, but it is Rich Hickey doing the explaining! Channel 9 has this nice video with Rich Hickey and Brian Beckman talking about Clojure. Concurrency is, understandably, the major focus, but homoiconicity does get its own (short) moment of screen time, during which Rich nicely explains the interplay between read (the function which converts concrete syntax, as written down by the programmer, into the internal representation built out of lists etc.) and eval. He has this nice diagram showing how eval never even knows that the code it evaluates comes from read operating on a text file... Arthur has already explained the gist of that, but hey, watch it anyway, it's a very nice video!
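A minimal Clojure sketch of the point in Rich's diagram, using only clojure.core's read-string and eval: eval receives plain data and has no way of knowing whether that data came from the reader or was built by hand. The forms themselves are arbitrary examples.

```clojure
;; A form produced by the reader from program text...
(def from-text (read-string "(+ 1 2 3)"))

;; ...and the same form built directly as an ordinary list.
(def built-by-hand (list '+ 1 2 3))

;; Both are plain lists, so eval cannot tell where they came from.
(println (= from-text built-by-hand)) ; prints true
(println (eval from-text))            ; prints 6
(println (eval built-by-hand))        ; prints 6
```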


A disclaimer: I'll be mentioning Java and Python as examples below the next horizontal bar. I want to make clear that the following is just a rough sketch of why I think it might be difficult to make a homoiconic, Lisp-style-macro-enabled Java or Python; it's just an academic exercise, though, and I don't want to consider the question of whether there's any reason to try in the first place. Also, I don't want to imply that the syntax of a language with Lisp-style macros must contain explicit delimiters for tree structures; Dylan (the paren-less Lisp?) apparently provides a counterexample. Finally, I use the expression "Lisp-style macros" because I'm only examining Lisp-style macros. The language Forth, for example, has a different macro facility which I don't really understand, except that I know it enables wicked cool-looking code. Apparently syntax extensions can be implemented in a number of ways. With this out of the way...


I'd like to address the second part of your question: how is it that most programming languages are considered not to be homoiconic? I'll have to touch upon the semantics of Lisp in the process, but since Nils has already provided links to good sources of information on the term "homoiconic" itself and Arthur has described the read -> macro expand -> compile cycle as found in Clojure, I'll be building on that in what follows. To start things off, let me quote a passage from Alan Kay (extracted from the Wikipedia article, which also links to the original source):

[...] Interactive LISP [...] and TRAC [...] both are "homoiconic" in that their internal and external representations are essentially the same.

(Those [...] bits hide a lot of text, but the gist is unchanged.)

Now, let's ask ourselves the question: what is Java's internal representation of Java? ... Well, this doesn't even make sense. The Java compiler does have a certain internal representation of Java, namely an abstract syntax tree; to construct a "homoiconic Java", we'd have to make that AST representation a first-class object in Java and devise a syntax which would allow us to write ASTs directly. That could prove to be rather hard.
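For comparison, Clojure already has both ingredients that a "homoiconic Java" would need: quoting is a syntax for writing code trees directly, and the result is an ordinary first-class list that ordinary functions can inspect and rebuild. A small sketch; the form and the branch swap are made up purely for illustration.

```clojure
;; Quoting gives a literal syntax for code-as-data: this is just a list.
(def form '(if (pos? n) :positive :non-positive))

(first form)   ; => the symbol if
(second form)  ; => the list (pos? n)

;; Transform it with ordinary list functions: swap the two branches.
(def swapped (list (nth form 0) (nth form 1) (nth form 3) (nth form 2)))

;; Wrap each version in a let so that n is bound, then evaluate it.
(eval (list 'let '[n 5] form))     ; => :positive
(eval (list 'let '[n 5] swapped))  ; => :non-positive
```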

Python provides an example of a non-homoiconic language which is interesting in that it currently ships with an AST-manipulation toolkit in the form of the ast module. The docs for that module explicitly state that Python ASTs may change between releases, which may or may not be discouraging; still, I suppose an industrious programmer could take the ast module, devise a syntax (maybe S-expression based, maybe XML-based) for describing Python ASTs directly and construct a parser for that syntax in regular Python using ast, thus taking a solid first step towards creating a homoiconic language with Python semantics. (I believe I came across a dialect of Lisp compiling to Python bytecode some time ago... I wonder if it might be doing something like that at some level?)

Even then the problem remains of extracting concrete benefits from that kind of homoiconicity. It's viewed as a beneficial property of members of the Lisp family of languages because it allows us to write programmes which write further programmes, among which macros are the most notable. Now, while macros are enabled in one way by the fact that it is so easy to manipulate the internal representation of Lisp code in Lisp, they are also enabled in an equally important way by the Lisp execution model: a Lisp programme is just a collection of Lisp forms; these are processed by the Lisp function eval which is responsible for determining the values of the expressions and causing the appropriate side-effects at the correct time; the semantics of Lisp are exactly the semantics of eval. The question of how things work internally to preserve this semantic illusion while being reasonably fast is an implementation detail; a Lisp system has an obligation to expose a function eval to the programmer and to act as if Lisp programmes were being processed by that function.
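To make the "a programme is just a collection of forms handed to eval" model concrete, here is a rough Clojure sketch. It only illustrates the semantic model; run-program is a hypothetical helper, and this is not a description of how Clojure's loader is actually implemented.

```clojure
;; A tiny "programme": just a vector of quoted forms (plain data).
(def program
  '[(def greeting "hello")
    (defn shout [s] (str s "!"))
    (shout greeting)])

;; Hypothetical helper: evaluate the forms one after another, in order,
;; so side effects (like the defs) happen at the right time; return the last value.
(defn run-program [forms]
  (reduce (fn [_ form] (eval form)) nil forms))

(run-program program) ; => "hello!"
```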

In modern Lisp systems, it is a part of eval's contract that it performs an additional preprocessing phase during which macros are expanded prior to evaluating the code (or compiling and running, as the case may be). That particular facility is not a necessary part of a Lisp system, but it is just so easy to plug it into this execution model! Also, I wonder if this isn't the only execution model which makes the Lisp kind of macro transformations manageable, which would mean that any language seeking to incorporate Lisp-style macros would have to adopt a similar execution model. My intuition tells me that this is indeed the case.
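In Clojure this preprocessing phase can be observed directly with macroexpand. when is a standard clojure.core macro; unless below is a hypothetical macro defined only for this sketch (clojure.core's own equivalent is when-not).

```clojure
;; when is a clojure.core macro: macroexpand shows the plain form that gets evaluated.
(macroexpand '(when ready? (launch) :done))
;; => (if ready? (do (launch) :done))

;; A hypothetical macro (not part of clojure.core), built with ordinary list functions.
(defmacro unless [test & body]
  (list 'if test nil (cons 'do body)))

(macroexpand '(unless done? (retry)))
;; => (if done? nil (do (retry)))

(unless false :ran) ; => :ran
```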

Of course once a language is written down in notation directly paralleling its ASTs and uses a Lisp-like execution model with an evaluator function / object, one must wonder if it isn't by any chance another dialect of Lisp... even if its AST-paralleling syntax happens to be XML-based. shudder

Michał Marczyk

When I was learning Lisp, the idea of homoiconicity made sense once I learned that Lisp is "compiled" in two phases, reading and compiling, and that the code is represented with the same data structure in both of these:

  • first you think of an s-expression in your head
  • then you type the s-expression as characters in a file
  • then the reader translates the characters in the file into s-expressions. It's not compiling the program, just building data structures from characters; this is the reading phase.
  • then the compiler looks at each of the expressions, decides whether it is a macro call, and if so runs the macro to produce another s-expression. So at this point we have gone from s-expressions to characters to s-expressions, and then from s-expressions to different s-expressions.
  • these s-expressions are then compiled into JVM bytecode (.class files) that the JVM can run; this is the second, "compiling" phase.

So it's pretty much s-expressions all the way from your brain to the .class file. You even write s-expressions that write s-expressions. So you can say that "the code is the data", or "code is data" because that just sounds better.
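The phases listed above can be walked through by hand at a REPL. A rough sketch using clojure.core's read-string, macroexpand and eval; note that eval performs macro expansion itself, so step 3 is made explicit here only to show the intermediate s-expression. The source string is an arbitrary example.

```clojure
;; 1. The characters as they would appear in a source file.
(def source "(when (> 3 2) (str \"three\" \" is bigger\"))")

;; 2. Reading: characters -> data structures (nothing is evaluated here).
(def form (read-string source))

;; 3. Macro expansion: one s-expression -> a different s-expression.
(def expanded (macroexpand form))
;; expanded => (if (> 3 2) (do (str "three" " is bigger")))

;; 4. Compiling and running: eval hands the form to the compiler and runs it.
(eval expanded) ; => "three is bigger"
```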

Arthur Ulfeldt
  • Actually it's misleading to imply that macro expansion happens inside the reader. Note how `(eval '(when true (println :foo) (println :bar)))` works as expected, even though the reader clearly may not mangle the quoted list structure `'(when true (println :foo) (println :bar))`. The standard nomenclature distinguishes between read time (character stream -> data structures conversion) and macro expansion time (data structure transformation) too. The fact remains that macro expansion happens before compilation, though, which is the key point to understanding macros. – Michał Marczyk Feb 19 '10 at 19:27
  • thanks for pointing that out. I removed the comment about that being part of the reader. – Arthur Ulfeldt Feb 20 '10 at 02:07
  • The reader does not translate characters into s-expressions. The reader translates s-expressions into data. – Rainer Joswig Feb 20 '10 at 04:28
  • The reader also does not care about macros and does not run macros to produce other s-expressions. Macros and the reader are not related in any way. – Rainer Joswig Feb 20 '10 at 04:30

The whole idea of 'homoiconicity' is slightly confused and does not fit Lisp well. Internal and external representations are not the same in Lisp. The external representation is based on characters in files. The internal representation is based on Lisp data (numbers, strings, lists, arrays, ...) and is non-textual. How is that the same as characters? There are internal representations that have no corresponding external representation (for example compiled code, closures, ...).

The main difference between Lisp and many other programming languages is that Lisp has a simple data representation for source code, one which is not based on strings.

Obviously, code can be represented as strings in text-based programming languages. But in Lisp the source can be represented in terms of primitive Lisp data structures. The external representation is based on s-expressions, which are a simple model for representing hierarchical data as text. The internal representation is based on lists, etc.

That's what the evaluator gets: internal representations. Not one-to-one copies of the textual input, but parsed data.

The basic model:

  • READ translates external s-expressions into data
  • EVAL takes Lisp forms in the form of Lisp data and evaluates them
  • PRINT translates Lisp data into external s-expressions
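In Clojure the three steps correspond roughly to read-string (a string-based entry point to the reader), eval and pr-str. A minimal sketch; rep is a hypothetical helper name for one pass of the loop.

```clojure
;; READ: external s-expression (characters) -> Lisp data
(def data (read-string "(* 6 7)"))

;; EVAL: Lisp data (a form) -> value
(def value (eval data))

;; PRINT: Lisp data -> external s-expression (characters)
(def printed (pr-str value)) ; => "42"

;; One pass of a read-eval-print loop as a single (hypothetical) helper.
(defn rep [s] (pr-str (eval (read-string s))))

(rep "(vector 1 (+ 1 1))") ; => "[1 2]"
```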

Note that READ and PRINT work for arbitrary Lisp data, which have a printed representation and a reader, and not only for Lisp forms. Forms are by definition valid expressions in the Lisp programming language.
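A short Clojure sketch of that distinction: the reader and printer happily handle data that is not code, and a list such as (1 2 3) is readable data but not a valid form. The map contents are made up for the example.

```clojure
;; READ and PRINT work on any readable data, not just code.
(def config (read-string "{:name \"demo\" :ports [80 443]}"))

(:ports config)                          ; => [80 443]
(= config (read-string (pr-str config))) ; => true: the data round-trips

;; (1 2 3) reads fine as data, but it is not a valid form:
;; 1 is not a function, so evaluating it throws.
(def not-a-form (read-string "(1 2 3)"))

(def eval-result
  (try (eval not-a-form)
       (catch Exception _ :not-a-valid-form)))
;; eval-result => :not-a-valid-form
```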

Rainer Joswig

Here's a short program to do symbolic differentiation. This is an example of LISP manipulating its own code. Try translating it to another language to see why LISP is good for this sort of thing.

;; The simplest possible symbolic differentiator

;; Functions to create and unpack additions like (+ 1 2)
(defn make-add [a b] (list '+ a b))
(defn addition? [x] (and (= (count x) 3) (= (first x) '+)))
(defn add1 [x] (second x))
(defn add2 [x] (second (rest x)))

;; Similar for multiplications (* 1 2)
(defn make-mul [a b] (list '* a b))
(defn multiplication? [x] (and (= (count x) 3) (= (first x) '*)))
(defn mul1 [x] (second x))
(defn mul2 [x] (second (rest x)))

;; Differentiation
(defn deriv [exp var]
  (cond (number? exp) 0                                                              ;; d/dx c -> 0
        (symbol? exp) (if (= exp var) 1 0)                                           ;; d/dx x -> 1, d/dx y -> 0
        (addition? exp) (make-add (deriv (add1 exp) var) (deriv (add2 exp) var))     ;; d/dx a+b -> d/dx a + d/dx b
        (multiplication? exp) (make-add (make-mul (deriv (mul1 exp) var) (mul2 exp)) ;; d/dx a*b -> d/dx a * b + a * d/dx b
                                        (make-mul (mul1 exp) (deriv (mul2 exp) var)))
        :else :error))

;; An example of use: create the function x -> x^3 + 2x^2 + 1 and its derivative
(def poly '(+ (+ (* x (* x x)) (* 2 (* x x))) 1))

(defn poly->fnform [poly] (list 'fn '[x] poly))

(def polyfn  (eval (poly->fnform poly)))
(def dpolyfn (eval (poly->fnform (deriv poly 'x))))

(polyfn 2)  ;; => 17, i.e. 2^3 + 2*2^2 + 1
(dpolyfn 2) ;; => 20, i.e. 3*2^2 + 4*2
John Lawrence Aspden
  • Would you explain this a bit more? – Ali Nov 20 '11 at 00:17
  • I'm a bit busy at the moment, but it came from this: http://www.learningclojure.com/2010/02/clojure-dojo-4-symbolic-differentiation.html, which is part four of a series of 'dojos' for Clojure beginners. If you don't get the Clojure bits, reading the earlier ones may help (find out what a dojo is first, though). If the maths is the problem, read a school maths book on differentiation that goes as far as the product rule. – John Lawrence Aspden Nov 21 '11 at 11:18

As Rainer Joswig points out, there are good reasons to doubt the utility of the idea of homoiconicity, and to doubt whether Lisps are actually homoiconic.

The original definition of homoiconicity centers on a similarity between the internal and external representations of a language. The canonical example is Lisp, with its s-expressions.

There are (at least) two problems with that definition and choice of example.

The first objection concerns the external representation. In the case of Lisp we assume that the external representation is an s-expression. In most practical programming environments, however, the actual representation of program sources is as text files which contain strings of characters. It is only after parsing this text that the representation is really an s-expression. In other words: in practical environments the external representation is not an s-expression, but text.

The second objection concerns the internal representation. For performance reasons, practical Lisp interpreters generally do not operate directly on s-expressions internally. Even though a Lisp might be defined in terms of a case analysis on s-expressions, it is not usually implemented as such. Thus, in practice, the internal representation is not actually an s-expression.
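A small Clojure illustration of the gap between the s-expression and what actually gets executed: evaluating a fn form yields an opaque compiled function object, not a list. This only shows that the result of evaluation is opaque; it makes no claim about the data structures the compiler uses internally.

```clojure
;; The source form is an s-expression: an ordinary list.
(def source-form '(fn [x] (* x 2)))

;; Evaluating it yields a compiled function object, not a list.
(def compiled (eval source-form))

(list? source-form) ; => true
(list? compiled)    ; => false
(fn? compiled)      ; => true
(compiled 21)       ; => 42
```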

In fact, one might raise further questions about the concept of homoiconicity: for a well-encapsulated machine, we cannot observe its inner workings by definition; in that view, making any statement about the internal representation of the machine is meaningless. More generally, the original definition assumes a single external and a single internal representation of the program, which does not match reality. In fact, there is a whole chain of representations, including electrons in the brain of the programmer, photons emitted from the screen, program text, machine code, and electrons moving in the CPU.

I wrote about this more extensively in an article called Don't say “Homoiconic”.

Klaas van Schelven

It almost seems too obvious, but good first sources might be:

http://en.wikipedia.org/wiki/Homoiconicity

http://c2.com/cgi/wiki?DefinitionOfHomoiconic

They explain homoiconicity in general, and you can also find the original sources there. Since they explain it using Lisp as the example, it is not that far from Clojure.

Nils Schmidt
  • So if I try the Wikipedia example in Java: 1. A Java program has a class with a String field that is itself a Java program (properly escaped). 2. Manipulate it using a regex to replace some substring, such as COS with SIN. 3. Write the modified String to a file and call an external program (a Java compiler) on it. 4. The output would be the execution of the program that was a String in the parent class. 5. You could just as well take the String as an input to the parent class's main method. So does that imply that Java is homoiconic as well? – Arun R Feb 19 '10 at 14:23
  • Surely the bytecode representation of class files in Java is not same as the syntax for Java code... – Arun R Feb 19 '10 at 14:23
  • To add to the first case, I will have the classLoader load the class files generated after compilation of the above program and use reflection to execute methods on the newly loaded classes. – Arun R Feb 19 '10 at 14:37
  • Icarus: You're using one tool (regex) to parse your Java program as text. You're using a second tool (compiler) to rebuild the changed program. Nowhere did you suggest how Java programs are primarily represented in some Java-primitive data structure, as required by the first sentence of the WP article. What primitive data structure holds sin(x)? – Ken Feb 20 '10 at 05:55
  • So the "primary representation" of a function call in Java is with a string object? I would have thought it's some bytecode. – Ken Nov 26 '10 at 19:50
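For contrast with the regex-on-a-string approach in Arun's comment, here is a rough sketch of the same COS-to-SIN rewrite in Clojure, where the code is a tree of symbols and lists rather than text. It uses clojure.walk/postwalk-replace; the expression itself is made up for the example.

```clojure
(require '[clojure.walk :as walk])

;; The program fragment is a nested list of symbols and numbers, not a string.
(def expr '(+ 1 (Math/cos x)))

;; Replace the symbol Math/cos with Math/sin wherever it occurs in the tree.
(def rewritten (walk/postwalk-replace {'Math/cos 'Math/sin} expr))
;; rewritten => (+ 1 (Math/sin x))

;; Evaluate the rewritten code with x bound to 0.0.
(eval (list 'let '[x 0.0] rewritten)) ; => 1.0
```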

Here is my take, written by and for a Clojure newbie who is still in the middle of getting it.

  1. What is homoiconicity?

"code is data, data is code"

This famous quote means that a language in which code is represented the same way as data can be considered "homoiconic".

To be more specific, at a very high level the steps for compiling a programming language are roughly:

"text (source code)" -> tokenize -> tokens -> parse -> "AST" -> do whatever with AST (take effect || IR optimization || down to machine code)

Here, the 'text (source code)' and the 'AST' have equivalent forms, so you have full access to the AST at compile time, before run time, and can handle the AST programmatically just like any normal Clojure data structure.

Let's take some code examples. Clojure code:

 (+ 1 2) 
 (def res (+ 1 2))
 (sum [1 2 3 4 5]) ;; call to a function named sum (not part of clojure.core)

This is Clojure code, but at the same time, when it is tokenized and parsed into an AST, that AST has the same form as the original code; the code as written is already a completely valid AST.
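A quick way to see this in Clojure is to read the source text of one of the examples and look at the resulting data. A minimal sketch using read-string, pr-str and eval from clojure.core:

```clojure
;; Read the source text of the example into ordinary Clojure data.
(def code (read-string "(def res (+ 1 2))"))

(list? code)          ; => true: a plain list
(map type code)       ; symbol, symbol, list
(pr-str code)         ; => "(def res (+ 1 2))": same shape as the source text

;; Treat the nested (+ 1 2) as data and evaluate just that part.
(eval (nth code 2))   ; => 3
```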

When this condition is met, we call the language "homoiconic", and that's the meaning of "code is data, data is code".

Now let's look at code in languages that are not homoiconic:

1 + 2
var res = 1 + 2
sum([1, 2, 3, 4, 5]) // function call 

This code (not homoiconic) will also be tokenized and parsed into an AST, but that AST has a completely different form from the original code.
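To make the contrast concrete, here is a hypothetical AST for the source text 1 + 2, written as Clojure maps. The node keys (:type, :op, :left, :right) and the evaluator eval-ast are invented for illustration only; real parsers for such languages each define their own node types.

```clojure
;; A hypothetical AST for the infix text "1 + 2": it looks nothing like the source.
(def infix-ast
  {:type  :binary-op
   :op    :plus
   :left  {:type :number :value 1}
   :right {:type :number :value 2}})

;; A tiny evaluator for that hypothetical AST.
(defn eval-ast [node]
  (case (:type node)
    :number    (:value node)
    :binary-op (case (:op node)
                 :plus (+ (eval-ast (:left node)) (eval-ast (:right node))))))

(eval-ast infix-ast) ; => 3

;; In Clojure, the source text (+ 1 2) already is the tree: a list.
(eval '(+ 1 2))      ; => 3
```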

This is a good article to read about this: https://www.braveclojure.com/read-and-eval/

In the same way that you can handle data structures programmatically in a non-homoiconic language, you can control your 'AST in Clojure (LISP)'.

Take, for example, Swift:

 [1, 2, 3, 4, 5]
    .filter {$0 % 2 == 0}
    .map { String($0) }

As shown above, Swift programmers can process, manipulate and handle data programmatically, although this happens at run time.

In the same sense, Clojure (LISP) programmers can process, manipulate and handle not only data but also the 'AST', at compile time, before run time.
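A small sketch of what "manipulating the AST at compile time" looks like: infix is a hypothetical macro (not part of clojure.core) that receives its argument as an unevaluated list and rearranges it before the code is compiled.

```clojure
;; A hypothetical macro: rewrite the list (a op b) into the prefix form (op a b).
(defmacro infix [[a op b]]
  (list op a b))

;; The rewrite happens during macro expansion, before any code runs.
(macroexpand '(infix (1 + 2)))  ; => (+ 1 2)

(infix (1 + 2))   ; => 3
(infix (10 - 4))  ; => 6
```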

Simply put, once more: you may think of the Clojure syntax (+ 1 2) as a valid AST (abstract syntax tree). Plain Clojure code is already a valid AST, and you write that AST as both code and data.

  2. Why is LISP homoiconicity so special?

Since the AST and the code have the same form, you can take metaprogramming to the max, to the point where you can add new syntax to the language yourself, whereas in other languages (Java, for example) programmers have had to wait years for each new syntax feature to be implemented.

Imagine you want to add list comprehension syntax, inspired by Python, to your language.

[x for x in range (10)]

How? In your language you can't, even though you already know the concept, because of the limits of the language.

In Clojure, this syntax has already been implemented as a macro, for, by Rich Hickey, the creator of Clojure. Adding new syntax to Clojure (LISP) is like adding a new library:

(for [x (range 10)] x)

https://github.com/clojure/clojure/blob/b98ba84/src/clj/clojure/core.clj#L4590
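Since for is an ordinary macro from clojure.core, you can check that yourself, and its bindings also accept modifiers such as :when. A short sketch:

```clojure
;; for is defined with defmacro, so its var carries :macro true in its metadata.
(:macro (meta #'clojure.core/for)) ; => true

;; The same comprehension with a :when filter and a transformed body.
(for [x (range 10) :when (even? x)] (* x x)) ; => (0 4 16 36 64)
```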

You don't have to wait an unknown amount of time for language designers to implement new syntax. Using macros, LISP can be extended as far as you like. That's why LISP is called a 'programmable programming language'.

  3. Real examples of homoiconicity and macros in Clojure

Let's look at real examples of Clojure making the most of homoiconicity and macros.

Say you want to add new syntax to the language, as below.

(1) C-style imperative for loop syntax, which Clojure doesn't have

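(defmacro for-loop [[sym init check change :as params] & steps]
  ;; Expands into a loop that binds sym to init, runs the body while check
  ;; is true, rebinds sym to change each time, and returns the last body value.
  `(loop [~sym ~init value# nil]
     (if ~check
       (let [new-value# (do ~@steps)]
         (recur ~change new-value#))
       value#)))

(for-loop [i 0, (< i 10), (inc i)]
  (prn i))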

implemented by @mikera: https://stackoverflow.com/users/214010/mikera

(2) New range syntax that reads like English

(defmacro from
  [x y z & more]
  `(when (= '~y '~'to)
     (if '~more (range ~x ~z (second '~more))
         (range ~x ~z))))

(from 0 to 10)
(from 0 to 10 by 0.5)

implemented by: https://softwareengineering.stackexchange.com/a/363513

I wrote a personal study note giving a high-level overview of macros and homoiconicity, with a bit more general information: https://gist.github.com/boraseoksoon/1d592baa7f3cfde7aabd6707d8464a75

Since it is a personal note, it is not that well organized, but if you're interested, you may take a look.

boraseoksoon