Type information in Abstract Syntax Trees

Question

What type information exists in an abstract syntax tree? How are ASTs used for type inferencing? I don't understand how a function's input and output types can be derived from an AST when none of the nodes indicate concrete types. Are the types inferred from the tree structure alone, e.g. there are a bunch of IfStatement(Statement) nodes, so the function is likely to return a bool? For example, the javalang Python module uses these AST nodes:

CompilationUnit(Node)
Import(Node)
Documented(Node)
Declaration(Node)
Type(Node)
TypeArgument(Node)
TypeParameter(Node)
Annotation(Node)
ElementValuePair(Node)
ElementArrayValue(Node)
ArrayInitializer(Node)
VariableDeclarator(Node)
InferredFormalParameter(Node)
Statement(Node)
SwitchStatementCase(Node)
ForControl(Node)
EnhancedForControl(Node)
Expression(Node)
EnumBody(Node)
VariableDeclaration(Declaration)
FormalParameter(Declaration)
TryResource(Declaration)
CatchClauseParameter(Declaration)
AnnotationMethod(Declaration)
BasicType(Type)
ReferenceType(Type)
TypeDeclaration(Declaration, Documented)
PackageDeclaration(Declaration, Documented)
ConstructorDeclaration(Declaration, Documented)
EnumConstantDeclaration(Declaration, Documented)
ClassDeclaration(TypeDeclaration)
EnumDeclaration(TypeDeclaration)
InterfaceDeclaration(TypeDeclaration)
AnnotationDeclaration(TypeDeclaration)
Member(Documented)
MethodDeclaration(Member, Declaration)
FieldDeclaration(Member, Declaration)
ConstantDeclaration(FieldDeclaration)
LocalVariableDeclaration(VariableDeclaration)
IfStatement(Statement)
WhileStatement(Statement)
DoStatement(Statement)
ForStatement(Statement)
AssertStatement(Statement)
BreakStatement(Statement)
ContinueStatement(Statement)
ReturnStatement(Statement)
ThrowStatement(Statement)
SynchronizedStatement(Statement)
TryStatement(Statement)
SwitchStatement(Statement)
BlockStatement(Statement)
StatementExpression(Statement)
CatchClause(Statement)
Assignment(Expression)
TernaryExpression(Expression)
BinaryOperation(Expression)
Cast(Expression)
MethodReference(Expression)
LambdaExpression(Expression)
Primary(Expression)
ArraySelector(Expression)
Literal(Primary)
This(Primary)
MemberReference(Primary)
Invocation(Primary)
SuperMemberReference(Primary)
ClassReference(Primary)
Creator(Primary)
ExplicitConstructorInvocation(Invocation)
SuperConstructorInvocation(Invocation)
MethodInvocation(Invocation)
SuperMethodInvocation(Invocation)
VoidClassReference(ClassReference)
ArrayCreator(Creator)
ClassCreator(Creator)
InnerClassCreator(Creator)

Given some toy code, it spits out the following AST for the functions:

public class HelloWorld{
  public static void main(String args[]){
     add(5);
  } 
  public static int add(int x){
     return x+0;
  }
}

(MethodDeclaration 
    (FormalParameter
        (ReferenceType)
    )
    (StatementExpression
        (MethodInvocation
            (Literal)
        )
    )
)

Also, could anyone point me to some good reading material on type inferencing given ASTs? Thanks.

Try searching with `Hindley-Milner` and `AST` and see if you find what you seek. I quickly found [Hindley-Milner Type Checking](http://adamdoupe.com/teaching/classes/cse340-principles-of-programming-languages-f15/slides/Hindley-MilnerTypeChecking-Mohsen-Zohrevandi.pdf) Hope that helps. — Guy Coder, Mar 07 '17 at 11:17
@GuyCoder I'm aware of the Hindley-Milner algorithm. I haven't been able to find any detailed examples of how to handle ASTs where the leaves are calls to other functions or ADTs. Also, I don't understand how the type of a literal can be inferred if only given an AST. If the leaves of a basic AST end in the literals for an addition, how do you know whether floats or ints are being added? In all the examples I've seen, they assume you're given the value of the leaf literals. — Soubriquet, Mar 08 '17 at 18:49
Wish I knew more than to give just comments. Other suggestions 1. Add the tag [tag:hindley-milner] for more exposure at SO. 2. Try the [Computer Science Stack Exchange site](http://cs.stackexchange.com/). 3. I know lots of experts hang out at [lambda-the-ultimate](http://lambda-the-ultimate.org/) but be ready for lots of theory and reference papers. — Guy Coder, Mar 08 '17 at 19:25
The classic book [Types and Programming Languages](http://www.worldcat.org/oclc/807292064) has a chapter called `Type Reconstruction`, which is a different set of keywords to search for. If you are really desperate, I know that several professional compilers for ML dialects have it built in, but figuring it out could take quite some time: [F#](https://github.com/fsharp/fsharp), [OCaml](https://github.com/ocaml/ocaml). I would also bet on, but haven't looked at, [SML/NJ](http://smlnj-gforge.cs.uchicago.edu/scm/?group_id=15) and [GHC](https://github.com/ghc/ghc). — Guy Coder, Mar 08 '17 at 19:29
Also, some ML languages default to integer when they see arithmetic operators, which is a pitfall in F# if you are not aware of it, and is why OCaml uses separate float operators such as `+.` instead of overloading `+`. — Guy Coder, Mar 08 '17 at 19:36
FWIW, the scala-lang team is developing a tool called "Tasty" that works with "Typed Abstract Syntax Trees", so there will (presumably) be some discussion of the problems it addresses, possibly shedding light on your question. — philwalk, May 16 '18 at 17:36

score 4 · Answer 1 · edited Nov 01 '21 at 20:12

How are ASTs used for type inferencing?

Type inference converts an untyped AST into a typed AST by traversing the tree while propagating an "environment": a dictionary mapping variable names (including function names) to their types. The environment is passed down the AST to the leaves, and the types computed there flow back up.

The type of a literal integer or string is int or string.

The type of a variable is looked up in the environment.

The type of a function application f(x) where f: a -> b and x: a is b.

The type of fun x -> y where x: a and y: b is a -> b.

At a let x = y in z where y: a, inference adds a binding x: a to the environment and infers the type of z (which is the type of the entire let expression) in the context of that new environment, so it can look up x: a when it encounters x.
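The rules above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the node classes (Lit, Var, App, Lam, Let) and the Fun type are hypothetical names for a tiny ML-like AST, not javalang's classes, and lambda parameters carry an explicit type annotation because this version has no type variables yet.

```python
from dataclasses import dataclass

# Base types are plain strings ("int", "string"); function types are Fun(arg, ret).
@dataclass(frozen=True)
class Fun:
    arg: object
    ret: object

# Hypothetical AST node classes for a tiny ML-like language.
@dataclass
class Lit:
    value: object        # a Python int or str taken from the source text

@dataclass
class Var:
    name: str

@dataclass
class App:               # f(x)
    func: object
    arg: object

@dataclass
class Lam:               # fun (param: param_type) -> body
    param: str
    param_type: object   # annotated here; full inference would use a type variable
    body: object

@dataclass
class Let:               # let name = bound in body
    name: str
    bound: object
    body: object

def infer(expr, env):
    """Return the type of expr under env, a dict mapping names to types."""
    if isinstance(expr, Lit):
        # The literal's lexical form decides its type.
        if isinstance(expr.value, int):
            return "int"
        if isinstance(expr.value, str):
            return "string"
        raise TypeError(f"unsupported literal: {expr.value!r}")
    if isinstance(expr, Var):
        if expr.name not in env:
            raise TypeError(f"unbound variable: {expr.name}")
        return env[expr.name]
    if isinstance(expr, App):
        func_type = infer(expr.func, env)
        arg_type = infer(expr.arg, env)
        if not isinstance(func_type, Fun) or func_type.arg != arg_type:
            raise TypeError(f"cannot apply {func_type} to {arg_type}")
        return func_type.ret
    if isinstance(expr, Lam):
        body_type = infer(expr.body, {**env, expr.param: expr.param_type})
        return Fun(expr.param_type, body_type)
    if isinstance(expr, Let):
        bound_type = infer(expr.bound, env)
        # The body is typed in an environment extended with the new binding.
        return infer(expr.body, {**env, expr.name: bound_type})
    raise TypeError(f"unknown node: {expr!r}")

# The call add(5) from the question, with add: int -> int already in the environment.
print(infer(App(Var("add"), Lit(5)), {"add": Fun("int", "int")}))  # prints int
```

Note that the literal node has to carry its value (or at least its lexical kind) for this to work; an AST dump that shows only node names has simply not printed that information.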

And so on. The Hindley-Milner type system is more complicated because it includes generics, but the implementation is little more than doing this to obtain a sequence of constraints about type variables and then solving those constraints to work out as many type variables as possible. Type variables that remain unsolved are usually generalized at let bindings so, for example, let id x = x defines a generic identity function id: ∀a. a -> a.
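Here is a minimal sketch of that constraint-solving idea in Python, for a language with only int literals, functions, application, and let. It is an illustration under assumptions rather than a reference implementation: expressions are hypothetical tagged tuples such as ("lit", 5), ("name", "x"), ("app", f, x), ("lam", "x", body) and ("let", "x", bound, body), and constraints are solved eagerly by unification as they are generated instead of being collected first.

```python
import itertools

_ids = itertools.count()
INT = ("int",)

def fresh():
    """Make a new, so far unconstrained, type variable."""
    return ("var", next(_ids))

def fun(arg, ret):
    return ("fun", arg, ret)

def resolve(t, subst):
    """Replace every solved type variable in t by its solution."""
    if t[0] == "var":
        return resolve(subst[t], subst) if t in subst else t
    if t[0] == "fun":
        return fun(resolve(t[1], subst), resolve(t[2], subst))
    return t

def free_vars(t):
    if t[0] == "var":
        return {t}
    if t[0] == "fun":
        return free_vars(t[1]) | free_vars(t[2])
    return set()

def unify(a, b, subst):
    """Solve the constraint a = b by extending subst in place."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return
    if a[0] == "var":
        if a in free_vars(b):  # occurs check: reject infinite types
            raise TypeError("infinite type")
        subst[a] = b
    elif b[0] == "var":
        unify(b, a, subst)
    elif a[0] == "fun" and b[0] == "fun":
        unify(a[1], b[1], subst)
        unify(a[2], b[2], subst)
    else:
        raise TypeError(f"cannot unify {a} with {b}")

def generalize(t, env, subst):
    """Quantify the variables of t that the environment does not mention."""
    t = resolve(t, subst)
    env_vars = set()
    for quantified, ty in env.values():
        env_vars |= free_vars(resolve(ty, subst)) - quantified
    return (free_vars(t) - env_vars, t)

def instantiate(scheme, subst):
    """Give each quantified variable of a scheme a fresh copy."""
    quantified, t = scheme
    mapping = {q: fresh() for q in quantified}
    def go(t):
        if t[0] == "var":
            return mapping.get(t, t)
        if t[0] == "fun":
            return fun(go(t[1]), go(t[2]))
        return t
    return go(resolve(t, subst))

def infer(e, env, subst):
    """env maps names to schemes: (set of quantified variables, type)."""
    tag = e[0]
    if tag == "lit":
        return INT
    if tag == "name":
        return instantiate(env[e[1]], subst)
    if tag == "lam":
        param = fresh()
        body = infer(e[2], {**env, e[1]: (set(), param)}, subst)
        return fun(param, body)
    if tag == "app":
        func = infer(e[1], env, subst)
        arg = infer(e[2], env, subst)
        result = fresh()
        unify(func, fun(arg, result), subst)  # constraint: func = arg -> result
        return result
    if tag == "let":
        bound = infer(e[2], env, subst)
        return infer(e[3], {**env, e[1]: generalize(bound, env, subst)}, subst)
    raise ValueError(f"unknown node: {e!r}")

def typeof(e):
    subst = {}
    return resolve(infer(e, {}, subst), subst)
```

For example, typeof(("let", "id", ("lam", "x", ("name", "x")), ("app", ("app", ("name", "id"), ("name", "id")), ("lit", 5)))) resolves to INT. The expression id id only type-checks because the let binding generalized id, so each use gets its own fresh copy of the type variable.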

Also, in the presence of mutation, type variables originating from expansive expressions are not generalised so, for example, ref None: '_a option ref has a weakly polymorphic type variable, denoted '_a in OCaml (printed as '_weak1 by recent versions), meaning "this type variable can only be resolved to one concrete type", i.e. ∃a rather than ∀a.

Type inference for a minimal ML dialect with int and fun is under 100 lines of code, and even a straightforward implementation is fast on modern computers.

You might appreciate the following links:
