Compiling Erlang To Javascript Via Core Erlang

Question

So I started making progress on LuvvieScript and then it all kicked off a bit on Twitter... https://twitter.com/gordonguthrie/status/389659700741943296

Anthony Ramine https://twitter.com/nokusu made the point that I was doing it wrong and that I should be compiling from Erlang to JavaScript via Core Erlang and not the Erlang AST. This is both a compelling and an unattractive option for me... Twitter not being the right medium for that discussion, I thought I would write it up here and get some advice.

Strategic Overview

LuvvieScript has three core requirements:

a valid subset of Erlang that compiles to sane and performant Javascript
a complete Source Map so that it can be debugged in the browser in LuvvieScript not Javascript
a 'runtime' client-side javascript environment (with server-side comms) to execute LuvvieScript modules in (a sort of in-page supervisor...)

The third of these requirements is kinda out of scope for this debate but the first two are core.

There is a lazy-gits corollary - I want to use as many Erlang and Javascript syntax tools (lexers, parser, tokenizers, AST transforms, etc, etc, etc) as possible and write the smallest amount of code.

Current Thinking

The code as currently written has the following structure:

compile the code to the Erlang AST (which has line numbers)
tokenise the code (keeping comments and white space) and use those tokens to build a dictionary that maps line/column info to tokens
merge the dictionary and AST to give a line/col AST (with some fannying about to group fns of different arities)
transform this new Erlang AST to a Javascript AST as implemented in the SpiderMonkey Parser API https://developer.mozilla.org/en-US/docs/SpiderMonkey/Parser_API
use Javascript utils like brushtail to mutate away tail calls in the Javascript AST https://github.com/puffnfresh/brushtail
use Javascript utils like ESCodeGen to emit the javascript https://github.com/Constellation/escodegen
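
The tokenising step in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual LuvvieScript code: the module and function names are made up, and it records only the start location of each token rather than the {Line, {ColStart, ColEnd}} shape used in the ASTs in this post. It relies on erl_scan:string/3 tracking columns when the start location is given as {1,1}, and on the return option to keep comments and white space.

```erlang
-module(token_map_sketch).
-export([line_col_map/1]).

%% Hypothetical helper: scan Source (a string) and return a map from
%% {Line, Column} to {Category, Symbol} for every token.
%% Passing {1,1} as the start location makes erl_scan track columns;
%% the 'return' option keeps white space and comment tokens.
line_col_map(Source) when is_list(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source, {1, 1}, [return]),
    maps:from_list([{erl_scan:location(T),
                     {erl_scan:category(T), erl_scan:symbol(T)}}
                    || T <- Tokens]).
```

A map like this can then be consulted while walking the line-numbered AST to recover the column of each node.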

Basically I get an Erlang AST that looks something like this:

 [{function,
      {19,{1,9}},
      atom1_fn,0,
      [{clause,
           {19,none},
           [],
           [[]],
           [{match,
                {20,none},
                [{var,{20,{5,6}},'D'}],
                [{atom,{20,{11,15}},blue}]},
            {var,{21,{5,6}},'D'}]}]}]},

and I then transpose it into a Javascript JSON AST that looks like:

{
    "type": "Program",
    "body": [
        {
            "type": "VariableDeclaration",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {
                        "type": "Identifier",
                        "name": "answer",
                        "loc": {
                            "start": {
                                "line": 2,
                                "column": 4
                            },
                            "end": {
                                "line": 2,
                                "column": 10
                            }
                        }
                    },
                    "init": {
                        "type": "BinaryExpression",
                        "operator": "*",
                        "left": {
                            "type": "Literal",
                            "value": 6,
                            "raw": "6",
                            "loc": {
                                "start": {
                                    "line": 2,
                                    "column": 13
                                },
                                "end": {
                                    "line": 2,
                                    "column": 14
                                }
                            }
                        },
                        "right": {
                            "type": "Literal",
                            "value": 7,
                            "raw": "7",
                            "loc": {
                                "start": {
                                    "line": 2,
                                    "column": 17
                                },
                                "end": {
                                    "line": 2,
                                    "column": 18
                                }
                            }
                        },
                        "loc": {
                            "start": {
                                "line": 2,
                                "column": 13
                            },
                            "end": {
                                "line": 2,
                                "column": 18
                            }
                        }
                    },
                    "loc": {
                        "start": {
                            "line": 2,
                            "column": 4
                        },
                        "end": {
                            "line": 2,
                            "column": 18
                        }
                    }
                }
            ],
            "kind": "var",
            "loc": {
                "start": {
                    "line": 2,
                    "column": 0
                },
                "end": {
                    "line": 2,
                    "column": 19
                }
            }
        }
    ],
    "loc": {
        "start": {
            "line": 2,
            "column": 0
          },
        "end": {
            "line": 2,
            "column": 19
           }
    }
}
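
The Erlang-to-Javascript AST step might look something like the sketch below for the two simplest leaf nodes. It is a hedged illustration only: the module name and the choice to emit atoms as string literals are assumptions, and the columns are copied through unchanged even though the two ASTs may not count columns from the same base.

```erlang
-module(js_ast_sketch).
-export([to_js/1]).

%% Convert one position-annotated Erlang AST leaf, in the
%% {Type, {Line, {ColStart, ColEnd}}, Value} shape shown above, into a
%% map shaped like a SpiderMonkey Parser API node.
to_js({var, {Line, {From, To}}, Name}) ->
    #{type => <<"Identifier">>,
      name => atom_to_binary(Name, utf8),
      loc  => loc(Line, From, To)};
to_js({atom, {Line, {From, To}}, Atom}) ->
    %% Assumption: atoms are emitted as Javascript string literals.
    #{type  => <<"Literal">>,
      value => atom_to_binary(Atom, utf8),
      loc   => loc(Line, From, To)}.

%% Assumption: columns are passed through as-is; a real implementation
%% would have to reconcile the column bases of the two ASTs.
loc(Line, From, To) ->
    #{start => #{line => Line, column => From},
      'end' => #{line => Line, column => To}}.
```

The resulting maps would still need to be serialised to JSON before being handed to tools like ESCodeGen.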

El Problemo

Anthony's point is well made - Core Erlang is a simplified and more regular language than Erlang and should be more easily transpiled to Javascript than plain Erlang, but it is not very well documented.

I can get an AST-like representation of Core Erlang easily enough:

{c_module,[],
    {c_literal,[],basic_types},
    [{c_var,[],{atom1_fn,0}},
     {c_var,[],{atom2_fn,0}},
     {c_var,[],{bish_fn,1}},
     {c_var,[],{boolean_fn,0}},
     {c_var,[],{float_fn,0}},
     {c_var,[],{int_fn,0}},
     {c_var,[],{module_info,0}},
     {c_var,[],{module_info,1}},
     {c_var,[],{string_fn,0}}],
    [],
    [{{c_var,[],{int_fn,0}},{c_fun,[],[],{c_literal,[],1}}},
     {{c_var,[],{float_fn,0}},{c_fun,[],[],{c_literal,[],2.3}}},
     {{c_var,[],{boolean_fn,0}},{c_fun,[],[],{c_literal,[],true}}},
     {{c_var,[],{atom1_fn,0}},{c_fun,[],[],{c_literal,[],blue}}},
     {{c_var,[],{atom2_fn,0}},{c_fun,[],[],{c_literal,[],'Blue 4 U'}}},
     {{c_var,[],{string_fn,0}},{c_fun,[],[],{c_literal,[],"string theory"}}},
     {{c_var,[],{bish_fn,1}},
      {c_fun,[],
          [{c_var,[],'_cor0'}],
          {c_case,[],
              {c_var,[],'_cor0'},
              [{c_clause,[],
                   [{c_literal,[],bash}],
                   {c_literal,[],true},
                   {c_literal,[],berk}},
               {c_clause,[],
                   [{c_literal,[],bosh}],
                   {c_literal,[],true},
                   {c_literal,[],bork}},
               {c_clause,
                   [compiler_generated],
                       [{c_var,[],'_cor1'}],
                   {c_literal,[],true},
                   {c_primop,[],
                       {c_literal,[],match_fail},
                       [{c_tuple,[],
                            [{c_literal,[],case_clause},
                             {c_var,[],'_cor1'}]}]}}]}}},
     {{c_var,[],{module_info,0}},
      {c_fun,[],[],
          {c_call,[],
              {c_literal,[],erlang},
              {c_literal,[],get_module_info},
              [{c_literal,[],basic_types}]}}},
     {{c_var,[],{module_info,1}},
      {c_fun,[],
          [{c_var,[],'_cor0'}],
          {c_call,[],
              {c_literal,[],erlang},
              {c_literal,[],get_module_info},
              [{c_literal,[],basic_types},{c_var,[],'_cor0'}]}}}]}

But it has no line or column numbers. So I can get an AST that will generate JS, but critically not Source Maps.

Question 1: How can I get the line information I need? (I can already get column information from the 'normal' Erlang tokens...)

Core Erlang also differs from normal Erlang in how it is produced: the compiler starts substituting its own internal names for the variable names in function heads, which will also cause some Source Map problems. An example would be this Erlang clause:

bish_fn(A) ->
    case A of
        bash -> berk;
        bosh -> bork
    end.

The Erlang AST preserves the names nicely:

 [{function,
      {31,{1,8}},
      bish_fn,1,
      [{clause,
           {31,none},
           [{var,{31,{11,12}},'A'}],
           [[]],
           [{'case',
                {32,none},
                [{var,{32,{11,12}},'A'}],
                [{clause,
                     {33,none},
                     [{atom,{33,{9,13}},bash}],
                     [[]],
                     [{atom,{34,{13,17}},berk}]},
                 {clause,
                     {35,none},
                     [{atom,{35,{9,13}},bosh}],
                     [[]],
                     [{atom,{36,{13,17}},bork}]}]}]}]}]},

Core Erlang has already mutated away the names of the parameters called in the function:

'bish_fn'/1 =
    %% Line 30
    fun (_cor0) ->
    %% Line 31
    case _cor0 of
      %% Line 32
      <'bash'> when 'true' ->
          'berk'
      %% Line 33
      <'bosh'> when 'true' ->
          'bork'
      ( <_cor1> when 'true' ->
        primop 'match_fail'
            ({'case_clause',_cor1})
        -| ['compiler_generated'] )
    end

Question 2: Is there anything I can do to preserve or map variable names in Core Erlang?

Question 3: I appreciate that Core Erlang is explicitly designed to make it easy to compile into Erlang and to write tools that mutate Erlang code, but the real question is: will it make it easier to compile out of Erlang?

Options

I could fork the Core Erlang code and add a source mapping option, but I play the Lazy Man card here...

Update

In response to Eric's response, I should clarify how I am generating the Core Erlang cerl records. I first compile my plain Erlang to core erlang using:

c(some_module, to_core)

Then I use core_scan and core_parse in this function nicked from compiler.erl:

compile(File) ->
    case file:read_file(File) of
        {ok,Bin} ->
            case core_scan:string(binary_to_list(Bin)) of
                {ok,Toks,_} ->
                    case core_parse:parse(Toks) of
                        {ok, Mod} ->
                            {ok, Mod};
                        {error,E} ->
                            {error, {parse, E}}
                    end;
                {error,E,_} ->
                    {error, {scan, E}}
            end;
        {error,E} ->
            {error,{read, E}}
    end.

The question is how do I/can I get that toolchain to emit an annotated AST. I suspect I would need to add those options myself :(

You shouldn't go through the Core Erlang concrete representation, use compile:file(File, [binary,to_core]) instead. For the column numbers you should just participate in Erlang development and make them a reality, as they should be by now. — nox, Oct 18 '13 at 19:10
Gordon, what kind of environment are you planning for the compiler itself? Is that too supposed to be running in the browser, or offline (maybe on the server side) in a full OTP installation? — RichardC, Oct 20 '13 at 20:51

score 6 · Answer 1 · edited Nov 12 '20 at 04:43

Question 1: Line numbers are provided as annotations. If you look at the cerl module, which I really recommend you use, you will see that pretty much everything takes a list of annotations. One of those annotations is an unadorned number that represents the line number. If I remember correctly, if you were looking at the Core AST directly and the atom1_fn var was on line 10, the AST would look as follows:

{c_var,[10],{atom1_fn,0}}

Question 2: No, you have to do all the bookkeeping yourself. There isn't anything out there to do it for you.

Question 3: I am not sure I understand this question.
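
As a rough sketch of how the line annotations described above could be read back, the following compiles a source file straight to the Core Erlang records with compile:file(File, [to_core, binary]), as suggested in the comments on the question, and lists each function with its line. The module and function names are made up for illustration, and the handling of a {Line, Column} annotation is an assumption about compiler versions that track columns.

```erlang
-module(core_lines_sketch).
-export([core_lines/1]).

%% Compile File to the in-memory Core Erlang representation and return
%% [{FunctionName, Arity, Line}] for every definition in the module.
core_lines(File) ->
    {ok, _Module, Core} = compile:file(File, [to_core, binary, return_errors]),
    [{cerl:fname_id(Name), cerl:fname_arity(Name), line(Fun)}
     || {Name, Fun} <- cerl:module_defs(Core)].

%% Pick the line out of a node's annotation list. The line is expected
%% to be a bare integer; the {Line, Column} case is an assumption.
line(Node) ->
    case [A || A <- cerl:get_ann(Node), is_location(A)] of
        [{Line, _Col} | _] -> Line;
        [Line | _] -> Line;
        [] -> undefined
    end.

is_location(A) when is_integer(A) -> true;
is_location({L, C}) when is_integer(L), is_integer(C) -> true;
is_location(_) -> false.
```

Calling core_lines_sketch:core_lines("some_module.erl") should then return one tuple per function, including the generated module_info functions, which may have no line at all.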

Everything Anthony said about Core Erlang was true. Those are the very same reasons I chose Core Erlang as a target language for Joxa. The lesson I learned from that is that while Core Erlang is a great, easy-to-target language, it has two major drawbacks that recommend against it.

Dialyzer only works with an Erlang AST in the abstract code block of the beam file. There is no way to get such an AST into that abstract code block when compiling to Core Erlang. So if you target Core Erlang, Dialyzer won't work for you. That is true regardless of whether or not you produce the correct spec attributes.
You lose the use of tools that work on the Erlang AST, for example the ability to compile to Erlang source. The Core Erlang to/from source compilers are very buggy and simply do not work. Being able to go back to Erlang source is a major win in a lot of areas of pragmatic use.

I am actually in the process of retargeting Joxa to the Erlang AST for the above reasons.

Btw, you might be interested in this project: https://github.com/5HT/shen. It's a JavaScript compiler for the Erlang AST that already exists and is working, though I don't have a lot of experience with it.

** Edit: You can actually see a core erlang AST generated from Erlang source. This helps a ton when learning how to compile to core. ec_compile in the erlware_commons repo has a lot of utility functions to help with that.

Eric, the problem is that I am compiling out of Erlang - I don't use cerl.erl because I am not building the Core Erlang AST in my compiler - it is a step to transpiling Erlang to JS (see update) — Gordon Guthrie, Oct 18 '13 at 16:55
Eric, no Dialyzer doesn't work only on the Erlang AST. In fact it retrieves it only to compile it to Core Erlang, upon which the analysis is done. See module dialyzer_utils, IIRC. The compiler could very well be patched to support a new option +core_info that would be used to store the Core AST into a core_code ("Core") BEAM chunk. The Core Erlang compiler should not crash with most Core code if you pass +clint0. — nox, Oct 18 '13 at 19:12
Eric, when you say that the "Core Erlang to/from source compilers are very buggy", I assume that you mean that there are cases of generated Core Erlang code that the compiler doesn't handle. Apart from that, the Core Erlang compiler stage is used every time you compile to BEAM, but it has some blind spots when it comes to code that you currently never get when coming from Erlang source code. Apart from that, to_erl is just a dump of the internal compiler format at that point, and from_erl just reads it back and continues. — RichardC, Oct 20 '13 at 20:49
I have figured out how to get Core Erlang with line numbers - just need to find the time to get back to writing it. — Gordon Guthrie, Nov 03 '13 at 09:21

score 1 · Answer 2 · answered Jan 02 '15 at 21:09

How do you get the Core Erlang? I have been using

dialyzer_utils:get_core_from_src(File)

where I get a nice structure with c_let c_variable etc and with nice line numbers. However, I noticed that it is not the same Core Erlang I get when I do c("",[to_core]). For example, I get a c_case per record access, and this is optimized away in the .core file generated by c("",[to_core]).

What is the recommended approach to get Core Erlang as an internal structure to be processed by Erlang?

I tried something else first, but then the line numbers were not set.