I would like to write an OCaml module having a compile function which will take a string containing an OCaml program and output the outcome of compilation as either Correct or Error, and in the Error case, information on the line and character of the first error. I don't care about what the error was (at least for now).

A basic module type for this is:

module type Compile =
    sig
        type result = Correct | Error of (int*int)
        val compile : string -> result
    end

There are at least two fundamentally different ways to implement it:

  1. the quick hack way -- write the string with the program to a file, use Unix processes to invoke ocamlc on the command line and parse the stderr;
  2. the proper way -- use OCaml tools to analyse the code.

Regarding (1), I'm having trouble getting hold of the contents of stderr. I started by using Unix.open_process_in, which redirects stdout to an in_channel which I can then read. Since the compilation errors output by ocamlc go to stderr (and not to stdout), I tried to use

let ic = Unix.open_process_in "ocamlc test.ml 2>&1" 

so that the content of stderr was redirected at the command line to stdout. This way, I'd expect, ic would contain the compilation errors. Unfortunately, this was not the case and ic contained just End_of_file.

I then moved to the Unix.create_process command, as it allows me to choose new channels for out, in, err. But I'm having trouble choosing these channels. What should the out channel be so that I can then read it in my program?

Regarding (2), what do you think is a (reasonably) simple way to go?

Many thanks for any help!

Surikator
  • Regarding (2): OCaml isn't the simplest language. You'd have to build a full compiler frontend. Even those that wouldn't have to ask about this would probably need weeks for a mostly-working version. And in case my assumption that you have little experience with compiler construction (or even parsing, for that matter) is correct... –  Feb 19 '11 at 21:38
  • @delnan OCaml does, however, give you all the pieces of the compiler in Camlp4. – Michael Ekstrand Feb 19 '11 at 23:02
  • @delnan I have written parsers using OCaml's lex and yacc counterparts. But I wasn't thinking of writing a compiler from scratch for OCaml. It would be a nightmare job. Besides, OCaml already provides tools for parsing and compiling OCaml code. My question in (2) was on how best to use those tools for this purpose. – Surikator Feb 19 '11 at 23:46

2 Answers

Regarding 2), OCaml installations typically have a compiler-lib directory containing many of the modules making up the compiler, including the parser. This is not documented, but may be usable to do the compilation and type-checking. For the parsing, there is also Camlp4, which can be used to do things besides preprocessing (I would imagine you can load Camlp4 into your program and use it as a library, possibly after patching it a bit), but that only contains the grammar and not the typechecker.

Michael Ekstrand
  • Note that this will be less portable than using toplevellib. OCaml's default install procedure doesn't install any `compiler-lib` directory. – Daniel Bünzli Feb 20 '11 at 00:33
  • I have Camlp5, but I guess it should be the same for this purpose, no? In any case, I'd really need the whole compiler. I need to make sure that whatever I'm checking would actually compile under `ocamlc`, not just parse. Thanks for the pointer to the compiler directory, that was of great help. I only have .cm* and .o files in there, though, so I can't see the contents of the compiler modules. I have found the `Toploop.execute_phrase` function suggested by Daniel Bünzli so I'll give it a try for now. Thanks! – Surikator Feb 20 '11 at 00:44
  • I think I'll give camlp5 a try. `Toploop.execute_phrase` is undocumented and turned out to be not very easy to figure out. My hack of writing to a file, compiling and reading stderr works fine but is too slow for intensive use, which I require. So, I'll use camlp5 as a filter: only those programs which parse in camlp5 will be tested through ocamlc. – Surikator Feb 20 '11 at 03:25

Regarding 1) I think you did something wrong. On my machine:

> ocaml unix.cma
        Objective Caml version 3.12.0

# let ic = Unix.open_process_in "ocamlc test.ml 2>&1";;
val ic : in_channel = <abstr>
# input_line ic;;
- : string = "File \"test.ml\", line 1, characters 0-1:"
# input_line ic;;
- : string = "Error: I/O error: test.ml: No such file or directory"
# input_line ic;;
Exception: End_of_file.
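
Output like the session above can be mapped onto the result type from the question. Below is a minimal sketch, assuming the error header keeps the form `File "...", line N, characters A-B:`. The names `read_lines`, `parse_location` and `result_of_lines` are hypothetical helpers, not part of any library. It only uses the standard library, so the lines can come from any `in_channel`, for example the one returned by `Unix.open_process_in`. Note that warnings use the same header, so this sketch would report a warning as an error, and a robust version should also check the exit status of the process.

```ocaml
(* Result type from the question: Error carries (line, start character). *)
type result = Correct | Error of (int * int)

(* Read every line from a channel until End_of_file.
   The base case is the exception, not an empty line. *)
let read_lines ic =
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* Try to read a location header such as
   File "test.ml", line 1, characters 0-1:
   Returns Some (line, first_char) or None if the line has another shape. *)
let parse_location line =
  try
    Scanf.sscanf line "File %S, line %d, characters %d-%d"
      (fun _file l c _ -> Some (l, c))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* The first line that parses as a location decides the outcome. *)
let rec result_of_lines = function
  | [] -> Correct
  | line :: rest ->
    (match parse_location line with
     | Some pos -> Error pos
     | None -> result_of_lines rest)
```

With the channel from the session, this would be used as `result_of_lines (read_lines ic)`.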

For Unix.create_process, one way of doing it is to create a pipe with Unix.pipe and pass its write end as new_stdout (and as new_stderr, since that is where ocamlc reports errors), so that you can read the output of the process from the read end of the pipe. But a simpler way may be to use Unix.open_process_full.
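
Here is a minimal sketch of capturing a child's output with Unix.create_process, assuming a pipe created with Unix.pipe whose write end is given to the child as both stdout and stderr. The function name `run_capture` is hypothetical, and the code needs to be linked with the unix library. It reads everything before waiting for the child, which is fine for small outputs such as compiler messages.

```ocaml
(* Run [prog] with [args] (args.(0) is the program name by convention),
   returning its exit status and the lines it wrote to stdout and stderr. *)
let run_capture prog args =
  let rd, wr = Unix.pipe () in
  (* The child inherits our stdin and writes stdout and stderr into the pipe. *)
  let pid = Unix.create_process prog args Unix.stdin wr wr in
  (* Close our copy of the write end, otherwise the reads below never see EOF. *)
  Unix.close wr;
  let ic = Unix.in_channel_of_descr rd in
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  let lines = loop [] in
  close_in ic;
  (* Reap the child and get its exit status. *)
  let _, status = Unix.waitpid [] pid in
  (status, lines)
```

For the compiler case, a call might look like `run_capture "ocamlc" [| "ocamlc"; "-c"; "test.ml" |]`, with a status other than `Unix.WEXITED 0` indicating a failed compilation.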

Regarding 2) you can try to use toplevellib.cma (note however that this is completely undocumented and unsupported). Have a look at toplevel/toploop.mli in the distribution, in particular Toploop.execute_phrase.

Daniel Bünzli
  • Interesting... I've just gone back to the code. The `Unix.open_process_in` bit was working just fine. The problem was somewhere else, a very silly mistake with the base case for the recursive function processing the lines of the stdout. Sorry to waste your time. Still, it was the fact that I saw it working for you that triggered my attention to a different part of the code. The suggestion regarding (2) is precious. I'll try to put `Toploop.execute_phrase` to work. Thanks a lot! – Surikator Feb 20 '11 at 00:37
  • Yes, `Unix.open_process_full` seems more appropriate. I should have seen it; I looked at that part of the Unix module documentation but I missed it. By using the `_full` version I may avoid the `2>&1` thing which is a bit hacky. Thanks! – Surikator Feb 20 '11 at 00:42