Is there an OCaml tool that allows filtering comments in source files, similar to gcc -E?

Ideally, I'm looking for something that will remove everything but comments, but the other way around would also be useful.

For instance, if there is a way to use camlp4/camlp5/ppx to obtain OCaml comments (including non-OCamldoc comments defined with a single asterisk), I would like to know. I haven't had much success looking for comment nodes in Camlp4's AST (though I know they must exist, because there are even bugs related to the fact that Camlp4 modifies their placement).

Here's an example: in the following file:

(*** three asterisks *)
let f () =
  Format.printf "end"

let () =
  (* one asterisk (* nested comment *) *)
  Printf.printf "hello world\n";
  (** two asterisks *)
  f();
  ()

Ideally, I'd like to obtain:

(*** three asterisks *)
(* one asterisk (* nested comment *) *)
(** two asterisks *)

The whitespace between them and the presence or absence of (* *) are mostly irrelevant, but the output should preserve comments of all kinds. My immediate purpose is to feed the comments to a spell checker, but the inverse (a filter that strips only the comments) could also be useful: I could strip the comments and then use diff against the original to obtain what has been removed.

anol

3 Answers

You can use ocamldoc with a custom generator that will dump comments using the textual representation.

ivg
  • Unfortunately it seems that ocamldoc does not consider single-star comments. If I use `sed` to transform them, it tries too hard to parse them, leading to several kinds of errors. Plus, the fact that it requires some sort of compilation setup (e.g. `-I` paths to allow it to find `.cmi` files) makes it quite heavy for what I intended. – anol Feb 06 '17 at 07:44

I have made some interesting experiments with camlp5, playing with the idea of pretty-printing an empty string ("") for every code item. The following code:

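(* Printer that discards its arguments and outputs an empty string. *)
let ignore _ _ _ = ""

(* Catch-all rule: [Evar ()] matches any item and prints it with [ignore]. *)
let rule f = Extfun.(extend f [Evar (),false, fun _ -> Some ignore])

(* Install the rule in the printers for str_item and sig_item. *)
let () =
  Eprinter.extend Pcaml.pr_str_item None [ None, rule ];
  Eprinter.extend Pcaml.pr_sig_item None [ None, rule ]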

will disable the pretty-printing of any str_item (i.e. toplevel item of a module implementation) or sig_item (toplevel item of a module interface), by extending the corresponding default printer with a catch-all rule that outputs an empty string for any such item. Compile pr_comment.ml with

ocamlfind ocamlc -c -package camlp5 pr_comment.ml

and use it as

camlp5o pr_o.cmo path/to/pr_comment.cmo -o only_comment.ml my_file.ml
Virgile
  • That does work for toplevel comments, but unfortunately not for comments inside functions, for instance. It does solve part of the problem but not all of it, so I'm still looking for another solution. – anol Feb 06 '17 at 17:39
  • Yes, I've noticed that. I tried playing a little bit more with camlp5, but documentation is a bit scarce on this topic, and I haven't made much progress. – Virgile Feb 06 '17 at 17:42

Well, there is now a tool called ocaml-comment-sieve that strips everything but the comments in the code. It is based on the simple lexer used in ocamlwc.

However, this tool is GPL-licensed (because it is derived from ocamlwc, which is GPL-licensed), so it cannot be posted here. Still, it does satisfy my requirements, so until someone suggests a better way, I'll consider it as an answer.
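
To give an idea of the approach, here is a minimal hand-written scanner in the same spirit. It is not the code of ocaml-comment-sieve or ocamlwc; the function name `extract_comments` and its helpers are made up for this sketch. It assumes that tracking comment nesting and double-quoted string literals is enough, so it does not handle quoted strings (`{|...|}`) or character literals such as `'"'`, which the real OCaml lexer does handle.

```ocaml
(* Illustrative sketch only: returns every comment found in [src], in order
   of appearance. A nested comment is kept inside its outermost comment. *)
let extract_comments (src : string) : string list =
  let n = String.length src in
  (* Returns the index just past the string literal whose body starts at [i]. *)
  let rec skip_string i =
    if i >= n then n
    else
      match src.[i] with
      | '\\' -> skip_string (i + 2) (* skip the escaped character *)
      | '"' -> i + 1
      | _ -> skip_string (i + 1)
  in
  (* Returns the index just past the current comment; [depth] counts the
     comment openers that have not been closed yet. *)
  let rec skip_comment i depth =
    if i >= n then n (* unterminated comment: stop at end of input *)
    else if i + 1 < n && src.[i] = '(' && src.[i + 1] = '*' then
      skip_comment (i + 2) (depth + 1)
    else if i + 1 < n && src.[i] = '*' && src.[i + 1] = ')' then
      if depth = 1 then i + 2 else skip_comment (i + 2) (depth - 1)
    else if src.[i] = '"' then
      (* a string inside a comment may contain "*)" without closing it *)
      skip_comment (skip_string (i + 1)) depth
    else skip_comment (i + 1) depth
  in
  (* Walks over the code, collecting comments and skipping string literals. *)
  let rec scan i acc =
    if i >= n then List.rev acc
    else if i + 1 < n && src.[i] = '(' && src.[i + 1] = '*' then begin
      let stop = skip_comment (i + 2) 1 in
      scan stop (String.sub src i (stop - i) :: acc)
    end
    else if src.[i] = '"' then scan (skip_string (i + 1)) acc
    else scan (i + 1) acc
  in
  scan 0 []

(* Small demo: the "(*" inside the string literal is not reported. *)
let () =
  let sample = "let x = \"(* not a comment *)\" (* real (* nested *) *)" in
  List.iter print_endline (extract_comments sample)
```

On the example file from the question, this should yield the three comments in order, one per list element. Collecting the spans between comments instead would give the inverse filter (code with comments stripped).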

anol