
I’ve been struggling with a design problem for quite a while now: cyclic dependencies are at its core, and I can’t find an elegant way to resolve them. I’m coming from C, where cyclic dependencies are both possible and easy to resolve.

The following is a very simplified picture of the relevant files in the project:

ast.ml (doesn’t actually have an interface, I’m not too keen on copying the whole type)

type loc = string * (int * int) * (int * int)
and id = string * loc
and decl = 
  | Decl_Func of decl_func
and decl_func = {
  df_Name: id;
  mutable df_SymTab: sym_tab option;
}
(* goes on for about 100 more types *)

symtab.mli

type t
type symbol =
  | Sym_Func of Ast.decl_func

val lookup_by_id: Ast.id -> symbol

(there are more files to be added in the future)

In C I'd simply make the symbol table a pointer, and forward declare it. Problem solved. This, unfortunately, isn't possible in OCaml.

Each of the implementations is quite large, which means I absolutely do not want to turn everything into recursive modules: the implementation file would then be 10 kloc or more, full of code that is not really related (beyond the big recursive type).

How would I solve this, while still maintaining a somewhat modular design?

Leandros
  • You can alias types: `type t = T.a` in `a.ml` for example. – glennsl Apr 17 '19 at 12:21
  • I also wouldn't consider interdependent modules "modular design". Modularity to me implies some degree of independence between modules, but perhaps you mean something else by it? – glennsl Apr 17 '19 at 12:21
  • Yes, it may not be particularly "modular". In this context, modular more or less means having many smaller modules instead of one large 20kloc module. It is likely impossible to make this completely modular, since it's essentially interconnected by design. – Leandros Apr 17 '19 at 12:33
  • @PatJ I've edited with some excerpt from the actual code. – Leandros Apr 17 '19 at 12:36
  • Still nitpicking, but what keeps you from moving the definition of `symbol` to *ast.ml*? – PatJ Apr 17 '19 at 12:41
  • @PatJ _ast.ml_ on its own is already over a thousand lines. And just moving over `symbol` is not enough, since the lookup happens with `id`, which is an AST node. – Leandros Apr 17 '19 at 12:42

1 Answer


You're not the first to have that problem and there are numerous different solutions depending on workflow, taste and needs.

Here is a good way to think about it.

1. Isolate the leaves of your AST

By leaves, I mean the types like loc or id that do not depend on any other type. They don't need to be in your recursive type definition and therefore shouldn't be.

Moreover, you'll probably have specific functions to handle locations and identifiers, and keeping those functions close to the type definition is good practice. So, you can create an ast_loc.ml and an ast_id.ml file with the appropriate definitions and basic functions.
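As a rough sketch of what those two files could contain (written here as submodules so it fits in one snippet). The helper names and the reading of the tuple as file, start position, end position are my assumptions, not something taken from your code:

```ocaml
(* Would live in ast_loc.ml *)
module Ast_loc = struct
  (* Assumed meaning: file name, (start line, start column),
     (end line, end column) *)
  type t = string * (int * int) * (int * int)

  let file ((f, _, _) : t) = f

  (* Hypothetical helper: render a location for error messages *)
  let to_string ((f, (l1, c1), (l2, c2)) : t) =
    Printf.sprintf "%s:%d.%d-%d.%d" f l1 c1 l2 c2
end

(* Would live in ast_id.ml; depends on Ast_loc only *)
module Ast_id = struct
  type t = string * Ast_loc.t

  let name ((n, _) : t) = n
  let loc ((_, l) : t) = l

  (* Hypothetical helper: compare identifiers by name, ignoring
     where they occur in the source *)
  let equal a b = String.equal (name a) (name b)
end
```

Neither module needs to know anything about declarations or symbol tables, so they can never take part in a dependency cycle.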

This may seem like little, but it will actually help make your code clearer with the added bonus of lightening up ast.ml.

2. If need be, parameterize your types

Now, I do not recommend using this extensively: every parameter adds a level of indirection, which tends to make code harder to read. Check this out:

type 't v = Thing of 't

(* potentially in a different later file *)
type t = Stuff of t v

By using a type parameter, you can delay the point at which recursion enters your type definition. Note that I do not recommend using it for your whole AST, as it will make maintenance a pain, but if you have some middle nodes that behave quite independently of the rest, this may help.
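Applied to the excerpt from the question, this could look roughly as follows. It is only a sketch: the `Hashtbl` representation, the `create` and `add` functions, and the explicit table argument to `lookup_by_id` (returning an option) are my assumptions and differ from the signature in your symtab.mli.

```ocaml
(* Would live in ast.ml: no reference to Symtab at all *)
module Ast = struct
  type loc = string * (int * int) * (int * int)
  type id = string * loc

  (* 'st stands for the symbol table type, which is not known yet *)
  type 'st decl =
    | Decl_Func of 'st decl_func
  and 'st decl_func = {
    df_Name : id;
    mutable df_SymTab : 'st option;
  }
end

(* Would live in symtab.ml: depends on Ast and closes the cycle *)
module Symtab = struct
  type symbol =
    | Sym_Func of t Ast.decl_func
  (* Assumed representation: a hash table keyed by identifier name *)
  and t = (string, symbol) Hashtbl.t

  let create () : t = Hashtbl.create 16

  let add (tbl : t) (df : t Ast.decl_func) : unit =
    Hashtbl.replace tbl (fst df.Ast.df_Name) (Sym_Func df)

  let lookup_by_id (tbl : t) ((name, _) : Ast.id) : symbol option =
    Hashtbl.find_opt tbl name
end
```

The dependency now only goes from Symtab to Ast. The price is that every function over declarations carries the extra parameter (or is pinned to `Symtab.t Ast.decl`), which is exactly the indirection I warned about.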

These, for instance, are often useful:

type 'a named = { id : id; v : 'a; }
type 'a located = { loc : loc; v: 'a; }

This method is particularly useful if it helps factorize your type definition. But, as I have already stated: don't abuse it! It is easy to do, but hard to maintain.
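To illustrate how such wrappers factorize a definition, here is a small sketch. The `expr` and `func` types and the `at` and `eval` functions are made up for the example and are not part of your AST:

```ocaml
type loc = string * (int * int) * (int * int)
type id = string * loc

type 'a named = { id : id; v : 'a }
type 'a located = { loc : loc; v : 'a }

(* Hypothetical expression type: every sub-expression carries a
   location without each constructor repeating a loc field *)
type expr =
  | Int of int
  | Add of expr located * expr located

(* Hypothetical: a function is a named, located body *)
type func = expr located named

(* Attach a location to a value *)
let at (loc : loc) v : 'a located = { loc; v }

(* Evaluation unwraps the location in exactly one place *)
let rec eval (e : expr located) : int =
  match e.v with
  | Int n -> n
  | Add (a, b) -> eval a + eval b
```

Both records share the field name `v`, so a type annotation is sometimes needed to tell the compiler which one is meant. That is one of the small maintenance costs of this approach.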

3. At some point, a big fat recursive definition is what you need

As of today, the Parsetree file of the OCaml compiler has 958 lines. That's what it's supposed to have. It is a complex tree structure and that should be visible.

Note that the file is just a type definition. Subsequent files contain the code to manipulate that definition (and usually don't introduce new types that are necessary outside their module).

In a way, by arguing that you should separate type definition and code, I am somewhat contradicting the point I made about loc and id, but this is a different case: loc and id are simple types that can be manipulated independently, whereas symbol only makes sense within your AST definition. Also, nothing keeps you from creating a symbol.ml file that manipulates that part of the AST without containing the type definition (comments are your friends, Merlin is a must).
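A minimal sketch of that layout, assuming `symbol` and the symbol table type are moved into the recursive definition in ast.ml. The association-list representation and the functions `create`, `add_func` and `lookup_by_id` (here taking the table explicitly) are placeholders of mine:

```ocaml
(* Would live in ast.ml: types only *)
module Ast = struct
  type loc = string * (int * int) * (int * int)
  type id = string * loc

  type decl =
    | Decl_Func of decl_func
  and decl_func = {
    df_Name : id;
    mutable df_SymTab : sym_tab option;
  }
  and symbol =
    | Sym_Func of decl_func
  (* Assumed representation: a mutable association list *)
  and sym_tab = { mutable symbols : (string * symbol) list }
end

(* Would live in symbol.ml: code only, no new types *)
module Symbol = struct
  open Ast

  let create () : sym_tab = { symbols = [] }

  (* Register a function declaration under its name *)
  let add_func (tab : sym_tab) (df : decl_func) : unit =
    tab.symbols <- (fst df.df_Name, Sym_Func df) :: tab.symbols

  let lookup_by_id (tab : sym_tab) ((name, _) : id) : symbol option =
    List.assoc_opt name tab.symbols
end
```

All the mutual recursion is confined to one type definition, and symbol.ml depends on ast.ml in one direction only.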

Also, recursive functors are not something I'd advise unless you really need them.

PatJ
  • Thanks, I went with a big fat recursive definition (sadly, including recursive modules), with the help of [ppx_include](https://github.com/aexpl/ppx_include), to separate them into different files. I found this to be the cleanest solution. – Leandros Apr 25 '19 at 09:55