Well in Haskell there are 2 main ways of parsing something, parse combinators or a parser generator. Since you already have a BNF I'd suggest the latter.
A good one is alex. GHC's parser IIRC is written using this so you'd be in good company.
Next you'll have a big honking stack of data declarations to parse into:
data JavaClass = {
className :: Name,
interfaces :: [Name],
contents :: [ClassContents],
...
}
data ClassContents = M Method
| F Field
| IC InnerClass
and for expressions and whatever else you need. Finally you'll combine these into something like
data TopLevel = JC JavaClass
| WhateverOtherForms
| YouWillParse
Once you have this you'll have the entire AST represented as one TopLevel
or a list of them depending on how many you classes/files you parse.
To proceed from here depends on what you want to do. There are a number of libraries such as syb
(scrap your boilerplate) that let you write very concise tree traversals and modifications. lens
is also an option. At a minimum check out Data.Traversable
and Data.Foldable
.
To modify the tree, you can do something as simple as
ignoreInnerClasses :: JavaClass -> JavaClass
ignoreInnerContents c = c{contents = filter isClass $ contents c}
-- ^^^ that is called a record update
where isClass (IC _) = True
isClass _ = False
and then you could potentially use something like syb
to write
everywhere (mkT ignoreInnerClass) toplevel
which will traverse everything and apply ignoreInnerClass
to all JavaClasses
. This is possible to do in lens
and many other libraries too, but syb
is very easy to read.