The most natural thing is to use the Tree
data-type from the ParseTree
module in the standard library. It is the format that the parser produces, but you can also use it yourself. To get a string from the tree, simply print it in a string like so:
str s = "<myTree>";
A relatively complete random tree generator can be found here: https://github.com/cwi-swat/drambiguity/blob/master/src/GenerateTrees.rsc
The core of the implementation is this:
Tree randomChar(range(int min, int max)) = char(arbInt(max + 1 - min) + min);
Tree randomTree(type[Tree] gr)
= randomTree(gr.symbol, 0, toMap({ <s, p> | s <- gr.definitions, /Production p:prod(_,_,_) <- gr.definitions[s]}));
Tree randomTree(\char-class(list[CharRange] ranges), int rec, map[Symbol, set[Production]] _)
= randomChar(ranges[arbInt(size(ranges))]);
default Tree randomTree(Symbol sort, int rec, map[Symbol, set[Production]] gr) {
p = randomAlt(sort, gr[sort], rec);
return appl(p, [randomTree(delabel(s), rec + 1, gr) | s <- p.symbols]);
}
default Production randomAlt(Symbol sort, set[Production] alts, int rec) {
int w(Production p) = rec > 100 ? p.weight * p.weight : p.weight;
int total(set[Production] ps) = (1 | it + w(p) | Production p <- ps);
r = arbInt(total(alts));
count = 0;
for (Production p <- alts) {
count += w(p);
if (count >= r) {
return p;
}
}
throw "could not select a production for <sort> from <alts>";
}
Tree randomChar(range(int min, int max)) = char(arbInt(max + 1 - min) + min);
It is a simple recursive function which randomly selects productions from a reified grammar.
The trick towards termination lies in the weight
of each rule. This is computed a priori, such that every rule has its own weight in the random selection. We take care to give the set of rules that lead to termination at least 50% chance of being selected (as opposed to the recursive rules) (code here: https://github.com/cwi-swat/drambiguity/blob/master/src/Termination.rsc)
Grammar terminationWeights(Grammar g) {
deps = dependencies(g.rules);
weights = ();
recProds = {p | /p:prod(s,[*_,t,*_],_) := g, <delabel(t), delabel(s)> in deps};
for (nt <- g.rules) {
prods = {p | /p:prod(_,_,_) := g.rules[nt]};
count = size(prods);
recCount = size(prods & recProds);
notRecCount = size(prods - recProds);
// at least 50% of the weight should go to non-recursive rules if they exist
notRecWeight = notRecCount != 0 ? (count * 10) / (2 * notRecCount) : 0;
recWeight = recCount != 0 ? (count * 10) / (2 * recCount) : 0;
weights += (p : p in recProds ? recWeight : notRecWeight | p <- prods);
}
return visit (g) {
case p:prod(_, _, _) => p[weight=weights[p]]
}
}
@memo
rel[Symbol,Symbol] dependencies(map[Symbol, Production] gr)
= {<delabel(from),delabel(to)> | /prod(Symbol from,[_*,Symbol to,_*],_) := gr}+;
Note that this randomTree algorithm will not terminate on grammars that are not "productive" (i.e. they have only a rule like syntax E = E;
Also it can generate trees that are filtered by disambiguation rules. So you can check this by running the parser on a generated string and check for parse errors. Also it can generated ambiguous strings.
By the way, this code was inspired by the PhD thesis of Naveneetha Vasudevan of King's College, London.