For reference: this is basically a followup question of Troubles using Bison's recursive rules, and storing values using it.
I am trying to make a flex+bison scanner and parser for Newick file format trees in order to do operations on them. The implemented grammar is based on a simplification of this example.
This is esentially a parser for a file format which represents a tree with a series of (recursive) subtrees and/or leaves. The main tree will always end on ; and said tree and all subtrees within will contain a series of nodes between ( and ), with a name and a length to the right of the rightmost parenthesis specified by name and :length, which are optional (you can avoid specifying them, put one of them (name or :length), or both with name:length).
If any node lacks either the name or a length, default values will be applied. (for example: 'missingName' and '1.0')
An example would be (child1:4, child2:6)root:6; , ((child1Of1:2, child2Of1:9)child1:5, child2:6)root:6;
The implementation of said grammar is the following one (NOTE: I translated my own code, as it was in my language, and lots of side stuff (auxiliary functions, etc.) got removed for clarity...):
EDIT: Now I am not (hopefully) evaluating uninitialized variables.
%union {
struct node {
char* name; /*the node's assigned name, either from the file or from default values*/
float length; /*node's length*/
float lengthInteresting; /*maxium of the distances from this node to a leaf with an interesting string*/
int isInteresting; /*Does this node contain an interesting string? (1=true,0=false)*/
} dataOfNode;
}
%start tree
%token OP CP COMMA SEMICOLON COLON
%token<dataOfNode> DISTANCE NAME
%type<dataOfNode> tree subtrees recursive_subtrees subtree info
%%
tree: subtrees info SEMICOLON {
$$.name = $2.name;
$$.length = $1.length + $2.length;
$$.lengthInteresting = $1.lengthInteresting;
printf("max length is %.2f, max length of those who are interesting is %.2f", $$.length, $$.lengthInteresting);
}
subtrees: OP recursive_subtrees CP {
$$ = $2;
}
;
recursive_subtrees: subtree {
$$ = $1;
} // base case for a list of recursive trees
| recursive_subtrees COMMA subtree {
$$ = $1;
if ($$.length < $3.length) {
$$.length = $3.length;
}
if ($$.lengthInteresting < $3.lengthInteresting) {
$$.lengthInteresting = $3.lengthInteresting;
}
} // subtree, subtree, subtree...
subtree: subtrees info { $$ = $2;
$$.length = $$.length + $1.length;
$$.lengthInteresting = $$.lengthInteresting + $1.lengthInteresting; // if it has any interesting node, $1 will have some length value
}
| info {
$$ = $1;
} // a leaf
info: NAME COLON DISTANCE { $$.name= $$.name; $$.length = $3.length;
if($$.name contains interesting string) { // it cointains an interesting string
$$.isInteresting = 1;
$$.lengthInteresting = $3.length;
} else { // ...it doesn't
$$.isInteresting = 0;
$$.lengthInteresting = 0.0;
}
} // with name and distance
| NAME { $$.name= $1.name; $$.length = 1.0;
if($$.name contains interesting string) { // it cointains an interesting string
$$.isInteresting = 1;
$$.lengthInteresting = 1.0;
} else { // ...it doesn't
$$.isInteresting = 0;
$$.lengthInteresting = 0.0;
}
} // without distance
| COLON DISTANCE { $$.name= "missingName"; $$.length = $2.length;} // without name
| { $$.name= "missingName"; $$.length = 1.0;} // without name nor distance
;
%%
With the assistance I had on my previous question, I have implemented a way of getting the maximum distance/length of the tree's root to a leaf without reconstructing the entire tree's structure. This is done by replacing lenghts recursively when bison is going backwards.
I am trying to do now something similar: get the maximum distance from the root to a leaf which has a specific name I may find interesting (knowing that there may be multiple leaves with said name, I need to do something similar as I did with the maximum length)
However, no matter what I do, I don't get the maximum distance from the root to any of the interesting nodes correctly, whereas the maximum length to any node is displayed correctly. I suspect that my reasoning for getting/replacing/checking the values recursively is still wrong.