I have a set of picocli-based applications that I'd like to parse the usage output into structured data. I've written three different output parsers so far and I'm not happy with any of them (fragility, complexity, difficulty in extending, etc.). Any thoughts on how to cleanly parse this type of semi-structured output?
The usage output generally looks like this:
Usage: taker-mvo-2 [-hV] [-C=file] [-E=file] [-p=payoffs] [-s=millis] PENALTY
(ASSET SPREAD)...
Submits liquidity-taking orders based on mean-variance optimization of multiple
assets.
PENALTY risk penalty for payoff variance
(ASSET SPREAD)... Spread for creating market above fundamental value
for assets
-C, --credential=file credential file
-E, --endpoint=file marketplace endpoint file
-h, --help display this help message
-p, --payoffs=payoffs payoff states and probabilities (default: .fm/payoffs)
-s, --sleep=millis sleep milliseconds before acting (default: 2000)
-V, --version print product version and exit
I want to capture the program name and description, options, parameters, and parameter-groups along with their descriptions into an agent
:
public class Agent {
private String name;
private String description = "";
private List<Option> options;
private List<Parameter> parameters;
private List<ParameterGroup> parameterGroups;
}
The program name is taker-mvo-2
and the (possibly multi-lined) description is after the (possibly multi-line) arguments list:
Submits liquidity-taking orders based on mean-variance optimization of multiple assets.
Options (in square brackets) should be parsed into:
public class Option {
private String shortName;
private String parameter;
private String longName;
private String description;
}
The parsed options' JSON is:
options: [ {
"shortName": "h",
"parameter": null,
"longName": "help",
"description": "display this help message"
}, {
"shortName": "V",
"parameter": null,
"longName": "version",
"description": "print product version and exit"
}, {
"shortName": "C",
"parameter": file,
"longName": "credential",
"description": "credential file"
}, {
"shortName": "E",
"parameter": file,
"longName": "endpoint",
"description": "marketplace endpoint file"
}, {
"shortName": "p",
"parameter": payoffs,
"longName": "payoffs",
"description": "payoff states and probabilities (default: ~/.fm/payoffs)"
}]
Similarly for the parameters which should be parsed into:
public class Parameter {
private String name;
private String description;
}
and parameter-groups which are surrounded by (
and )...
should be parsed into:
public class ParameterGroup {
private List<String> parameters;
private String description;
}
The first hand-written parser I wrote walked the buffer, capturing the data as it progresses. It works pretty well, but it looks horrible. And it's horrible to extend. The second hand-written parser uses regex expressions while walking the buffer. Better looking than the first but still ugly and difficult to extend. The third parser uses regex expressions. Probably the best looking of the bunch but still ugly and unmanageable.
I thought this text would be pretty simple to parse manually but now I'm wondering if ANTLR might be a better tool for this. Any thoughts or alternative ideas?