I have a project which contains a bunch of small programs tied together using bash scripts, as per the Unix philosophy. Their exchange format originally looked like this:
meta1a:meta1b:meta1c AST1
meta2a:meta2b:meta2c AST2
Where the :-separated fields are metadata and the ASTs are s-expressions which the scripts pass along as-is. This worked fine, as I could use cut -d ' ' to split the metadata from the ASTs, and cut -d ':' to dig into the metadata. However, I then needed to add a metadata field containing spaces, which breaks this format. Since no field uses tabs, I switched to the following:
meta1a:meta1b:meta1c:meta 1 d\tAST1
meta2a:meta2b:meta2c:meta 2 d\tAST2
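For reference, splitting this tab-separated format might look roughly like the sketch below. The helper names (meta, ast, meta_field, sample) and the placeholder ASTs are made up for illustration and are not from my actual scripts; the sketch relies on cut using a tab as its default field delimiter.

```shell
#!/bin/sh
# Hypothetical helpers for the tab-separated format; all names are illustrative.

# Print the metadata column (everything before the first tab) of each line.
meta() { cut -f 1; }

# Print the AST column (everything after the first tab), passed along as-is.
ast() { cut -f 2-; }

# Print the Nth colon-separated metadata field of each line.
meta_field() { cut -f 1 | cut -d ':' -f "$1"; }

# Sample records with placeholder ASTs; printf expands \t into a real tab.
sample() {
  printf 'meta1a:meta1b:meta1c:meta 1 d\t(ast 1)\n'
  printf 'meta2a:meta2b:meta2c:meta 2 d\t(ast 2)\n'
}

sample | meta_field 4   # the metadata field that contains spaces
sample | ast            # the s-expressions, untouched
```

This still works only because neither the metadata nor the ASTs contain tabs, and no metadata field contains a colon.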
Since I envision more metadata fields being added in the future, I think it's time to switch to a more structured format rather than playing a game of "guess the punctuation".
Instead of delimiters and cut I could use JSON and jq, or I could use XML and xsltproc, but since I'm already using s-expressions for the ASTs, I'm wondering if there's a nice way to use them here instead?
For example, something which looks like this:
(echo '(("foo1" "bar1" "baz1" "quux 1") ast1)'
echo '(("foo2" "bar2" "baz2" "quux 2") ast2)') | sexpr 'caar'
"foo1"
"foo2"
My requirements are:
- Straightforward use of stdio with minimal boilerplate, since that's where my programs read/write their data
- Easily callable from shell scripts, or else a very compelling alternative to bash's process invocation and pipelining
- Streaming I/O if possible; i.e. I'd prefer to work with one AST at a time rather than consuming the whole input looking for a closing )
- Fast and lightweight, especially if it's being invoked a few times; each AST is only a few KB, but they can add up to hundreds of MB
- Should work on Linux at least; cross-platform would be nice
The obvious choice is to use a Lisp/Scheme interpreter, but the only one I'm experienced with is Emacs, which is far too heavyweight. Perhaps another implementation is more lightweight and suited to this?
In Haskell I've played with shelly, turtle and atto-lisp, but most of my code was spent converting between String/Text/ByteString, wrapping/unwrapping Lisps, implementing my own car, cdr, cons, etc.
I've read a little about scsh, but don't know if that would be appropriate either.