
I have a question that's been bothering me for some time. Is the Common Lisp format function reversible (at least to some degree) in that the format string could be used to retrieve original arguments from format's output? I am aware that the mapping is not one-to-one (one example is the ~@:(~a~) which turns input to uppercase and is not reversible), so necessarily some information is lost. What I have in mind exactly is rather an alternative to regular expressions for string parsing. For example, I would like to be able to write:

(destructure-format t "[~{~a~^, ~}]" "[0, 1, 2]")

and get the response:

=> (0 1 2)

Are you aware of any such attempts or papers discussing a similar approach?

Wojciech Gac
  • Hu. What about a combination of [`destructuring-bind`](http://www.lispworks.com/documentation/HyperSpec/Body/m_destru.htm#destructuring-bind) and [`read-from-string`](http://www.lispworks.com/documentation/HyperSpec/Body/f_rd_fro.htm)? If you need to serialize data, that's what S-expressions are for. There is nothing in the core language, which would allow for the kind of string-pattern-matching you are alluding to here, AFAIK. You can, of course, use [regular expressions](http://weitz.de/cl-ppcre/). – Dirk May 06 '14 at 09:10
  • Yeah, now that I look at it, it looks like a reasonable thing to do. Thanks! – Wojciech Gac May 06 '14 at 09:29
  • http://jcsu.jesus.cam.ac.uk/~csr21/format-setf.lisp - but it's kind of a joke/hack. – Xach May 06 '14 at 12:52
  • Thanks, @Xach. I'll definitely give it a look. – Wojciech Gac May 06 '14 at 13:49
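
For reference, the destructuring-bind plus read-from-string combination mentioned in the comments could look roughly like the sketch below. It assumes the data is already written as an s-expression such as "(0 1 2)" rather than as "[0, 1, 2]", and the name read-three is made up for illustration.

```lisp
(defun read-three (string)
  "Hypothetical helper: read one s-expression from STRING and destructure it."
  (let ((*read-eval* nil))              ; refuse #. forms in untrusted input
    ;; DESTRUCTURING-BIND signals an error unless exactly three elements are read.
    (destructuring-bind (a b c) (read-from-string string)
      (list a b c))))

(read-three "(0 1 2)")                  ; => (0 1 2)
```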

2 Answers


Nothing in the standard

There's nothing like this in the standard. Format strings don't carry enough information to make this useful in any real sense. For just about everything that doesn't bind *print-readably*, there are ways in which the output would be hard to read back. In the list-formatting case that you gave,

(destructure-format t "[~{~a~^, ~}]" "[0, 1, 2]")

any solution would have to examine the format directive. What could it then unambiguously observe? Only that the first character in the string must be #\[, that the last must be #\], and that some occurrences of ", " within the string separate output generated by ~a. What ambiguities could arise, then? Anything that would cause a ", " to be written in the output. E.g.,

CL-USER> (format t "[~{~a~^, ~}]" '(|, | 2 3))
[, , 2, 3]
NIL
CL-USER> (format t "[~{~a~^, ~}]" '(|, | | ,|))
[, ,  ,]
NIL
CL-USER> (format t "[~{~a~^, ~}]" '(|, | | ,| |,|))
[, ,  ,, ,]
NIL
CL-USER> (format t "[~{~a~^, ~}]" '(|, | | ,| #\,))
[, ,  ,, ,]
NIL
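
To make the ambiguity concrete, here is a minimal sketch of a naive inverse for this one format string. The function naive-unformat-list is hypothetical (it is not part of any library); it simply strips the brackets and splits on every ", ", and so cannot tell a separator from a ", " that came out of an argument.

```lisp
(defun naive-unformat-list (string)
  "Hypothetical inverse of the bracketed-list format string:
strip the surrounding brackets, then split the rest on every \", \"."
  (let ((body (subseq string 1 (1- (length string)))))
    (loop with start = 0
          for pos = (search ", " body :start2 start)
          collect (subseq body start pos) ; POS is NIL for the last piece
          while pos
          do (setf start (+ pos 2)))))

(naive-unformat-list (format nil "[~{~a~^, ~}]" '(0 1 2)))
;; => ("0" "1" "2")

;; The first list from the examples above has three elements,
;; but four strings come back, because the symbol's own ", "
;; is indistinguishable from a separator.
(naive-unformat-list (format nil "[~{~a~^, ~}]" '(|, | 2 3)))
```

Even in the unambiguous case the result is a list of strings, not of integers: ~a does not record the type of what it printed.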

Third party libraries

Library recommendations are off-topic on Stack Overflow, but this question didn't start as a request for one. After seeing Rörd's answer, which suggests a foreign function call to C's scanf, I searched for scanf on the CLiki and found format-setf (rereading the comments, I see that Xach found it first). Its description reads:

The Common Lisp equivalent of scanf().

A (relatively) frequently asked question on comp.lang.lisp is "What's the equivalent to scanf()?". The usual answer is "There isn't one, because it's too hard to work out what should happen". Which is fair enough.

However, one year Christophe was bored during exams, so he wrote format-setf.lisp, which may do what you want.

It should be pointed out that currently the behaviour of this program is unspecified, in more senses than just the clobbering of symbols in the "CL" package. What would be nice would be to see a specification appear for its behaviour, so that I don't have an excuse when people say that it's buggy.

Other alternatives

Since you'd really end up asking "What are the possible ways that this could match?", you'd essentially be asking for a regular expression plus the extra things that format makes possible.

What I have in mind exactly is rather an alternative to regular expressions for string parsing.

If you're looking for regular expressions, then regular expressions are a great fit. If you're looking for parsing that's not regular expressions, then you probably want to write a genuine parser. It can be daunting the first time, but after that, it gets much easier, and Common Lisp makes it relatively painless. There are even parser generation libraries available. If, on the other hand, you're looking for serialization and de-serialization, the Common Lisp reader and writer make s-expressions a nice and easy choice.
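
As a rough illustration of how small a hand-written parser can be, here is a sketch for the bracketed integer list from the question. The function name parse-integer-list and the grammar it accepts (integers separated by commas, optional spaces, surrounding brackets) are assumptions made for this example, not anything defined by the standard or by a library.

```lisp
(defun parse-integer-list (string)
  "Parse text such as \"[0, 1, 2]\" into a list of integers, or signal an error."
  (let ((pos 0)
        (end (length string)))
    (labels ((peek () (and (< pos end) (char string pos)))
             (skip-spaces ()
               (loop while (eql (peek) #\Space) do (incf pos)))
             (expect (ch)
               (skip-spaces)
               (unless (eql (peek) ch)
                 (error "Expected ~s at position ~d" ch pos))
               (incf pos))
             (parse-int ()
               (skip-spaces)
               ;; PARSE-INTEGER returns the integer and the index where it stopped.
               (multiple-value-bind (value next)
                   (parse-integer string :start pos :junk-allowed t)
                 (unless value
                   (error "Expected an integer at position ~d" pos))
                 (setf pos next)
                 value)))
      (expect #\[)
      (skip-spaces)
      (let ((items '()))
        ;; An empty list "[]" has no items; otherwise read item {"," item}.
        (unless (eql (peek) #\])
          (push (parse-int) items)
          (loop (skip-spaces)
                (if (eql (peek) #\,)
                    (progn (incf pos) (push (parse-int) items))
                    (return))))
        (expect #\])
        (skip-spaces)
        (when (< pos end)
          (error "Trailing characters at position ~d" pos))
        (nreverse items)))))

(parse-integer-list "[0, 1, 2]")        ; => (0 1 2)
```

Unlike an inverted format string, the parser states its grammar explicitly, so malformed input such as "[1 2]" is rejected with an error instead of being guessed at.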

Joshua Taylor
  • Thanks, @Joshua. I guess I'm gravitating towards writing a parser sooner or later :). Do you know of any good and concise guide on the topic? – Wojciech Gac May 06 '14 at 11:14
  • @WojciechGac Stack Overflow isn't a great place for library/tool recommendations, but you might have a look at the [CLiki page on parser generators](http://www.cliki.net/parser%20generator) for some practical tools. As for the theory and techniques, many people like [The Dragon Book](http://en.wikipedia.org/wiki/Dragon_Book). – Joshua Taylor May 06 '14 at 16:07

If you want format-string-based parsing but don't need the advanced features of format, you could use a function from C's scanf family (sscanf, for strings) through an FFI. Here's an example of doing this with CFFI:

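;; Assumes CFFI is already loaded, e.g. with (ql:quickload :cffi).
(cffi:with-foreign-strings ((input "[0, 1, 2]") (fmt "[%d, %d, %d]"))
  (cffi:with-foreign-objects ((a :int) (b :int) (c :int))
    ;; sscanf returns the number of fields it converted, so only read
    ;; the results back when all three were filled in.
    (when (= 3 (cffi:foreign-funcall "sscanf" :pointer input :pointer fmt
                                     :pointer a :pointer b :pointer c
                                     :int))
      (loop for x in (list a b c) collect (cffi:mem-ref x :int)))))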
Rörd