What kind of data format is this?

Question

I have a bunch of data files of the following form:

("String"
    :tag1 (value)
    :tag2 (value2)
    :tag3 (
        :nested_tag1 (foo)
        :nested_tag2 (
            :nested2_tag1 (
                 : ( nested3_tag1
                          :baz (true)
                          :qux ("a really long block of text")

                 )
            )
        )
    )
)

This is just a small example. The real files have many thousands of lines.

Forgive my ignorance, but I don't recognise the format. Is this a common or known format? Does it have a name?

I want to process it with Perl and wondered whether there are any external modules that would let me easily turn it into a Perl data structure, without having to write a parser myself. After all, why reinvent the wheel? ;-)

+1 for the noble aim of reusing code and not typing without thinking. — Daniel Böhmer, Apr 14 '11 at 14:52
It's a ruleset for an Intrusion detection device. I need to extract certain bits of info from it for later Perl-based fun. I guess it might be proprietary, but thought I'd ask if anyone recognised it anyway! Being inherently lazy, I wanted someone else to have done the work for me first! hehe — wawawawa, Apr 14 '11 at 15:18
If the format is strictly as you show, you may be able to transform it easily into a more parseable format (such as XML, JSON or YAML) using simple text-based search-and-replace. — Gintautas Miliauskas, Apr 14 '11 at 15:22
Actually, converting to another format is an idea I hadn't considered. Hmmmm.... — wawawawa, Apr 14 '11 at 15:29

score 7 · Accepted Answer · answered Apr 14 '11 at 15:20

To me this looks like a Lisp-ish S-Expression. Emacs, for example, will understand your example just fine after quoting it as a list.

S-Expressions are generally very easy to parse, and a CPAN search for "S-Expression" should turn up enough modules that you don't have to write a parser yourself.
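To give a sense of how little code a basic S-expression reader needs, here is a minimal sketch in plain Perl. It is not taken from any CPAN module; the names `tokenize`, `parse_list` and `parse_sexp` are made up for illustration. It assumes double-quoted strings with backslash escapes (kept verbatim, not unescaped) and whitespace-delimited bare atoms such as `:tag1`, which may not match every detail of the real files.

```perl
use strict;
use warnings;

# Split the text into [type, value] tokens: parens, quoted strings, bare atoms.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    while ($text =~ /\G\s*(?:([()])|"((?:[^"\\]|\\.)*)"|([^\s()"]+))/gcs) {
        if    (defined $1) { push @tokens, [ paren  => $1 ] }
        elsif (defined $2) { push @tokens, [ string => $2 ] }
        else               { push @tokens, [ atom   => $3 ] }
    }
    # Anything left over other than whitespace could not be tokenized.
    my $rest = substr($text, pos($text) // 0);
    die "Unexpected input: $rest\n" if $rest =~ /\S/;
    return \@tokens;
}

# Read list items until the matching ")"; the opening "(" is already consumed.
sub parse_list {
    my ($tokens) = @_;
    my @items;
    while (@$tokens) {
        my ($type, $value) = @{ shift @$tokens };
        if ($type eq 'paren') {
            return \@items if $value eq ')';
            push @items, parse_list($tokens);   # nested list
        }
        else {
            push @items, $value;                # string or atom
        }
    }
    die "Missing closing parenthesis\n";
}

# Parse one top-level list into nested array references.
sub parse_sexp {
    my ($text) = @_;
    my $tokens = tokenize($text);
    my $first  = shift @$tokens or die "Empty input\n";
    die "Expected '(' at start\n"
        unless $first->[0] eq 'paren' && $first->[1] eq '(';
    my $tree = parse_list($tokens);
    die "Trailing tokens after top-level list\n" if @$tokens;
    return $tree;
}

my $tree = parse_sexp('("String" :tag1 (value) :tag3 (:nested_tag1 (foo)))');
print "$tree->[0] $tree->[1] $tree->[2][0]\n";
```

The result is a plain tree of array references, so the `:tag` atoms and the lists that follow them still have to be paired up by whatever code walks the tree. The recursion is one level deep per nesting level, which should be fine for machine-generated files unless they nest extremely deeply.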

rafl

Oohh.. Thanks for this. I'll have a look and report back. Much appreciated! – wawawawa Apr 14 '11 at 15:29
Hmmm... Data::SExpression seg faults when I try to parse the whole cannoli. – wawawawa Apr 14 '11 at 16:20
There are a couple of other modules that should be able to do the job as well. Try some of those? But also make sure to check `Data::SExpression`'s bug tracker. Maybe it's a known problem with a patch available already. If not, make sure to at least let the author know about it. – rafl Apr 14 '11 at 16:34
Yes - good call. I'll do that. I'm now doing the unthinkable and using Perl's flip-flop operator. As the source data is machine generated, I can be sure of the whitespace (well, sure enough...). So I use: `while (<>) { next unless /^(\t+):\d+ \(".+"$/ .. /^$1\)$/; #do stuff ; }`... Am I going to hell for not writing a proper parser? – wawawawa Apr 14 '11 at 16:59
