
I have a bunch of data files of the following form:

("String"
    :tag1 (value)
    :tag2 (value2)
    :tag3 (
        :nested_tag1 (foo)
        :nested_tag2 (
            :nested2_tag1 (
                 : ( nested3_tag1
                          :baz (true)
                          :qux ("a really long block of text")

                 )
            )
        )
    )
)

This is just a small example. The real files have many thousands of lines.

Forgive my ignorance, but I don't recognise the format. Is this a common or known format? Does it have a name?

I want to process it with Perl and wondered whether any existing modules would let me easily turn it into a Perl data structure, without having to write a parser myself. After all, why reinvent the wheel? ;-)

Brian Tompsett - 汤莱恩
wawawawa
  • +1 for the noble aim of reusing code and not typing without thinking. – Daniel Böhmer Apr 14 '11 at 14:52
  • It's a ruleset for an Intrusion detection device. I need to extract certain bits of info from it for later Perl-based fun. I guess it might be proprietary, but thought I'd ask if anyone recognised it anyway! Being inherently lazy, I wanted someone else to have done the work for me first! hehe – wawawawa Apr 14 '11 at 15:18
  • If the format is strictly as you show, you may be able to transform it easily into a more parseable format (such as XML, JSON or YAML) using simple text-based search-and-replace. – Gintautas Miliauskas Apr 14 '11 at 15:22
  • Actually, converting to another format is an idea I hadn't considered. Hmmmm.... – wawawawa Apr 14 '11 at 15:29

1 Answer


To me this looks like a Lisp-ish S-Expression. Emacs, for example, will understand your example just fine after quoting it as a list.

S-expressions are generally very easy to parse, and a CPAN search for "S-Expression" should turn up enough modules that you won't have to write a parser yourself.
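To give an idea of how little is involved if you do end up rolling your own, here is a minimal sketch of a tokenizer plus recursive parser in plain Perl. It assumes the files contain only parentheses, double-quoted strings (with backslash escapes) and bare atoms such as `:tag1`, which is a guess based on the sample in the question. The names `tokenize`, `parse_tokens` and `parse_sexp` are made up for illustration and do not come from any CPAN module.

```perl
use strict;
use warnings;

# Split the text into tokens: "(", ")", double-quoted strings, bare atoms.
# Quoted strings keep their quotes here so the parser can tell them apart.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    while ($text =~ /\G\s*( \( | \) | "(?:[^"\\]|\\.)*" | [^\s()"]+ )/gcxs) {
        push @tokens, $1;
    }
    # Anything left other than whitespace is something we could not tokenize.
    unless ($text =~ /\G\s*\z/gc) {
        die "cannot tokenize input near offset " . (pos($text) || 0) . "\n";
    }
    return \@tokens;
}

# Consume one expression from the front of the token list.
# Lists become array refs; strings and atoms become plain scalars.
sub parse_tokens {
    my ($tokens) = @_;
    die "unexpected end of input\n" unless @$tokens;
    my $tok = shift @$tokens;
    if ($tok eq '(') {
        my @list;
        while (1) {
            die "missing closing parenthesis\n" unless @$tokens;
            last if $tokens->[0] eq ')';
            push @list, parse_tokens($tokens);
        }
        shift @$tokens;    # discard the ")"
        return \@list;
    }
    die "unexpected ')'\n" if $tok eq ')';
    if ($tok =~ /\A"(.*)"\z/s) {
        my $string = $1;
        $string =~ s/\\(.)/$1/gs;    # undo backslash escapes
        return $string;
    }
    return $tok;    # bare atom such as :tag1 or foo
}

# Parse a complete text that holds exactly one top-level expression.
sub parse_sexp {
    my ($text) = @_;
    my $tokens = tokenize($text);
    my $tree   = parse_tokens($tokens);
    die "trailing tokens after the expression\n" if @$tokens;
    return $tree;
}

# Small demonstration on a cut-down version of the sample data.
my $tree = parse_sexp('("String" :tag1 (value) :tag3 (:nested_tag1 (foo)))');
print "$tree->[0]\n";       # the leading string
print "$tree->[2][0]\n";    # the value that follows :tag1
```

The result is just nested array refs, so a tag and its value sit next to each other in the same list. Turning those pairs into hashes is a separate step, and whether it is safe depends on whether tags can repeat in the real files.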

rafl
  • Oohh.. Thanks for this. I'll have a look and report back. Much appreciated! – wawawawa Apr 14 '11 at 15:29
  • Hmmm... Data::SExpression seg faults when I try to parse the whole cannoli. – wawawawa Apr 14 '11 at 16:20
  • There are a couple of other modules that should be able to do the job as well. Try some of those? But also make sure to check `Data::SExpression`'s bug tracker. Maybe it's a known problem with a patch already available. If not, at least let the author know about it. – rafl Apr 14 '11 at 16:34
  • Yes - good call. I'll do that. I'm now doing the unthinkable and using Perl's flip-flop operator. As the source data is machine generated, I can be sure of the whitespace (well, sure enough...). So I use: `while (<>) { next unless /^(\t+):\d+ \(".+"$/ .. /^$1\)$/; #do stuff ; }`... Am I going to hell for not writing a proper parser? – wawawawa Apr 14 '11 at 16:59