Parsing keyed lists from a file in Tcl?

Question

I have a file full of records in the following format:

{TOKEN 
    { NAME {name of this token} }
    { GROUPS {Group 1} }
    { VALUE value }
    { REPEATING {
        { MAX 3 }
        { TIME {nmin 30} }
    } }
    { WINDOW */*/*/* }
    { ACTION {
        { EXEC {code to run here} }
    } }
}
{TOKEN 
    { NAME {name of next token} }
    { GROUPS {Group 1} }
    { VALUE value }
    { WINDOW 0/0:30-2:00,3:30-7:30/*/* }
    { HOST {localhost} }
    { ACTION {
        { email {
            { FROM cloverleaf@healthvision.com }
            { TO me@xxxx.org }
            { SUBJ {email subject test} }
            { MSG {this is the email body} }
        } }
    } }
}

Not all of the records have the same keywords, but they are all nested keyed lists, and I need to parse them into a .csv file for easier review. However, when I read in the file, it comes in as a single string rather than as a list of keyed lists. Splitting on whitespace or newlines wouldn't help, because both also occur inside the keyed lists. I tried inserting a pipe (|) between }\n and {T and splitting on the pipe, but I still ended up with strings.

I hope someone can point me in the right direction to parse these s-expression files.

Thanks in advance!

J

What do you want the output to look like? – Hai Vu Dec 25 '13 at 06:48

Hai Vu · Answer 1 · 2013-12-26T18:23:55.917

The Problem

Here is how I understand your problem.

- You have a text file full of records. Each record has the form {TOKEN ...}.
- Each record is almost a keyed list, but not quite: the leading string TOKEN makes it invalid as a keyed list. If we remove that string, the rest is a valid keyed list.
- Keyed lists may be nested; that is, a value might itself be another keyed list.
- You want to write each record as a row in a CSV file. However, every row in a CSV file should contain the same number of columns, which is not the case here. I'll leave it to you to decide how best to deal with that.
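One possible way to handle the differing columns (an assumption on my part, not something this answer prescribes) is to take the union of all keys across the flattened records as the header row and leave a cell empty whenever a record lacks that key. The sketch below starts from hypothetical, already-flattened key/value lists and sticks to Tcl 8.4 commands (an array per record rather than a dict):

```tcl
# Hypothetical flattened records (key/value lists) that have different keys
set records {
    {NAME {first token} VALUE 1 {REPEATING MAX} 3}
    {NAME {second token} VALUE 2 HOST localhost}
}

# Collect the union of keys, in first-seen order, to use as the header row
set columns {}
foreach rec $records {
    foreach {key value} $rec {
        if {[lsearch -exact $columns $key] < 0} {
            lappend columns $key
        }
    }
}

# Build one row per record, leaving a cell empty where a key is absent
set rows {}
foreach rec $records {
    catch {unset row}
    array set row $rec
    set cells {}
    foreach col $columns {
        if {[info exists row($col)]} {
            lappend cells $row($col)
        } else {
            lappend cells ""
        }
    }
    lappend rows $cells
}

puts $columns    ;# NAME VALUE {REPEATING MAX} HOST
foreach cells $rows {
    puts $cells
}
```

Passing `columns` and each entry of `rows` through a CSV joiner such as Tcllib's csv::join would then give a rectangular file.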

The Solution

What I suggest is to flatten each record into a dictionary-style key/value list with no nesting; a flat list is much easier to turn into a row. Here is my solution:

# myscript.tcl

package require Tclx

proc makeKey {prefix key} {
    return [string trim "$prefix $key"]
}   

proc keyedlist2dict {klname {keyPrefix ""}} {
    upvar 1 $klname kl
    set d {}
    foreach key [keylkeys kl] {
        set value [keylget kl $key]
        if {[catch {keylkeys value}]} {
            # value is not a nested keyed list
            lappend d [makeKey $keyPrefix $key] $value
        } else {
            # value is a nested keyed list
            set d [concat $d [keyedlist2dict value $key]] ;# TCL 8.4
        }   
    }   

    return $d
}   

set f [open data.txt]
set contents [read $f]
close $f
foreach item $contents {
    # Each item starts with "TOKEN", which we need to remove; otherwise
    # the keyed list is invalid
    set item [lrange $item 1 end]

    # Convert the keyed list to a flat key/value list ("dict"). We can
    # then display the row or write it to a file.
    set rec [keyedlist2dict item]

    # Display it
    foreach {key value} $rec { ;# TCL 8.4
        puts "$key: $value"
    }   
    puts ""
}

Run the Script

tclsh myscript.tcl

Output

NAME: name of this token
GROUPS: Group 1
VALUE: value
REPEATING MAX: 3
REPEATING TIME: nmin 30
WINDOW: */*/*/*
ACTION EXEC: code to run here

NAME: name of next token
GROUPS: Group 1
VALUE: value
WINDOW: 0/0:30-2:00,3:30-7:30/*/*
HOST: localhost
email FROM: cloverleaf@healthvision.com
email TO: me@xxxx.org
email SUBJ: email subject test
email MSG: this is the email body

Discussion

I assume your data is in data.txt.
The workhorse here is keyedlist2dict, which takes a keyed list and flattens it into a dictionary-style key/value list.
- In this procedure, if the value is not a nested keyed list, I just append the key and value to the result.
- If the value is a nested keyed list, I call keyedlist2dict recursively.
- Take a look at the output and you will see how I form the new keys.
The first version of this script required Tcl 8.5 or later; see the update below.

Update

I changed the two lines marked TCL 8.4, so the script should now work on a Tcl 8.4 system.
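If TclX isn't available, a similar flattening can be sketched with core Tcl 8.4 list commands alone. The helper names isNestedKeyedList and flattenFields are hypothetical, and the nesting test is a heuristic: a value counts as nested when it is a non-empty list whose every element is a two-element {key value} pair, so a plain value that happens to have that shape would be misread. Unlike keyedlist2dict above, this version carries the full key path down every level (for example ACTION email FROM):

```tcl
# Heuristic (assumption): a value is a nested keyed list when it is a
# non-empty, well-formed list whose every element is a {key value} pair.
proc isNestedKeyedList {value} {
    if {[catch {llength $value} n] || $n == 0} {
        return 0
    }
    foreach field $value {
        if {[catch {llength $field} m] || $m != 2} {
            return 0
        }
    }
    return 1
}

# Flatten a list of {key value} fields into a plain key/value list.
# Nested keys are joined with spaces, keeping the full path.
proc flattenFields {fields {prefix ""}} {
    set result {}
    foreach field $fields {
        set key [string trim "$prefix [lindex $field 0]"]
        set value [lindex $field 1]
        if {[isNestedKeyedList $value]} {
            set result [concat $result [flattenFields $value $key]]
        } else {
            lappend result $key $value
        }
    }
    return $result
}

set record {TOKEN
    { NAME {name of this token} }
    { REPEATING {
        { MAX 3 }
        { TIME {nmin 30} }
    } }
    { WINDOW */*/*/* }
}

# Drop the leading TOKEN word, then flatten the remaining fields
set flat [flattenFields [lrange $record 1 end]]
foreach {key value} $flat {
    puts "$key: $value"
}
# Prints:
#   NAME: name of this token
#   REPEATING MAX: 3
#   REPEATING TIME: nmin 30
#   WINDOW: */*/*/*
```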

I apologize. I should have specified that one of my limitations is TCL 8.4. — jenny lynne, Dec 26 '13 at 14:14
Is TOKEN invalid because it's a reserved word? That can be changed. I assume it is still in the correct format for a keyed list. key + value where value is the nested keyed list? The issue isn't how to get it to csv. It's to get it from file to list of keyed lists. — jenny lynne, Dec 26 '13 at 14:23
Jenny: please see my updated section, in which I answered both of your question/problem. — Hai Vu, Dec 26 '13 at 18:37
So, @Hai Vu, here is my ongoing issue with `set contents [read [open data.txt]]` followed by `foreach item $contents { ... set item [lrange $item 1 end] ... }`: when my code reads in the data.txt file, contents comes in as one long string rather than as a list. The foreach runs only once for the entire contents of the file, though there are 34 "records"/items. I can't split on spaces or newlines because they are interspersed throughout, and there is no \n\r between the "records"/items. – jenny lynne, Dec 26 '13 at 19:34
I remember the `Tclx` package has something for reading a list from a file. I'm looking it up now. — Hai Vu, Dec 26 '13 at 19:56
That command is `lgets`. I need some sample data to work with. Would you please update your original post with some sample data? If they are sensitive, just make up something. — Hai Vu, Dec 26 '13 at 20:06

score 1 · Accepted Answer · edited Feb 15 '16 at 23:44


That looks like a list of TclX keyed lists, which were an earlier attempt at what modern Tcl does with dictionaries. Keyed lists nest quite nicely (that's a tree, not a table), so mapping them to CSV won't be maximally efficient, but their syntax is such that the easiest way to handle them is with the TclX code itself.

Preliminaries:

package require TclX
package require csv;        # From Tcllib

List the columns that we're going to be interested in. Note the . separating the parts of nested key names; TclX uses it to address fields inside nested keyed lists.

set columns {
    TOKEN.NAME TOKEN.GROUPS TOKEN.VALUE TOKEN.REPEATING.MAX TOKEN.REPEATING.TIME
    TOKEN.WINDOW TOKEN.HOST TOKEN.ACTION.EXEC TOKEN.ACTION.email.FROM
    TOKEN.ACTION.email.TO TOKEN.ACTION.email.SUBJ TOKEN.ACTION.email.MSG
}
# Optionally, put a header row in:
puts [csv::join $columns]

Loading the real data into Tcl:

set f [open "thefile.dta"]
set data [read $f]
close $f
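Note that read always returns a string; Tcl interprets it as a list only when a list command such as foreach or llength uses it, and that works only if every brace in the file is balanced. A quick check along these lines (the sample text is made up) shows whether the data will parse as a list of records; info complete also reports whether the braces balance:

```tcl
# Made-up sample standing in for the file contents
set sample {{TOKEN { NAME {first} } { VALUE 1 }}
{TOKEN { NAME {second} } { VALUE 2 }}}

# A string with balanced braces can be used directly as a list of records
puts [llength $sample]        ;# 2
foreach record $sample {
    puts [lindex $record 0]   ;# TOKEN, once per record
}

# If a record is missing its closing brace, the string is no longer a list
set broken "{TOKEN { NAME {first} }"
if {[catch {llength $broken} msg]} {
    puts "not a valid list: $msg"
}
puts [info complete $broken]  ;# 0
```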

Iterate over the lists, extract the info, and send to stdout as CSV:

foreach item $data {
    # Ugly hack to munge data into real TclX format
    set item [list [list [lindex $item 0] [lrange $item 1 end]]]
    set row {}
    foreach label $columns {
        if {![keylget item $label value]} {set value ""}
        lappend row $value
    }
    puts [csv::join $row]
}

Or something like that.
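If TclX or Tcllib can't be installed, the column lookup and CSV quoting can be approximated in core Tcl. This is a sketch under stated assumptions: fieldPath and csvRow are hypothetical helpers; fieldPath follows dotted names like the columns list above (but starting below TOKEN, since TOKEN is stripped first) and returns an empty string for a missing key; csvRow is a simplified stand-in for csv::join that quotes every cell and doubles embedded quotes (RFC 4180 style):

```tcl
# Hypothetical helper: follow a dotted path such as "ACTION.email.TO" through
# nested {key value} fields; returns "" if any step is missing.
proc fieldPath {fields path} {
    foreach part [split $path .] {
        set found 0
        foreach field $fields {
            if {[lindex $field 0] eq $part} {
                set fields [lindex $field 1]
                set found 1
                break
            }
        }
        if {!$found} {
            return ""
        }
    }
    return $fields
}

# Hypothetical helper: quote every cell and double embedded quotes.
proc csvRow {cells} {
    set quoted {}
    foreach cell $cells {
        lappend quoted "\"[string map [list \" \"\"] $cell]\""
    }
    return [join $quoted ,]
}

set record {TOKEN { NAME {name of next token} } { ACTION { { email { { TO me@xxxx.org } } } } }}
set fields [lrange $record 1 end]
set line [csvRow [list [fieldPath $fields NAME] \
                       [fieldPath $fields ACTION.email.TO] \
                       [fieldPath $fields HOST]]]
puts $line    ;# "name of next token","me@xxxx.org",""
```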

answered Dec 25 '13 at 16:50 by Donal Fellows (edited by SulfoCyaNate)

I apologize. I should have specified that one of my limitations is TCL 8.4. The issue isn't how to get it to csv. It's to get it from file to list of keyed lists. – jenny lynne Dec 26 '13 at 14:25
Sticking with 8.4 is going to make your life progressively harder; it's out of support now. (That code *does* work in 8.4 though; the first version didn't, but I removed the dictionaries I was tinkering with before posting. Both the Tclx and csv packages are available for it, and I've now confirmed that by testing.) – Donal Fellows Dec 26 '13 at 17:41
I'll have to re-examine this when I get back to my desk in the morning, but I was definitely not having success with keylget on Tuesday. My "data" is coming out as one long string, and the braces are being interpreted as plain braces rather than as grouping the keyed lists. I feel like I'm banging my head against a wall. And, yes, we should be upgrading sometime next year, but I can't wait. – jenny lynne Dec 27 '13 at 02:35
Yeah; you need that fancy hack line to make it work. It turns the elements of the overall list into genuine keyed lists (or at least it does with the example you supplied). If you've no other constraints, I'd strongly suggest keeping such things in a SQLite DB instead; that's *much* better at persisting data… – Donal Fellows Dec 27 '13 at 13:03
I had to add a catch to the keylget for those occasions when the key isn't in the column list, but after some fiddling, it worked like a charm. Thanks a lot! At first I did have some weirdness with one field that sometimes has a nested keyed list and most times doesn't, i.e. {WEIRD_FIELD value} or {WEIRD_FIELD { { { {LABEL1 val1} {LABEL2 val2} } {LABEL1 val3} {LABEL3 val4} {LABEL3 val5} } }. In the end, I couldn't get the sublabels to separate out, and the whole chunk kept being stored as the WEIRD_FIELD value for that row, so I figured it was easier to see it all together since it was rare. – jenny lynne Dec 27 '13 at 18:42

score 1 · Answer 3 · answered Mar 27 '14 at 15:04

I realize this is a few months old at this point, but I see that you're trying to parse Cloverleaf config files (which is how I stumbled on this myself).

For anyone else trying to do something similar, there are actually libraries available for handling this provided with Cloverleaf, though they're not mentioned anywhere in the documentation.

Check out $HCIROOT/tcl/lib/cloverleaf. Handling for alert configs looks like it's in configIO.tlib. NetConfig stuff is in nci.tlib and netData.tlib.

what version of CL are you using? – jenny lynne Aug 26 '14 at 15:34

score 0 · Answer 4 · answered Dec 27 '13 at 16:35

You could treat the data as plain lists and read it line by line. The info complete command helps here: it reports when the accumulated text has balanced braces, i.e. when a whole record has been read:

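set lists {}
set kl ""
set fh [open your.file r]
while {[gets $fh line] != -1} {
    # Accumulate lines until the braces balance, i.e. a whole record is read
    append kl $line
    if {[info complete $kl]} {
        # Skip blank lines between records
        if {[string trim $kl] ne ""} {
            lappend lists $kl
        }
        set kl ""
    }
}
close $fh
puts [llength $lists]                ;# 2
puts [llength [lindex $lists 0]]     ;# 1
puts [llength [lindex $lists 0 0]]   ;# 7
puts $lists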

Output:

{{TOKEN { NAME {name of this token} } { GROUPS {Group 1} } { VALUE value } { REPEATING { { MAX 3 } { TIME {nmin 30} } } } { WINDOW */*/*/* } { ACTION { { EXEC {code to run here} } } }}} {{TOKEN { NAME {name of next token} } { GROUPS {Group 1} } { VALUE value } { WINDOW 0/0:30-2:00,3:30-7:30/*/* } { HOST {localhost} } { ACTION { { email { { FROM cloverleaf@healthvision.com } { TO me@xxxx.org } { SUBJ {email subject test} } { MSG {this is the email body} } } } } }}}
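The same info complete loop can be wrapped in a helper that works on any string, which makes it easy to test without a file. splitRecords is a hypothetical name; like the loop above, it assumes every record is fully enclosed in braces, since info complete applies Tcl's command-parsing rules. In this sketch each line keeps its newline so that words split across lines stay separate, and blank lines between records are skipped:

```tcl
# Hypothetical helper: split text into records by accumulating lines until
# [info complete] reports that the braces are balanced.
proc splitRecords {text} {
    set records {}
    set buffer ""
    foreach line [split $text \n] {
        append buffer $line \n
        if {[info complete $buffer]} {
            if {[string trim $buffer] ne ""} {
                lappend records [string trim $buffer]
            }
            set buffer ""
        }
    }
    return $records
}

set text "{TOKEN\n    { NAME {first} }\n}\n\n{TOKEN\n    { NAME {second} }\n}"
set records [splitRecords $text]
puts [llength $records]              ;# 2
puts [lindex $records 1 0 1 1]       ;# second
```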