Parse list of integers (optimization needed for speed test)

Question

I am running a small speed test to compare the Agda programming language with the Tcl scripting language. It's for scientific work, and this is only a pre-test, not a real test. I am not in any way trying to perform a realistic speed comparison!

I have come up with a small example in which Agda is 10 times faster than Tcl. There are special reasons I use this example. My main concern is that my Tcl code is badly written and that this is the sole reason Tcl is slower than Agda here.

The goal of the code is to parse a line that represents a list of integers and check if it is indeed a list of integers.
Example "(1,2,3)" would be a valid list.
Example "(1,a,3)" would not be a valid list.

My input is a file, and I check every third line of it. If any checked line is not a list of integers, the program prints "error".

My input file:

(613424,505980,317647,870930,75580,897160,716297,668539,689646,196362,533020)


(727375,472272,22435,869407,320468,80779,302881,240382,196077,635360,568517)


(613424,505980,317647,870930,75580,897160,716297,668539,689646,196362,533020)

(However, my real test file is about 3 MB in size.)

My current Tcl code to solve this problem is:

package require Tcl 8.6

proc checkListNat {str} {
    set list [split [string map {"(" "" ")" ""} $str] ","]
    foreach l $list {
        if {[string is integer $l] == 0} {
            return 0
        }
    }
    return 1
}

set i 1
set fp [open "/tmp/test.txt" r]
while { [gets $fp data] >= 0 } {
    incr i 
    if { [expr $i % 3] == 0} {
        if { [checkListNat $data] == 0 } {
            puts "error"
        }
    }
}
close $fp

How can I optimize my current Tcl code, so that the speed test between Agda and Tcl is more realistic?

I'd use `string range $str 1 end-1` instead of string map. Also, the `if` condition is already an expression, so you only need `if {$i%3 == 0}` without calling expr. — glenn jackman, Jul 28 '13 at 15:04
Hi @Donal, I don't understand your comment. Is this not equivalent to `[expr [expr $i % 3] == 0]` ? — glenn jackman, Jul 28 '13 at 20:19

score 2 · Accepted Answer · answered Jul 28 '13 at 15:16

The first thing to do is to put as much code as possible in procedures (or lambda terms) and to ensure that all expressions are braced. Those were the two key problems that were killing performance. We'll do a few other things too: you hardly ever need `expr` inside an `if` test (and this wasn't one of those cases), `string trim` is more suitable than `string map`, and `string is` really ought to be used with `-strict`. With those changes, I get this version, which is relatively similar to what you already had yet ought to be substantially faster.

package require Tcl 8.6

proc checkListNat {str} {
    foreach l [split [string trim $str "()"] ","] {
        if {[string is integer -strict $l] == 0} {
            return 0
        }
    }
    return 1
}

apply {{} {
    set i 1
    set fp [open "/tmp/test.txt" r]
    while { [gets $fp data] >= 0 } {
        if {[incr i] % 3 == 0 && ![checkListNat $data]} {
            puts "error"
        }
    }
    close $fp
}} {*}$argv
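
To illustrate the `-strict` point, here is a minimal sketch comparing the check with and without that flag. The proc names `checkLoose` and `checkStrict` are hypothetical and exist only for this comparison; the comment about the loose result assumes Tcl 8.5/8.6 behaviour, where an empty string passes `string is` unless `-strict` is given.

```tcl
# Hypothetical helper: same check as above, but without -strict.
proc checkLoose {str} {
    foreach l [split [string trim $str "()"] ","] {
        if {![string is integer $l]} {
            return 0
        }
    }
    return 1
}

# Hypothetical helper: with -strict, an empty field is rejected.
proc checkStrict {str} {
    foreach l [split [string trim $str "()"] ","] {
        if {![string is integer -strict $l]} {
            return 0
        }
    }
    return 1
}

# "(1,,3)" splits into the fields "1", "" and "3".
puts [checkLoose  "(1,,3)"]   ;# accepts the empty field on Tcl 8.5/8.6
puts [checkStrict "(1,,3)"]   ;# rejects the empty field
```

Both variants still accept "()" because the trimmed string splits into an empty list, so the loop body never runs.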

You might get better performance by adding `fconfigure $fp -encoding iso8859-1`; you'll have to test that yourself. But the key changes are the two named at the start (code in procedures, braced expressions), as each substantially affects the compilation strategy Tcl can use. (Also, Tcl 8.5 is a little faster than 8.6, because 8.6 has a radically different execution engine that is a bit slower for some things, so you might test the new code with 8.5 too; apart from the `package require Tcl 8.6` line, which would need relaxing, the code itself appears to be valid with both versions.)
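
If you want to measure the effect of bracing yourself, the built-in `time` command is enough for a rough comparison. The following is only a sketch: the proc names `unbraced` and `braced` and the iteration counts are made up for illustration, and the timings will depend on your machine and Tcl version.

```tcl
# Hypothetical micro-benchmark: counts the multiples of 3 up to n,
# once with an unbraced expr call and once with a plain braced condition.
proc unbraced {n} {
    set hits 0
    for {set i 1} {$i <= $n} {incr i} {
        # Unbraced: the expression string is rebuilt and re-parsed on every call.
        if {[expr $i % 3] == 0} { incr hits }
    }
    return $hits
}

proc braced {n} {
    set hits 0
    for {set i 1} {$i <= $n} {incr i} {
        # Braced: the condition can be compiled once with the procedure body.
        if {$i % 3 == 0} { incr hits }
    }
    return $hits
}

# time reports the average number of microseconds per iteration.
puts "unbraced: [time {unbraced 100000} 10]"
puts "braced:   [time {braced 100000} 10]"
```

Both procedures return the same count, so any difference in the reported times comes from how the condition is written.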

Thank you for your answer! Tcl people are always nice, that's great! — mrsteve, Jul 29 '13 at 21:31

user2141046 · Answer 2 · 2013-07-28T13:54:37.483

1

Try checking with `regexp {^[0-9,]+$} $line` instead of the checkListNat function.

Update: here is an example.

echo "87,566,45,67\n56,5r5,45" >! try

...

while {[gets $fp line] >= 0} {
    if {[regexp {^[0-9,]+$} $line]} {
        puts "OK $line"
    } else {
        puts "BAD $line"
    }
}

gives:

>OK 87,566,45,67

>BAD 56,5r5,45
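
For lines in the question's format, which are wrapped in parentheses, the pattern would also need to account for the brackets. A possible sketch follows; the proc name `checkListRe` is hypothetical, and the pattern is an assumption about what should count as valid: it only accepts unsigned decimal digits and requires at least one number, so it is stricter than a `string is integer` check.

```tcl
# Hypothetical regexp-based checker for lines such as "(1,2,3)".
# Pattern: "(", one or more digits, then zero or more ",digits" groups, then ")".
proc checkListRe {line} {
    return [regexp {^\([0-9]+(,[0-9]+)*\)$} $line]
}

puts [checkListRe "(1,2,3)"]   ;# valid list
puts [checkListRe "(1,a,3)"]   ;# contains a non-digit
puts [checkListRe "(1,,3)"]    ;# contains an empty field
```

Unlike the character-class pattern above, this rejects empty fields and stray commas, at the cost of a slightly more complex expression. Whether it is faster than the `split` and `foreach` version is something you would have to measure.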

edited Jul 28 '13 at 13:54

answered Jul 28 '13 at 13:45
