
I'm trying to parse dates from a large csv file in Racket.

The most straightforward way to do this would be to create a new date struct. But it requires the week-day and year-day parameters. Of course I don't have these, and this seems like a real weakness of the date module that I don't understand.

So, as an alternative, I decided to use find-seconds to convert the raw date vals into seconds and then pass that to seconds->date. This works, but is brutally slow.

(require racket/date) ; provides find-seconds

(time
 (let loop ([n 10000])
   (apply find-seconds '(0 0 12 1 1 2012)) ; this takes 3 seconds for 10000
   ;(date 0 0 12 1 1 2012 0 0 #f 0) ; this is instant
   (if (zero? n)
       'done
       (loop (sub1 n)))))

find-seconds takes 3 seconds to do 10000 values, and I have several million. Creating the date struct is of course instant, but I don't have the week-day, year-day values.

My questions are:

1.) Why is week-day/year-day required for creating date structs?

2.) Is find-seconds supposed to be this slow (i.e., is this a bug)? Or am I doing something wrong?

3.) Are there any alternatives for parsing dates quickly? I know srfi/19 has a string->date function, but I'd then have to change everything to use that module's struct instead of Racket's built-in one. And it may suffer the same performance hit as find-seconds; I'm not sure.

Sam Tobin-Hochstadt
Scott Klarenbach

2 Answers


Although not documented as such, it appears that week-day and year-day are "no-ops" when using the date struct with date->seconds. If I set them both to 0, date->seconds doesn't complain. I suspect it ignores them:

#lang racket

(require racket/date)

(define d (date 1    ;sc
                2    ;mn
                3    ;hr
                20   ;day
                8    ;month
                2012 ;year
                0    ;weekday  <<<
                0    ;year-day <<<
                #f   ;dst?
                0    ;time-zone-offset
                ))

(displayln (seconds->date (date->seconds d)))

;; =>
#(struct:date* 1 2 3 20 8 2012 1 232 #t -14400 0 EDT)
                               ^ ^^^

My guess is that the date struct was defined for use with seconds->date, where week-day and year-day would be interesting information to provide. Then, rather than define another struct without those fields for date->seconds (they're "redundant" for determining the date, which is why you're understandably annoyed :)), the same struct was reused.

Does that help? It's not clear to me from your question what you're trying to do with the date information from the CSV. If you want to convert it to an integer seconds value, I think the above should work for you. If you have something else in mind, perhaps you could explain.
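
One way to check the suspicion that the two fields are ignored is to compare the result for placeholder zeros against the result for the real values (1 and 232, taken from the output above). This is only a sketch: fields->date is a hypothetical helper name, and it assumes the six numeric fields have already been pulled out of the CSV row. It shows that the placeholders are tolerated; it does not measure speed.

```racket
#lang racket
(require racket/date)

;; Hypothetical helper: builds a date struct from six already-parsed
;; numbers, using 0 as a placeholder for week-day and year-day.
(define (fields->date second minute hour day month year)
  (date second minute hour day month year 0 0 #f 0))

(define with-placeholders (fields->date 1 2 3 20 8 2012))
(define with-real-values  (date 1 2 3 20 8 2012 1 232 #f 0))

;; Passing #f asks date->seconds to treat the fields as UTC
;; instead of local time.
(define same-seconds?
  (= (date->seconds with-placeholders #f)
     (date->seconds with-real-values #f)))

(displayln same-seconds?)
```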

Greg Hendershott

I would say this is an oversight in racket/date.

The call to find-seconds is expensive because it has to search for the matching number of seconds. And since you only need to know the week-day, it is an unnecessary computation.

Write to the mailing list in order to get advice.
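
In the meantime, week-day and year-day can be computed arithmetically from the day, month and year, with no searching. The following is a sketch under some assumptions: the helper names (leap-year?, year-day, week-day, build-date) are hypothetical and not part of racket/date, the dates are Gregorian, and the input is treated as UTC (dst? is #f and the offset is 0). The week-day uses Sakamoto's method, which is a choice made here and not something racket/date does.

```racket
#lang racket

;; Gregorian leap-year rule.
(define (leap-year? y)
  (and (zero? (remainder y 4))
       (or (not (zero? (remainder y 100)))
           (zero? (remainder y 400)))))

;; Days that precede the first of each month in a non-leap year.
(define days-before-month
  (vector 0 31 59 90 120 151 181 212 243 273 304 334))

;; 0-based day of the year (January 1 is 0), as in date-year-day.
(define (year-day day month year)
  (+ (vector-ref days-before-month (sub1 month))
     (sub1 day)
     (if (and (> month 2) (leap-year? year)) 1 0)))

;; Month offsets for Sakamoto's day-of-week method.
(define sakamoto-offsets (vector 0 3 2 5 0 3 5 1 4 6 2 4))

;; 0 = Sunday ... 6 = Saturday, as in date-week-day.
(define (week-day day month year)
  (let ([y (if (< month 3) (sub1 year) year)])
    (modulo (+ y
               (quotient y 4)
               (- (quotient y 100))
               (quotient y 400)
               (vector-ref sakamoto-offsets (sub1 month))
               day)
            7)))

;; Hypothetical constructor: fills in both derived fields.
;; Assumes UTC input, so dst? is #f and time-zone-offset is 0.
(define (build-date second minute hour day month year)
  (date second minute hour day month year
        (week-day day month year)
        (year-day day month year)
        #f
        0))

(define example (build-date 1 2 3 20 8 2012))
```

For the date used in the other answer (20 August 2012), this gives week-day 1 and year-day 232, the same values seconds->date reported there.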

soegaard