
I'm implementing a neural network for the MNIST dataset, using the CSV version instead of the images. I'm using Common Lisp with Quicklisp and the cl-csv library for CSV parsing. Using cl-csv, how can I return a single row from the CSV file? (cl-csv:read-csv-row #P"file.csv") returns row 1. Trying (cl-csv:read-csv-row #P"file.csv" 5) results in: *** - CL-CSV:READ-CSV-ROW: keyword arguments in (#P"test3.csv") should occur pairwise. Can cl-csv return a single specified row, and if so, how do I pass the row number as a parameter?

Rainer Joswig
pxlkllr
    Btw., the language is called Common Lisp and abbreviated CL, not CLISP. CLISP is a specific implementation of Common Lisp. This might confuse readers, who would usually expect that CLISP is the implementation used. – Rainer Joswig Apr 16 '18 at 13:23

2 Answers


A function named read-… is usually understood to read from a stream. Reading changes the state of the stream, so that the next read starts where the previous one left off. A common idiom is to do this in a loop.
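The stateful behaviour can be sketched with the standard read-line on a string stream, so that no CSV library is needed. The helper name first-two-lines and the sample text are made up for illustration:

```lisp
;; Each READ-LINE call consumes one line and advances the stream,
;; so the next call starts where the previous one stopped.
(defun first-two-lines (text)
  "Return the first two lines of TEXT as a list (hypothetical helper)."
  (with-input-from-string (in text)
    (list (read-line in)     ; reads "a,b" and advances the stream
          (read-line in))))  ; continues with "1,2"

(first-two-lines (format nil "a,b~%1,2~%3,4~%"))
;; => ("a,b" "1,2")
```

Functions such as read-csv-row follow the same pattern: calling them repeatedly on one open stream walks through the file record by record.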

It seems that cl-csv wants the user of read-csv-row to handle end-of-file as a signal, so:

(with-open-file (csv-in-stream csv-pathname)
  (handler-case
      (loop :for csv-line := (read-csv-row csv-in-stream)
            :do (process-somehow csv-line))
    (end-of-file () (whatever))))

If you want to get exactly one specific line:

(with-open-file (csv-in-stream csv-pathname)
  (handler-case
      (loop :repeat (1- n)
            :do (read-csv-row csv-in-stream) ; skip skip …
            :finally (return (read-csv-row csv-in-stream)))
    (end-of-file () (oupsie-file-too-short))))

You'd often want to use one of the provided convenience wrappers:

(do-csv (row csv-pathname)
  (process-somehow row))

or using iterate:

(iter
  (for row in-csv csv-pathname)
  (process-somehow row))

I must admit that I have grown rather fond of the alternative library fare-csv, though.

Svante

Solution:

(ql:quickload 'cl-csv) ; load the cl-csv system

(defun nth-csv-row (csv-path n &rest read-csv-row-parameters)
  "Return nth line of a csv file, parsed."
  (with-open-file (stream csv-path)
    (loop for x from 1 below n
          do (cl-csv:read-csv-row stream))
    (apply #'cl-csv:read-csv-row stream read-csv-row-parameters)))

;; your example executed using the new function:
(nth-csv-row #P"file.csv" 5)

Credit to @Svante, who pointed out a logical mistake I made. Originally, I used do (read-line stream) to skip the lines. But since newline characters can occur within CSV cells, I have to use cl-csv:read-csv-row so that the stream is parsed correctly when cells contain newlines. Thank you, @Svante!

The wrong(!) old solution (for educational purposes only):

(ql:quickload 'cl-csv) ; load the cl-csv system

;; a more general function returning the nth line of a file
(defun nth-line (file-path n)
  (with-open-file (stream file-path)
    (loop for x from 1 to (1- n) 
          do (read-line stream))
    (read-line stream)))

;; wrap it with a csv parsing function
(defun nth-csv-line (csv-path n &rest read-csv-row-parameters)
  "Return nth line of a csv file, parsed."
  (apply #'cl-csv:read-csv-row (nth-line csv-path n) read-csv-row-parameters))

;; your example executed using the new function:
(nth-csv-line #P"file.csv" 5)

(This would not parse correctly if a CSV cell contained a newline character: read-line does not check whether a newline is inside or outside a cell.)
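A minimal sketch of that failure, using only standard Common Lisp. The sample text and the helper name second-physical-line are made up for illustration:

```lisp
;; Hypothetical sample: the record with id 1 has a quoted cell
;; that spans two physical lines.
(defparameter *csv-text*
  (format nil "id,note~%1,\"first line~%second line\"~%2,plain~%"))

(defun second-physical-line (text)
  "Return the second physical line of TEXT, the way the old NTH-LINE would."
  (with-input-from-string (in text)
    (read-line in)             ; skip the header line
    (values (read-line in))))  ; stops at the newline inside the quoted cell

(second-physical-line *csv-text*)
;; => "1,\"first line"
```

The result is only the first half of the record, which is why the lines to be skipped have to be consumed with cl-csv:read-csv-row rather than read-line.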

In any case, my earlier additional remarks are still valid:

Since:

[Function] read-csv-row ( stream-or-string &key (separator separator) (quote quote) (escape quote-escape) &aux current state line llen c elen) => result

Read in a CSV by data-row (which due to quoted newlines may be more than one line from the stream) (https://github.com/AccelerationNet/cl-csv/blob/master/DOCUMENTATION.md#read-csv-row)

And since &rest read-csv-row-parameters passes all further arguments on to cl-csv:read-csv-row (exactly like R's ...), nth-csv-row has the full capabilities of cl-csv:read-csv-row. Thus, this solution works not only with comma-separated data but also with any other delimiter-separated data.

Example:

Consider "~/test.csv" with the content:

abc def klm
1   2   3
A   B   C

(note: this is a tab delimited file rather than a comma separated file)

Parse its second row by:

(nth-csv-row "~/test.csv" 2 :separator #\TAB) ; instead of comma

;; returns, correctly parsed:
;; ("1" "2" "3")

Appendix (installing quicklisp correctly, to run these snippets ...)

If you are new to this, want to try it, and don't have Quicklisp working yet, this may save you some time (I had to figure it out anew myself):

;; ;; If quicklisp is not installed, do on terminal:
;; $ wget https://beta.quicklisp.org/quicklisp.lisp
;; ;; Then in your lisp interpreter:
;; (load "quicklisp.lisp")
;; (quicklisp-quickstart:install)
;; ;; following the instructions of quicklisp, do
;; (load "~/quicklisp/setup.lisp") ; or: path/to/your/quicklisp/setup.lisp

;; With installed quicklisp, you can from now on install and load 
;; any quicklisp-able package by:
(ql:quickload 'cl-csv) ; install cl-csv using quicklisp
Gwang-Jin Kim
  • CSV fields may contain newlines. It is not correct to first read lines, then make csv-rows from them. You must use the CSV parser to read _rows_ from the stream. – Svante May 19 '18 at 08:45
  • Thank you for pointing out, @Svante! It was a logical mistake. And now I understood more about what's about csv-parsing ... corrected version above! – Gwang-Jin Kim May 19 '18 at 09:26