
(I'm using 32-bit kdb+ 3.3 on OS X.)

If I copy and paste the iris dataset into Excel and save it as "MS-DOS Comma Separated (.csv)" and read it into kdb+, I get this:

q)("FFFFS";enlist ",")0:`iris.csv
5.1al Length Sepal Width Petal Length Petal Width Species
-------------------------------------------------------------

If I save it as "Windows Comma Separated (.csv)", I get this:

q)("FFFFS";enlist ",")0:`iris.csv
Sepal Length Sepal Width Petal Length Petal Width Species
---------------------------------------------------------
5.1          3.5         1.4          0.2         setosa 
4.9          3           1.4          0.2         setosa 
4.7          3.2         1.3          0.2         setosa 
4.6          3.1         1.5          0.2         setosa 
5            3.6         1.4          0.2         setosa 
5.4          3.9         1.7          0.4         setosa 
4.6          3.4         1.4          0.3         setosa 
5            3.4         1.5          0.2         setosa 
4.4          2.9         1.4          0.2         setosa 
4.9          3.1         1.5          0.1         setosa
..

Obviously I need to save it as a Windows csv, and this answer explains the differences between the formats, but why does this matter for kdb+? And is there an option I can add to the code to read MS-DOS csv files?

userABC123

1 Answer


I'm running on Windows rather than OS X, so I can only reproduce the opposite problem, but the cause is the same either way.

Use read0 to see the difference. In my case:

q)read0 `:macintosh.csv
"col1,col2\ra,1\rb,2\rc,3"

q)read0 `:msdos.csv
"col1,col2"
"a,1"
"b,2"
"c,3"

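To parse the file as a table, 0: expects a list of strings, one per line (as with my msdos file), rather than a single string. In the macintosh file the rows are separated only by carriage returns ("\r"), which weren't recognised as line breaks, so the whole file came back as one line.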
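As a rough illustration of that point, 0: will also take a list of strings in place of a file handle, so the two cases can be reproduced without any files. The names good and bad below are made up for this sketch, and the strings simply mirror the read0 output shown above:

```q
/ four strings, one per row, like read0 returned for msdos.csv
good:("col1,col2";"a,1";"b,2";"c,3")
("SI";enlist ",")0:good        / parses to a 3-row table

/ one string with embedded carriage returns, like macintosh.csv
bad:enlist "col1,col2\ra,1\rb,2\rc,3"
count bad                      / 1, so 0: only ever sees a single line
"\r" vs first bad              / splitting on "\r" recovers the four rows
```

The last line is the same split used in the workaround further down.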

So I get:

q)("SI";enlist ",")0:`:msdos.csv
col1 col2
---------
a    1
b    2
c    3

q)("SI";enlist ",")0:`:macintosh.csv
aol1 col2
-----------

You could put something in your code to recognise the situation and handle it accordingly, but it would be less efficient than reading the file directly:

q)("SI";enlist ",")0:{$[1=count x;"\r" vs first x;x]}read0 `:msdos.csv
col1 col2
---------
a    1
b    2
c    3

q)("SI";enlist ",")0:{$[1=count x;"\r" vs first x;x]}read0 `:macintosh.csv
col1 col2
---------
a    1
b    2
c    3

This works either way.
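If you need to cope with files whose line endings you don't know in advance, one possible variation is to read the raw bytes and normalise the endings yourself. This is only a sketch: readLines is a hypothetical helper, not a built-in, and it assumes the file fits comfortably in memory and that blank lines can be discarded. It is written for a script file rather than for pasting into the console.

```q
/ hypothetical helper: split a text file into lines whether they
/ end in "\r\n", "\n" or a lone "\r"
readLines:{[f]
  s:"c"$read1 f;               / raw file contents as a char vector
  s:ssr[s;"\r\n";"\n"];        / collapse "\r\n" first so it counts as one break
  s:?[s="\r";"\n";s];          / any remaining lone "\r" becomes "\n"
  l:"\n" vs s;                 / split into one string per line
  l where 0<count each l}      / drop empty lines, e.g. after a trailing newline

("SI";enlist ",")0:readLines `:macintosh.csv
```

Like the version above, this does extra work compared with handing the file straight to 0:, so it is probably only worth it when you can't control how the csv was saved.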

terrylynch