
I have a 9.6GB csv file from which I would like to create an on-disk splayed table.

When I run this code, my 32-bit q process (on Win 10, 16GB RAM machine) runs out of memory ('wsfull) and crashes after creating an incomplete 4.68GB splayed table (see the screenshot).

path: {` sv (hsym x 0), 1_x}   / join a list of symbols into a file path, e.g. path `F:`db`tbl -> `:F:/db/tbl
symh: {`$1_ string x}          / drop the leading ":" from a file symbol, returning a plain symbol

colnames: `ric`open`high`low`close`volume`date
dir: `:F:

db: `db
tbl: `ohlcv
tbldisk: path dir,db,tbl
tblsplayed: path dir,db,tbl,`
dbsympath: symh path dir,db
csvpath: `:F:/prices.csv

.Q.fs[{ .[ tblsplayed; (); ,; .Q.en[path dir,db] flip colnames!("SFFFFID";",")0:x]}] csvpath 

What exactly is going on in memory and on disk behind the scenes when reading the csv file with .Q.fs and 0:? Is the csv read row by row or column by column?

I thought that only the ~132kB chunks are held in memory at any given time, hoping that .Q.fs is 'wsfull-resistant.

Is the q process actually pulling each whole column (splay) into memory, one at a time, as it appends the chunks?
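For reference, .Q.fs is documented as a thin wrapper around .Q.fsn with a fixed chunk size of 131000 bytes, so the chunking in the loader above can be made explicit (a sketch reusing the variables defined earlier; 131000 is the documented default):

```
/ .Q.fs[f;file] is equivalent to .Q.fsn[f;file;131000]:
/ f is applied to successive batches of complete csv lines, each batch at most ~131000 bytes
.Q.fsn[{ .[ tblsplayed; (); ,; .Q.en[path dir,db] flip colnames!("SFFFFID";",")0:x]}; csvpath; 131000]
```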

Considering that (according to this source, among others):

on 32-bit systems the main memory OLTP portion of a database is limited to about 1GB of raw data, i.e. 1/4 of the address space

that would nearly explain running out of memory. As shown in the screenshot taken right after the 'wsfull, a couple of columns are near the 1GB limit.

[screenshot: .Q.w[] memory stats right after the 'wsfull]

Here is a run with memory profiling:

.Q.fs[{ 0N!.Q.w[]; .[ tblsplayed; (); ,; .Q.en[path dir,db] flip colnames!("SFFFFID";",")0:x]}] csvpath

[screenshot: .Q.w[] output printed per chunk]

Thomas Smyth - Treliant
Daniel Krizian

1 Answer


I believe q reads a csv row by row. Your q session probably crashes because memory is not cleared during:

.Q.fs[{ .[ tblsplayed; (); ,; .Q.en[path dir,db] flip colnames!("SFFFFID";",")0:x]}] csvpath 

Try adding .Q.gc[]:

.Q.fs[{ .Q.gc[]; .[ tblsplayed; (); ,; .Q.en[path dir,db] flip colnames!("SFFFFID";",")0:x]}] csvpath 
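If .Q.gc[] alone is not enough, another option (a sketch, not tested against this data) is to switch to .Q.fsn and shrink the chunk size, collecting garbage after each chunk has been appended rather than before:

```
/ same loader, but with an explicit smaller chunk size (64kB) and gc after each append
.Q.fsn[{ .[ tblsplayed; (); ,; .Q.en[path dir,db] flip colnames!("SFFFFID";",")0:x]; .Q.gc[]}; csvpath; 65536]
```

Smaller chunks mean more, shorter-lived intermediate vectors per pass, which a 32-bit workspace copes with more easily at the cost of more append operations.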
Xiangpeng