
What's the most efficient way to insert R objects (more specifically, time series expressed as xts or data.table objects, i.e. time-based and numeric columns) into a kdb+ database?

I was only able to find solutions involving string serialization via q expressions, as described here and here.

Samuel Liew
Daniel Krizian

2 Answers


My solution was inspired by this version of qserver.c from GitHub.

Yang added two functions, convert_binary and convert_r, that [de]serialize data, which is basically what you asked for. However, the return value is an array of hexadecimal bytes. To use it with the existing execute function, we need paste(collapse="") to join it into a single string and sprintf to build the q expression. The following example sends robj from R to d in kdb:

execute(h, sprintf("d:-9!0x%s",paste(convert_r(robj),collapse="")))

The problem is that paste(collapse="") takes quite some time if the array is large.

Here robj is the R object. For example, I tried it with a data.frame (dim = 60,000 x 100): convert_r() took < 0.5 s to convert it, paste(collapse="") took 13 s to turn the result into a single string, and execute(h, ...) then took < 1 s to transfer the data.

I have not found anyone who has written a function that sends R data to kdb as serialized binary (I don't know why), so I wrote one myself. Here is the code:

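SEXP kx_r_send_data(SEXP connection, SEXP robj, SEXP varname)
{
  K result, conversion, serialized;
  kx_connection = INTEGER_VALUE(connection);
  /* convert the R object to a K object with qserver.c's converter */
  conversion = from_any_robject(robj);
  /* serialize the K object into a byte vector that -9! can read back */
  serialized = b9(2, conversion);
  /* k() releases its arguments, so r1() keeps our own reference to
     serialized alive; ks() interns varname as a symbol; (K)0 ends the list */
  result = k(kx_connection, "{[d;v] v set -9!d;}", r1(serialized),
             ks((S)CHARACTER_VALUE(varname)), (K)0);
  r0(conversion);
  r0(serialized);
  /* k() returns a null pointer if the connection is broken */
  if (result == NULL)
    error("kx_r_send_data: connection error");
  SEXP s = from_any_kobject(result);
  r0(result);
  return s;
}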

I assume you know how to modify qserver.c and recompile qserver.o. Then add a wrapper function to qserver.R:

send_data <- function(connection, r_obj, varname) {
  .Call("kx_r_send_data", as.integer(connection), r_obj, varname)
}

This sends R data to kdb as serialized binary entirely at the C level, with no intermediate string.

Note:

1) The conversion doesn't work with data.table, as it is not a base R class. Calling the function with a data.table will lead to a segmentation fault.

2) Serialization doesn't know how to convert Date/datetime objects; they all become null (0N) after the transfer into kdb.

Unless you want to implement the Date/datetime/data.table conversion from R to K yourself, do NOT call convert_r() or send_data() on those types.

On the other hand, there is a quick workaround. For a data.table, simply use as.data.frame() to convert it to a data.frame before calling the functions. For Date/datetime columns, use as.character() to convert them to strings before sending them to kdb, then cast them to "D" or "P" inside kdb directly.

3) A serialized data.frame also carries other information, such as the number of rows, row names and class attributes, so you need to reshape the data inside kdb after the transfer.

I would suggest writing an R wrapper function that handles those special cases and then calls send_data() to pass the data to kdb. After that, use execute(h, ...) to reshape the data into a standard format inside kdb.

The same data (60,000x100) now takes < 1s to finish, end-to-end from R to kdb.

PS: I typed this code out by hand instead of pasting it, because I don't know how to paste nicely formatted code here, so there may be typos. Let me know if you find any critical ones.

statquant
lui.lui
  • My code passes two arguments to k() after the serialized data: 1. ks((S)CHARACTER_VALUE(varname)), which converts the R string "varname" into a kdb symbol, and 2. (K)0, a null K pointer that terminates k()'s argument list. Also, the upper-case "S" type is the char* typedef from k.h, not SEXP, and you need to include ..., "rserver.c", and "qserver.c" for this to work. Take a look at base.c in the "qserver.c from github" repository to see how to set up the environment to compile. As for your k() parentheses, can you check whether there is a typo somewhere in your code? @statquant – lui.lui Oct 24 '16 at 18:56

The most "stable" way to interact with kdb from R is to use the string query interface. If you want actual object [de]serialisation, then I suggest you look at the C interface and call that library from R to interact with kdb.

Manish Patel
  • Thanks for the insight. Given that I am not conversant with C and no such R/Rcpp wrapper exists in the public domain, would anyone in the community be interested in publishing an R package that does that? I would support that, even financially. – Daniel Krizian Jan 03 '15 at 12:01
  • Neither am I, else I'd offer to do it :) There's some info here about loading libs into R which one of my colleagues has done before quite successfully: http://users.stat.umn.edu/~geyer/rc/ . – Manish Patel Jan 05 '15 at 11:20
  • Using the string query interface is only good for extracting data out of kdb, not for inserting large data into kdb. What I provided above is for inserting. Another approach is to set up an Rserver and then pull the data from the Rserver on the kdb side, but that's another layer of setup which I tried to avoid. – lui.lui Mar 13 '18 at 07:35