
I have a large data.frame (15 columns and 100,000 rows) in an existing R session that I want to send to a Q/KDB instance. From KDB's cookbook, the possible solutions are:

RServer for Q: use KDB to create a new R instance which shares memory space. This doesn't work because my data is in an existing instance of R.

RServe: run an R server and use TCP/IP to communicate with a Q/KDB client. This does not work because, as per RServe's documentation, "every connection has a separate workspace and working directory", so I presume it does not see my existing data.

R Math Library: access R's functionality via a math library without needing an instance of R. This does not work because my data is already in an instance of R.

So any other ideas on how to send data from R to Q/KDB?

– mchen

1 Answer


Open a port in Q. I start Q with a batch file:

@echo off
c:\q\w32\q -p 5001

Load qserver.dll:

tryCatch({
  dyn.load("c:/q/qserver.dll")
}, error = function(f) {
  print("can't load qserver.dll")
})

Then use these functions:

open_connection <- function(host = "localhost", port = 5001, user = NULL) {
  parameters <- list(host, as.integer(port), user)
  h <- .Call("kx_r_open_connection", parameters)
  assign(".k.h", h, envir = .GlobalEnv)
  return(h)
}

close_connection <- function(connection) {
  .Call("kx_r_close_connection", as.integer(connection))
}

execute <- function(connection, query) {
  .Call("kx_r_execute", as.integer(connection), query)
}

d <<- open_connection(host = "localhost", port = 5001)  # global handle so ex2 below can see it

ex2 <- function(...) {
  # paste all arguments into one query string, then run it over the connection d
  query <- list(...)
  theResult <- NULL
  for (i in query) theResult <- paste0(theResult, i)
  execute(d, theResult)
}

Then ex2 can take multiple arguments, so you can build queries out of R variables and strings.
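For instance (an illustration of mine; `trade` is a hypothetical q table, not one defined above):

n <- 100
ex2("til ", n)                            # runs "til 100" in q
ex2("select from trade where size>", n)   # mixes a string literal with the R variable n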

Edit: that's for calling Q from R; here's R to Q.

2nd Edit: improved algorithm:

library(stringr)

RToQTable <- function(Rtable, Qname, withColNames = TRUE, withRowNames = TRUE, colSuffix = NULL)
{
  # use the matrix's column names, or generate col1, col2, ... (plus optional suffix)
  theColnames <- if (!withColNames || length(colnames(Rtable)) == 0) {
    paste0("col", as.character(1:ncol(Rtable)), colSuffix)
  } else colnames(Rtable)
  if (!withRowNames || length(rownames(Rtable)) == 0) withRowNames <- FALSE
  # append a sentinel row so the collapsed values can be split back into columns
  Rtable <- rbind(Rtable, "linesep")
  # build one string of q column assignments col1:("v1";"v2";...);col2:(...);...
  # the - 11 trims the trailing ;"linesep); left over from the sentinel
  charnum <- as.integer(nchar(thestr <- paste(paste0(theColnames, ':("',
    str_split(paste(Rtable, collapse = '";"'), ';"linesep";"')[[1]], ');'),
    collapse = "")) - 11)
  if (withRowNames)
    ex2(Qname, ":([]", Qname, str_replace_all(paste0("`", paste(rownames(Rtable), collapse = "`")), " ", "_"), ";", substr(thestr, 1L, charnum), "))")
  else
    ex2(Qname, ":([]", substr(thestr, 1L, charnum), "))")
}

> library(microbenchmark)
> bigMat <- matrix(runif(1500000), nrow = 100000, ncol = 15)
> microbenchmark(RToQTable(bigMat, "Qmat"), times = 3)
Unit: seconds
                      expr      min     lq     mean   median       uq      max neval
 RToQTable(bigMat, "Qmat") 10.29171 10.315 10.32766 10.33829 10.34563 10.35298     3

This will work for a matrix; for a data frame, just save a vector containing the type of each column, convert the data frame to a matrix, import the matrix into Q, and cast the types, as sketched below.
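A minimal sketch of that data-frame route (my illustration, not code from the answer; it assumes the RToQTable above and q's uppercase casts, where e.g. "F"$ parses strings into floats):

sendDataFrame <- function(df, Qname) {
  # map R column classes to q parse characters; "" leaves a column as strings.
  # This mapping is an assumption - extend it for dates, characters, etc.
  qType <- function(x) switch(class(x)[1], integer = "I", numeric = "F", logical = "B", "")
  types <- vapply(df, qType, character(1))
  RToQTable(as.matrix(df), Qname, withRowNames = FALSE)  # every cell arrives in q as a string
  # cast column by column, e.g. Qdf:update col1:"F"$col1 from Qdf
  for (j in seq_along(types))
    if (types[j] != "")
      ex2(Qname, ":update ", names(df)[j], ":\"", types[j], "\"$", names(df)[j], " from ", Qname)
}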

Note that this algorithm is approximately O(rows * cols^1.1), so you'll need to chop the columns up into multiple matrices if you have more than about 20 of them to stay near O(rows * cols); see the sketch below.

But for your example of 100,000 rows and 15 columns it takes 10 seconds, so further optimization may not be necessary.
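If you do need to split a wider matrix, a rough sketch (RToQWide and the numbered chunk names are my own illustration; colSuffix keeps the generated column names distinct across chunks when the matrix has none):

RToQWide <- function(Rtable, Qname, chunkSize = 20) {
  cols <- split(seq_len(ncol(Rtable)), ceiling(seq_len(ncol(Rtable)) / chunkSize))
  for (i in seq_along(cols))
    RToQTable(Rtable[, cols[[i]], drop = FALSE], paste0(Qname, i), colSuffix = i)
  # stitch the chunk tables back together column-wise in q, e.g. Qmat:Qmat1,'Qmat2
  ex2(Qname, ":", paste(paste0(Qname, seq_along(cols)), collapse = ",'"))
}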

– hedgedandlevered
  • Thanks, but your solution only talks about **querying Q from within R** using [Q-Server for R](http://code.kx.com/wiki/Cookbook/IntegratingWithR#Calling_kdb.2B_from_R). I need to send a `data.frame` with 15 columns and 100,000 rows **from R to Q**. Could you demonstrate sending a large `data.frame` from R to Q please? – mchen Oct 16 '14 at 13:48
  • Did this help? Please accept if so; if not, let me know what I can add – hedgedandlevered Oct 23 '14 at 13:36
  • Thanks, but it still seems a bit of a flaky solution - you're essentially building an R-to-Q translator by serializing R objects into a Q expression. I tried doing this myself at first and there were at least two problems: first, the string serialization of the R `data.frame` is a lot less efficient than passing a binary object, so communication times are long, especially when the table is large. Second, what about objects not compatible with Q, for instance factors, strings, and lists in data.frames? I'll keep this question open for a little longer in case of any more production-suitable solutions. – mchen Oct 23 '14 at 22:32
  • Well, Q doesn't natively support R variables, so what could you do besides translating? I actually modified this algorithm today: it can now transfer about 650,000 table-cells per second and is O(rows * columns), instead of this implementation, which is roughly O(rows^2 * columns^1.5); it's also 6 lines and handles with/without row/column names. I'll edit to show that tomorrow – hedgedandlevered Oct 24 '14 at 00:27
  • As for other data types... strings can just `ex2("qStringName:",myString)` or even `ex2("qConcatString:",str1,str2)`. I have something to put lists into Q as well, but that's outside the scope of this question so I didn't include it. For data frames I'd just convert it to a matrix then upload it here, then do type-casting in Q by column, which should be marginally faster than handling a bunch of R-types to convert to Q-types – hedgedandlevered Oct 24 '14 at 00:33
  • Qmat is full of strings, whereas bigMat contains numerics. Any advice on how to efficiently cast to floats instead of strings? – Daniel Krizian Dec 27 '14 at 22:06
  • what do you mean "efficiently"? This is to get everything into Q... from there any casting operations are very cheap. If you're doing that a lot, you can create a function that makes a second matrix using typeof() and a mapping to Q types, then does the casting there – hedgedandlevered Jan 08 '15 at 17:02
  • another alternative to your answer is to serialize the string to file, then load the file in q with `ex2("\l /path/file.q")` as described in [Rory Winston's blog article](http://www.theresearchkitchen.com/archives/776). Which approach do you reckon is faster and more robust? – Daniel Krizian Feb 15 '15 at 12:03
  • also, wondering whether dumping the `data.frame` to a csv file and subsequently loading it with the command ``table: ("ISI"; enlist ",") 0:`data.csv`` (note "ISI" in the example stands for integer-string-integer type casting of columns) wouldn't offer more versatile support for different datatypes – Daniel Krizian Feb 15 '15 at 19:23
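For reference, the CSV route from the last two comments would look roughly like this (a sketch; the file path and the "ISI" type string are illustrative and must match your actual columns):

# R side: write the frame without headers or row names (enlist "," on the q side means "no header row")
write.table(df, "c:/q/data.csv", sep = ",", row.names = FALSE, col.names = FALSE)
# have q parse and type the file in one step, via the connection opened above
ex2("table:(\"ISI\"; enlist \",\") 0: `:c:/q/data.csv")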