0

I'm running R 4.3.1 on Windows 10 using RGui. If I connect to my database:

library(RODBC)

connstr = sprintf('driver={ODBC Driver 17 for SQL Server};server=%s;database=%s;uid=%s;pwd=%s',
    /* my parameters */)
    
dbhandle <- odbcDriverConnect(connstr)

I can get back a single column:

tbl1 <- sqlQuery(dbhandle, sprintf('select runID from myTable WHERE runID = 34'))

but if I try to get both columns:

tbl2 <- sqlQuery(dbhandle, sprintf('select runID, fit from myTable WHERE runID = 34'))

RGui just crashes. It doesn't display a message; its windows just close and that's the end of that.

R.EXE at the command line does pretty much the same: it ends and returns me to the command prompt with no error message.

Why does such a trivial exercise crash R? How can I use R without it crashing?

Also on Ubuntu

Using R 3.6.3 on Ubuntu 20.04, I have the same problem but at least I get a segfault message:

> tbl2 <- sqlQuery(dbhandle, sprintf('select fit from myTable WHERE runID = 34'))

gives this:

 *** caught segfault ***
address 0x560b5bbef000, cause 'memory not mapped'

Traceback:
 1: odbcFetchRows(channel, max = max, buffsize = buffsize, nullstring = nullstring,     believeNRows = believeNRows)
 2: sqlGetResults(channel, errors = errors, ...)
 3: sqlQuery(dbhandle, sprintf("select fit from myTable WHERE runID = 34"))

How can I successfully query data from ODBC using R?

MikeB
  • 1,452
  • 14
  • 28

1 Answers1

0

After working on this and the related issue, I think it's pretty clear there's a bug in RODBC surrounding the handling of binary data types on SQL Server.

If I start with a connection:

# install.packages('RODBC')

library(RODBC)

connstr = sprintf('driver={ODBC Driver 17 for SQL Server};server=%s;database=%s;uid=%s;pwd=%s', 
    /* my parameters */)
dbhandle <- odbcDriverConnect(connstr)

and in that target database, I create a little table:

create table mikeb.repocase (runid int, bindata varbinary(max));
insert into mikeb.repocase (runid, bindata) values (33, 0x0102030405);

I should be able to select the data out. Here, the data is just five bytes so the chances that the memory corruption issue are encountered are low, but not zero. With my connection in R:

tbl1 <- sqlQuery(dbhandle, 'select bindata from mikeb.repocase WHERE runID = 33')

the data is returned. But it's incorrect:

> tbl1$bindata[[1]]
[1] 00 00 00 00 00

It's all zeroes, when 01 02 03 04 05 should be returned.

Casting the data to image will get correct results back on Linux (Ubuntu):

> tbl2 <- sqlQuery(dbhandle, 'select CAST(bindata as image) as bindata from mikeb.repocase WHERE runID = 33')
> tbl2$bindata[[1]]
[1] 01 02 03 04 05

However, on the Windows RGui client, the same code fails with an error about allocating an impossible quantity of memory:

> tbl2 <- sqlQuery(dbhandle, 'select cast(bindata as image) as bindata from mikeb.repocase WHERE runID = 33')
Error in odbcQuery(channel, query, rows_at_time) : 
  'R_Calloc' could not allocate memory (214748364800 of 1 bytes)

On Linux, getting a non-trivial amount of data from a varbinary(max) column (in my case, about 13 megabytes) causes a segfault failure sometimes, and a return of an object with the correct length but uninitialized data the rest of the time. (Is this a security concern?) However, using the IMAGE cast seems to almost always work.

The offending code is in the RODBC cachenbind() function, where the column description inspection happens and data binding is set up:

case SQL_BINARY:
case SQL_VARBINARY:
case SQL_LONGVARBINARY:
{
    /* should really use SQLCHAR (unsigned) */
    SQLLEN datalen = thisHandle->ColData[i].ColSize;
    thisHandle->ColData[i].datalen = datalen;
    thisHandle->ColData[i].pData = R_Calloc(nRows * (datalen + 1), char);
    BIND(SQL_C_BINARY, pData, datalen);
}

The call to R_Calloc is passed a size of one, and a quantity computed as nRows * (datalen + 1). datalen came from the SQLDescribeCol() API for this column.

For the IMAGE data type, we get SQL_LONGVARBINARY as the type and datalen is -1. With that computation and its careless sign management, plus ignoring overflow, the quantity requested of R_Calloc is 214748364800, matching the error message we see on Windows. (Since I'm not set up for C development on Linux just now, I can't comment on what's going on there. Maybe Linux manages to map some memory in response to this R_Calloc call, or maybe it ends up getting different numbers from the Linux version of the same driver. Or, could be that something else happens.)

For the VARBINARY(max) type, we get SQL_VARBINARY as the type and a datalen of 0. The number of bytes requested of R_Calloc, then, is just nRows which is way smaller than expected -- looks like it's 100 bytes.

Curiously, nRows is used here rather than the computed NROWS.

Regardless, it seems as though we're simply lucky that RODBC can return binary data at all because it's not correctly handing the binding of the data or the allocation of the binding memory.

I've written to the package maintainer today, but I haven't received a response yet.

MikeB
  • 1,452
  • 14
  • 28