0

Someone has used serialize() to stick an R object into SQL Server, in a VARBINARY column.

How can I unserialize() it out?

Here's the code that gets the data:

library(RODBC)
dbhandle <- odbcDriverConnect("MyConnection String")
x = sqlQuery(dbhandle, sprintf('select SomevarbinaryColumn from MyTAble WHERE the_key = 8675309'))

If I try this:

unserialize(x$SomevarbinaryColumn[1])

I get

Error in unserialize(x$SomevarbinaryColumn[1]) :
   'connection' must be a connection

Confusingly, the first parameter to unserialize() is named connection. The documentation says "unserialize reads an object (as written by serialize) from connection or a raw vector.". What kind of connection might be provided here? Instead, though, I think I'm trying to pass a raw vector, but I can't figure out how to convert the RODBC result to a raw vector.

I've tried to figure out what type x$SomevarbinaryColumn[1] is so I can convert it to something that unserialize() likes:

print(typeof(x$SomevarbinaryColumn[1]))

gives

[1] "list"

the RODBC docs say that varbinary will come back as "a list of raw vectors."

How do I get the raw vector that unserialize() wants from this returned list?

MikeB
  • 1,452
  • 14
  • 28
  • 1
    Can you provide the output of `dput(x)` so we can see how the data is returned? Is it a list of raw values? – MrFlick Aug 08 '23 at 20:44
  • @MrFlick I'm sorry, I cannot. `dput(x)` prints many megabytes of hexadecimal data. The last couple lines are: `0x28, 0xf5, 0xc7, 0x46, 0xeb, 0x55, 0x00, 0x00, 0x98, 0xf5, 0xc7, 0x46, 0xeb, 0x55, 0x00, 0x00, 0x05, 0x00, 0x00))), class = "ODBC_binary")), class = "data.frame", row.names = 1L)` – MikeB Aug 08 '23 at 21:02
  • 1
    What about `unserialize(x$SomevarbinaryColumn[[1]][[1]])`, does that work? If not, what is `class(x$SomevarbinaryColumn[[1]])`? – MrFlick Aug 08 '23 at 21:05
  • @MrFlick `unserailize()` results in `read error`, and `class()` gives `[1] "raw"`. At this point, I think I've traced this down to a bug in RODBC and will write that up if I can confirm it. – MikeB Aug 09 '23 at 14:11

1 Answers1

0

There are two problems here. One is that RODBC isn't capable of reliably getting binary data back from SQL Server; the other is that I have to figure out how to unpack the binary data returned (if it gets returned) so that I can call unserialize() on it and rehydrate my object.

In the related issue, I've explained what I found when investigating the problems with RODBC's handling of varbinary(max).

For now, let's suppose that RODBC works and gets the correct data back. I can sometimes coax it to do so, but for the long term a better fix is necessary. Maybe the package maintainer will respond, or maybe I'll for the package and make some fixes for myself, or maybe I'll switch to a different package.

Thing is, even if RODBC looks like it is working, it might be returning bad data. If I code this:

x = sqlQuery(dbhandle, sprintf('select SomevarbinaryColumn from MyTAble WHERE the_key = 8675309'))

then either R encounters a GPE or I get data back. The data that comes back is the correct length, but the data is incorrect -- that is, it is uninitialized garbage. It can never be un-serialized into my object.

However, I can code this to work around the RODBC bug, sometimes:

x = sqlQuery(dbhandle, sprintf('select CAST(SomevarbinaryColumn AS IMAGE) AS SomevarbinaryColumn from MyTAble WHERE the_key = 8675309'))

and then I find that I do get correct data back. It's the right length, and contains the expected bytes!

All that's left is to find the right way to get my object out. And that's here:

x = unserialize(as.raw(unlist(x$SomevarbinaryColumn[1]))) 

Hopefully, that answer helps people who are using more reliable storages and face the question of how they might unpack their serialized data.

MikeB
  • 1,452
  • 14
  • 28