1

FULL EDIT:

I urgently need to access a Microsoft SQL Server and read compressed data from it, doing it with Python. After having had a lot of trouble, I finally found a working Matlab implementation, which does the job. I need it in Python however.

The data is stored in the image data type and has to be decompressed after downloading. This is working Matlab code (using java routines) to do the job:

connection = actxserver('ADODB.Connection');
conString = 'Provider=SQLOLEDB; Data Source=adress.to.server; Integrated Security=SSPI; Initial Catalog=';
connection.ConnectionString = conString;
connection.CursorLocation = 'adUseClient';
connection.Open();
query_string = 'select Zip from Database where DatabaseName=''foo'' and Item=''bar'' ';
return = connection.Execute(query_string);
row = return.GetRows();
data = row{1};
class(data) % returns uint8
a = java.io.ByteArrayInputStream(data);
b = java.util.zip.InflaterInputStream(a);
isc = com.mathworks.mlwidgets.io.InterruptibleStreamCopier.getInterruptibleStreamCopier();
c = java.io.ByteArrayOutputStream;
isc.copyStream(b,c);
result = typecast(c.toByteArray,'uint16');

result now contains an array of integers that are the uncompressed desired data. I would like to achieve the same thing using Python. Here is the code that I am using to retrieve the data:

import pyodbc
connect = pyodbc.connect(Driver="SQL Server", Server="address.to.server")
cursor = connect.cursor()
cursor.execute("select Zip from database where DatabaseName='foo' and Item='bar'")
row = cursor.fetchone()
data = row[0] 
type data # returns <type 'bytearray'>

Now how do I inflate this bytearray? I tried

zlib.decompress(io.BytesIO(data).read())

which returns a bytearray of the correct length, but I need the array of integers that are returned by the above Matlab code. I tried decoding both the compressed and the inflated bytearray, but did not succeed.

Is there a difference on how Matlab and pyodbc handle the image data type in SQL? How can I retrieve the array of integers in Python?

I am using Python 2.7.2, pyodbc 3.0.5 and Matlab R2011b on Windows XP.

Till B
  • 1,248
  • 2
  • 15
  • 19
  • What data type is the zip column? Which versions of Python and pyodbc do you have? – Pondlife Mar 28 '12 at 09:18
  • It's probably not strictly a zipfile, but some other compressed format, such as `compress`, `gzip`, `tgz`, or raw LZ or LZW compressed data. – Ben Mar 28 '12 at 09:34
  • I was told that it is a zip-file by the administrator of the database. @Pondlife I clarified in the question text. – Till B Mar 28 '12 at 10:21

2 Answers2

2

It sounds like you want to unpack things into unsigned 16-bit integers?

So, you have something like:

bytearray(b'\x01\x00\x02\x00\x03\x00\x04\x00')

And you want:

[1, 2, 3, 4]

If so, you have several options.

If you're going to be using numpy anyway, consider using numpy for this.

import numpy as np
dat = bytearray(b'\x01\x00\x02\x00\x03\x00\x04\x00')
data = np.frombuffer(buffer(dat), dtype=np.uint16)

Alternately, you could do something like this using python's builtin array:

import array
dat = bytearray(b'\x01\x00\x02\x00\x03\x00\x04\x00')
data = array.array('H')
data.fromstring(buffer(dat))

You could also use the struct module, but it's less than ideal for repetitive data like this.

Joe Kington
  • 275,208
  • 71
  • 604
  • 463
1

The issue is that with MS SQL when using pyodbc, BLOBs get truncated to 4096 bytes. Solution is to prepend your querySET TEXTSIZE 2147483647 (2GB, theoretically max is 4GB, but that might cause some problems with 32-bit signed values).

So your code would be:

cursor.execute("set textsize 2147483647 select zip from database where DatabaseName='foobar'")

See also: using pyodbc on ubuntu to insert a image field on SQL Server

Community
  • 1
  • 1
vartec
  • 131,205
  • 36
  • 218
  • 244
  • Unfortunately that does not solve the problem. The file size is also the same in both cases, so it does not seem that the problem is due to truncation. – Till B Apr 02 '12 at 10:52