1

I have PDF files stored as image datatype (large binary data as mentioned in the doc) in a sybase database table. I am trying to read one of those files from the db and write it to a file in a local folder using python pyodbc package like this example :

 driver = "FreeTDS"
 prt = 'port'
 db = 'db'
 passwd = 'passwd'
 usr = 'usr'
 serv = 'serv'
 conn = pyodbc.connect(driver=driver, server=serv, port=prt, uid=usr, pwd=passwd)
 sql_query = (
    "SELECT ARCH_DOC_DOC as file_content FROM table_name WHERE ARCH_DOC_ID = id"
 )
 cursor = conn.cursor()
 cursor.execute(sql_query)
 pdf_data = cursor.fetchone()[0]
 with open('my_test_file.pdf', 'wb') as f:
     f.write(pdf_data)

I am using TDS driver and running this code on Debian GNU/Linux 11 machine

Compile-time settings (established with the "configure" script)
                            Version: freetds v1.2.3
             freetds.conf directory: /etc/freetds
     MS db-lib source compatibility: no
        Sybase binary compatibility: yes
                      Thread safety: yes
                      iconv library: yes
                        TDS version: auto
                              iODBC: no
                           unixodbc: yes
              SSPI "trusted" logins: no
                           Kerberos: yes
                            OpenSSL: no
                             GnuTLS: yes
                               MARS: yes

The problem is that I am getting corrupt file in the end and after testing a couple of files I noticed that I am always getting a file size 33ko. For example, the original file size that I am using to test is 90ko in the db and the file I am getting is only 33ko. So I am wondering if the issue is in the database/driver config or if there is a limit in the size of data that I can read with pyodbc ? And how can I fix that ?

Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
  • If you do `hd -n 8 my_test_file.pdf` does it display something like `|%PDF-1.4|`? In other words, is the (partial) file simply truncated or is it more seriously damaged? Also, what version of pyodbc are you using? – Gord Thompson Jun 05 '23 at 17:48
  • Yes when i do `head -n 8 my_test_file.pdf` it display something like this : `%PDF-1.4 %���� 6 0 obj << /Creator (HP Exstream Version 8.6.106 64-bit) /CreationDate (D:20190218022622ZZ)`. I think the file is truncated and i am getting only first chunk or something like that. The version of pyodbc : **4.0.35** – Mohamed Amine Mejri Jun 06 '23 at 08:34
  • I was able to reproduce your issue with unixODBC 2.3.6, FreeTDS 1.2.3, and pyodbc 4.0.39. The byte array retrieved by pyodbc was consistently truncated at exactly 32768 bytes. Upgrading to the latest version of FreeTDS (1.3.18) did not resolve the issue. Neither did upgrading to the latest version of unixODBC (2.3.11). Building FreeTDS without `--enable-sybase-compat` also didn't fix it, nor did using SAP's ODBC driver for ASE on Linux (libsybdrvodb-sqllen8.so). You may want to open a pyodbc issue [here](https://github.com/mkleehammer/pyodbc/issues/). – Gord Thompson Jun 07 '23 at 01:00
  • Thank you for the testing. I opened a pyodbc [issue](https://github.com/mkleehammer/pyodbc/issues/1226) and I hope this will be fixed soon. Is there a solution with another package ? – Mohamed Amine Mejri Jun 07 '23 at 09:51

1 Answers1

1

This is a reproducible issue, discussed here

https://github.com/mkleehammer/pyodbc/issues/1226

As a workaround, we can use JayDeBeApi and jTDS, like so:

import jaydebeapi

cnxn = jaydebeapi.connect(
    "net.sourceforge.jtds.jdbc.Driver",
    "jdbc:jtds:sybase://192.168.0.199:5000/mydb;useLOBs=false",
    ["sa", "myPassword"],
    "/home/gord/Downloads/jtds-1.3.1.jar"
    )
crsr = cnxn.cursor()
crsr.execute("SELECT ARCH_DOC_DOC FROM so76408133 WHERE ARCH_DOC_ID = 1")
pdf_data = crsr.fetchone()[0]
with open("test_pdf", "wb") as f:
    f.write(pdf_data)

Note that this requires a Java Runtime Environment (JRE). On Ubuntu/Debian, it can be installed via

sudo apt install default-jre
Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
  • 1
    Thank you for the solution ! I tested using JayDeBeApi and jTDS and it works !! I am getting the whole file now. – Mohamed Amine Mejri Jun 07 '23 at 16:09
  • Hello @GordThompson! Do you have any idea if this workaround could be implemented with sqlalchemy ? I only found this [thread](https://stackoverflow.com/questions/47407988/is-there-sqlalchemy-dialect-support-jdbc) in stackoverflow but this doesn't seem to work. – Mohamed Amine Mejri Jun 08 '23 at 09:34