-2

My SQL Server table has an Image datatype column containing PDF's and I am trying to restore (export) them but I get stuck.

I tried different methods using BCP in SSMS but every time I export the file it is either 0 kb in size or few kb (5-10) less than the original PDF (and the file does not open). How can I restore the original pdf without being corrupted. are there any alternatives? I have 100's of files to export.

I am able to create format file using following BCP / TSQL and also export the PDF but it is corrupted (cannot open)

EXEC master..xp_cmdshell  'bcp TestDB.dbo.CLTDOCSX format nul -T -n -f D:\ImagesFromSql\formatfile.fmt'

Format File

But SQL Server throws the following errors when I run the following BCP / T-SQL

DECLARE @sql varchar(500)
SET @sql = 'BCP "SELECT DOCData FROM [TestDB].dbo.CLTDOCSX " QUERYOUT D:\ImagesFromSql\myfilename.pdf -T -f D:\ImagesFromSql\formatfile.fmt -S ' + @@SERVERNAME
EXEC master.dbo.xp_CmdShell @sql

enter image description here

I tried to run BCP command from CMD as admin but still same error. enter image description here

jarlh
  • 42,561
  • 8
  • 45
  • 63
  • 1
    I'm really not a fan of the method of using SQL Server to call command line to call a tool that reconnects to the SQL Server instance you are currently connected to, it's so clunky. You're better off using SQL Server's built in tools (like `BULK INSERT`) or doing this outside of SQL Server. – Thom A Apr 15 '23 at 13:07
  • 2
    You have D:\ImagesFromSql\formatfile.fmt and D:\formatfile.fmt in another script, maybe that's the problem? – siggemannen Apr 15 '23 at 13:37
  • Also `image` has been deprecated since *2005* (almost 20 years ago); you should be using it any more. – Thom A Apr 15 '23 at 13:41
  • Does the SQL Server service account have permission to access those files? – Charlieface Apr 15 '23 at 21:11
  • **@siggemannen** - I have updated the file path as per you comment and still getting same error – Sai Abhiram Inapala Apr 16 '23 at 03:20
  • **@charlieface** - I have tested connection to a csv file in the same folder from SQL server using OPENROWSET, BULK INSERT methods as suggested in this post – Sai Abhiram Inapala Apr 16 '23 at 03:20
  • **@ThomA** - I am trying to extract what is already in the SQL server table, can I still use BULK INSERT to extract? What other tools can I use to extract these PDF's? (data factory?) – Sai Abhiram Inapala Apr 16 '23 at 03:21
  • You specify `-n` for native format when creating the format file, have you tried doing the same when using `queryout`? – AlwaysLearning Apr 16 '23 at 05:06
  • **@AlwaysLearning** - yes used -n during queryout and that doesn't work – Sai Abhiram Inapala Apr 16 '23 at 08:33

2 Answers2

0

Here's a complete test script you can use:

if object_id('CLTDOCSXTest') IS NOT NULL
    drop table CLTDOCSXTest 
create table CLTDOCSXTest (DOCData Image)

-- Put file
insert into CLTDOCSXTest
SELECT * FROM OPENROWSET(
   BULK 'C:\SQLServer\MSSQL11.SIGGE\MSSQL\DATA\sample.pdf',
   SINGLE_BLOB) AS DATA;

-- Create format file
EXEC master..xp_cmdshell  'bcp TestDB.dbo.CLTDOCSXTest format nul -T -n -f C:\SQLServer\MSSQL11.SIGGE\MSSQL\DATA\formatfile.fmt'

-- Export file to pdf
DECLARE @sql varchar(500)
SET @sql = 'BCP "SELECT DOCData FROM [TestDB].dbo.CLTDOCSXTest " QUERYOUT C:\SQLServer\MSSQL11.SIGGE\MSSQL\DATA\output.pdf -n -T -f C:\SQLServer\MSSQL11.SIGGE\MSSQL\DATA\formatfile.fmt -S ' + @@SERVERNAME
EXEC master.sys.xp_CmdShell @sql

-- Compare files, should return 1
SELECT COUNT(*)
FROM CLTDOCSXTest c
INNER JOIN OPENROWSET(
   BULK 'C:\SQLServer\MSSQL11.SIGGE\MSSQL\DATA\output.pdf',
   FORMATFILE ='C:\SQLServer\MSSQL11.SIGGE\MSSQL\DATA\formatfile.fmt') data
    ON   CAST(c.DOCData AS VARBINARY(MAX)) = CAST(data.docdata AS VARBINARY(MAX));

It creates a table like yours, imports a pdf, creates a bcp format file, exports the pdf and compares it to the imported one. I have also tested with really large binary files and they work too.

Note that you need to remove the format files since i don't think they're being recreated

siggemannen
  • 3,884
  • 2
  • 6
  • 24
  • **[@siggemannen](https://stackoverflow.com/users/13061224/siggemannen)** - your code works fine. It looks like the pdf's that I am trying to extract might have some encoding or compression applied when inserting into SQL server table. _**There is another table joining to this table which has 2 columns by the name DOCDATAX (image data type)and COMPRESS (value = XCOMPRESS)**_ – Sai Abhiram Inapala Apr 16 '23 at 11:24
  • Yeah then you need to decompress it the same way it's compressed. Bcp can't do that me thinks – siggemannen Apr 16 '23 at 13:45
0

Here is how I resolved the issue.

I found out that the data in the image column are not pdfs, they are all RAR files that can be opened using programs like 7Z or WinRAR and the actual pdfs are inside the RAR files.

I simply replaced the .pdf with .RAR in BCP query and it worked. QUERYOUT D:\ImagesFromSql\myfilename.rar

All the Hex values in the image column were starting with 0x504B0304 - typically represents the file signature for a ZIP archive. So, first I tried .ZIP which did not work then .RAR which worked.