0

We're converting a legacy, SQL Server 2000 database to SQL Server 2008. Both store binary files (JPG, BMP, DOC, and PDF) in an IMAGE column (I know the datatype has been deprecated, but changing this is not an option).

The data is inserted the new system with a basic

INSERT INTO [image] Values (SELECT [image] from legacy_db);

basically a straight IMAGE to IMAGE push.

The conversion had been going well. On the new system, the JPG, BMP, and DOC file types opened without problems. The PDFs have been a nightmare.

We've found that for the PDF records, many of the rows that have been moved into SQL Server 2008 have a longer DATALENGTH() than the same rows did in the SQL Server 2000 database.

Does anyone have any clue as to why this is?

Bridge
  • 29,818
  • 9
  • 60
  • 82
paparush
  • 1,340
  • 1
  • 17
  • 25

2 Answers2

1

I'm not sure about the DATALENGTH() issue but as a workaround, if the PDF files are correct in the 2000 database (I assume you've verified this), you could try pulling those values from an app and re-saving them in the new table from the app, instead of just doing an INSERT/SELECT.

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • I've actually considered that. There are roughly 1000 PDF records to migrate and the thought of doing that by hand is daunting. – paparush Jun 05 '12 at 03:11
  • By hand? Huh? Your app can loop through the same set of rows that your query does... you shouldn't have to do anything by hand. You should be able to do this with VB, C#, PowerShell, maybe even VBScript... – Aaron Bertrand Jun 05 '12 at 03:14
  • Aaron, thanks for your reply. Can you help me understand the difference here. If I write this migration in C# using the .NET SQLClient classes, rather than straight TSQL, I'm still going to end up with a Select and an Insert. Help me understand what adding a layer of abstraction is going to get me. I guess I could see that by doing this in C#, I could read the bytes out of the Origin and know exactly how many bytes I need to push into the Destination. Is that what you are saying. Thanks. – paparush Jun 05 '12 at 10:34
  • @paparush yes, just an alternative so that the program can validate the number of bytes and even display the before/after for copies that don't match. At the very least this should identify where the mismatch is happening; I suspect though that this will just make the data transfer work correctly. – Aaron Bertrand Jun 05 '12 at 11:03
  • Also, have you verified that the longer versions of the PDF files in the destination database are actually *broken*? Or have you only raised the flags because the sizes are different? For all I know SQL Server may be treating PDF files differently... – Aaron Bertrand Jun 05 '12 at 11:06
  • Actually, NONE of the migrated pdfs open. If go in through the application that uses the SQL 2008 db and attach a PDF by hand, the application can open that PDF that it wrote to the table. JPG and BMP that we migrated over open just fine in the app that uses the SQL 2008 db. It's very odd. – paparush Jun 05 '12 at 11:22
  • Are the added bytes the same on every row? Same number of bytes, or same number of bytes and the exact same bytes? – Aaron Bertrand Jun 05 '12 at 11:24
  • Not every migrated PDF has a longer DATALENGTH() but those that do are 1024 bytes longer. I'd have to very if the actual bytes are exactly the same in exactly the same location. – paparush Jun 05 '12 at 11:33
  • (I'd suspect they're at the end.) – Aaron Bertrand Jun 05 '12 at 11:59
  • So today I did a byte by byte comparison of the old and new fields. In about half of the PDF records, the bytes were not identical. Those that were identical still did not open on the new DB. Here's something odd - If I wrote the BLOB to disk as a PDF, then opened the PDF in Notepad++ I see "Visual FoxPro AcroExch.Document.7" (interspersed with a bunch of null characters) at the top of the file. Wondering why I'm seeing a VFP tag in a file written by an app with a SQL Server 2000 back end? – paparush Jun 05 '12 at 22:55
0

Resolved this issue by writing functions to locate the PDF's BOF marker and EOF marker(there can be multiple EOF markers so you have to find the last one). Stuff the bytes between the two offsets, inclusive, into a new byte array and UPDATE the BLOB field with the new byte array. After this, the PDFs open in the new system.

paparush
  • 1,340
  • 1
  • 17
  • 25