I'm trying to generate a PDF file programmatically.
The full scenario: I receive multi-page PDFs in which each page is a single image containing the content I want. I don't want to use external libraries because I'm optimizing for performance (it will matter to me in the long run). I previously had a working solution (it wrote the file as header / image content / footer), and it always worked. However, something has changed and it stopped working.
Anyway, in order to fix it and rebuild from scratch, here are the steps I took:
- Extracted the FlateDecode stream belonging to one of the images (there are many).
- Created a clean JPEG from it (no Photoshop headers etc., just a plain JPEG file).
- Submitted that JPEG to an online PDF conversion service, which produced a PDF from it.
- Worked out how that PDF was built, including the image part, then wrote everything out manually, including the object references in the xref table.
- All I get is "The file is damaged". I've compared both files (the converted one and the one I made), and they seem almost identical (the size differs because of the image portion).
I don't know what else to try, since the two files look almost exactly the same. I've also decoded one of the FlateDecode streams inside the PDF file, but I couldn't find anything in it related to object positioning inside the file.
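For context on object positioning: positions are not stored inside any stream. Each in-use entry of the cross-reference table holds the byte offset at which the matching "N 0 obj" begins, and startxref holds the byte offset of the xref keyword. The following is a minimal sketch of that bookkeeping, not the code from this question. XrefSketch, Build and the three placeholder objects are hypothetical names chosen for illustration, and CRLF line endings are assumed so that every xref entry is exactly 20 bytes.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the question's code): it records where
// each object starts so the xref table is built from real byte offsets.
static class XrefSketch
{
    // Assumption: line endings are written explicitly as CRLF, because
    // Environment.NewLine differs between operating systems.
    const string Eol = "\r\n";

    public static byte[] Build(out long[] offsets, out long xrefPos)
    {
        // Placeholder bodies: a catalog, a page tree and one empty page.
        string[] bodies =
        {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [ 0 0 595 769 ] >>"
        };
        offsets = new long[bodies.Length + 1];   // index 0 is the free entry

        using (var ms = new MemoryStream())
        {
            Write(ms, "%PDF-1.4" + Eol);
            for (int i = 0; i < bodies.Length; i++)
            {
                int objNum = i + 1;
                offsets[objNum] = ms.Position;   // offset taken BEFORE writing the object
                Write(ms, $"{objNum} 0 obj{Eol}{bodies[i]}{Eol}endobj{Eol}");
            }

            xrefPos = ms.Position;               // startxref must point at the "xref" keyword
            Write(ms, $"xref{Eol}0 {offsets.Length}{Eol}");
            Write(ms, $"0000000000 65535 f{Eol}");
            for (int n = 1; n < offsets.Length; n++)
                Write(ms, $"{offsets[n]:D10} 00000 n{Eol}");   // 20 bytes per entry
            Write(ms, $"trailer{Eol}<< /Size {offsets.Length} /Root 1 0 R >>{Eol}");
            Write(ms, $"startxref{Eol}{xrefPos}{Eol}%%EOF{Eol}");
            return ms.ToArray();
        }
    }

    static void Write(Stream s, string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        s.Write(bytes, 0, bytes.Length);
    }
}
```

A quick sanity check for any hand-built file is to seek to each offset listed in the xref table and confirm that it lands exactly on the first digit of the corresponding object header.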
Here's the code I'm using:
using (var b = new BinaryWriter(File.Open(@"C:\test\Rio\Reboot\fullmanual01.pdf", FileMode.Create)))
{
    var imgBytes = File.ReadAllBytes(@"C:\test\Rio\Reboot\decompressedimg.raw");
    var firstFlate = File.ReadAllBytes(@"C:\test\Rio\Reboot\flateStr01.raw");
    var FlateDecompressed = Encoding.ASCII.GetString(FlateDecompress(firstFlate));
    string crlf = Environment.NewLine;
    var pdfHeader = Encoding.ASCII.GetBytes($"%PDF-1.4{crlf}");
    b.Write(pdfHeader);
    pdfHeader = StringToByteArray("25E2E3CFD30D0A");
    b.Write(pdfHeader);
    var pdfObj = new PDFStrObject(1, $"/Type /Page{crlf}/MediaBox [ 0 0 595 769 ]{crlf}/Resources << /XObject << /X0 3 0 R >> >>{crlf}/Contents 4 0{crlf}/Parent 2 0 R{crlf}/Rotate 360{crlf}>>{crlf}endobj{crlf}").byteFromStrObj;
    b.Write(pdfObj);
    var secondObjPos = b.BaseStream.Position.ToString("0000000000");
    pdfObj = new PDFStrObject(3, $"/Type /XObject{crlf}/Subtype /Image{crlf}/Width 1016{crlf}/Height 1328{crlf}/BitsPerComponent 8{crlf}/ColorSpace /DeviceGray{crlf}/Filter /FlateDecode{crlf}/Length {imgBytes.Length}{crlf}>>{crlf}stream{crlf}").byteFromStrObj;
    b.Write(pdfObj);
    b.Write(imgBytes);
    b.Write(Encoding.ASCII.GetBytes($"{crlf}endstream{crlf}endobj{crlf}"));
    var thirdObjPos = b.BaseStream.Position.ToString("0000000000");
    pdfObj = new PDFStrObject(4, $"/Filter /FlateDecode{crlf}/Length 45{crlf}>>{crlf}stream{crlf}").byteFromStrObj;
    b.Write(pdfObj);
    b.Write(firstFlate);
    b.Write(Encoding.ASCII.GetBytes($"{crlf}endstream{crlf}endobj{crlf}"));
    var secondPos = b.BaseStream.Position;
    pdfObj = new PDFStrObject(2, $"/Type /Pages{crlf}/Kids [ 1 0 R ]{crlf}/Count 1{crlf}>>{crlf}endobj{crlf}").byteFromStrObj;
    b.Write(pdfObj);
    var firstObjPos = b.BaseStream.Position.ToString("0000000000"); //2 0 obj
    pdfObj = new PDFStrObject(5, $"/Type /Catalog{crlf}/Pages 2 0{crlf}>>{crlf}endobj{crlf}").byteFromStrObj;
    b.Write(pdfObj);
    var fourthObhPos = b.BaseStream.Position.ToString("0000000000");
    b.Write(Encoding.ASCII.GetBytes($"xref{crlf}0 6{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"0000000000 65535 f{crlf}0000000017 00000 n{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"{firstObjPos} 00000 n{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"{secondObjPos} 00000 n{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"{thirdObjPos} 00000 n{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"{fourthObhPos} 00000 n{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"trailer{crlf}<<{crlf}/Size 6{crlf}/Root 5 0{crlf}/ID [<05bebfaf5c6382cfbc44cd1b3389e097><05bebfaf5c6382cfbc44cd1b3389e097>]{crlf}>>{crlf}startxref{crlf}{b.BaseStream.Position+7}{crlf}%%EOF{crlf}"));
}
And here is the class I use for building objects:
class PDFStrObject
{
    public string strObj { get; private set; }
    public byte[] byteFromStrObj { get; private set; }
    public PDFStrObject(int objNum, string content)
    {
        string crlf = Environment.NewLine;
        strObj = $"{objNum} 0 obj{crlf}<<{crlf}{content}";
        byteFromStrObj = Encoding.ASCII.GetBytes(strObj);
    }
}
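The code above calls two helpers that are not shown, FlateDecompress and StringToByteArray. A minimal sketch of what such helpers could look like follows. It is a hypothetical reconstruction, not the original implementation: the PdfHelpers class name is made up, and the decompression assumes the FlateDecode data is a zlib stream (2-byte header, deflate body, 4-byte Adler-32 trailer) without a preset dictionary.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical stand-ins for the helpers referenced in the code above.
static class PdfHelpers
{
    // Assumption: "data" is zlib-wrapped, so the 2-byte zlib header is
    // skipped and the raw deflate body is handed to DeflateStream.
    // The trailing Adler-32 checksum is not verified here.
    public static byte[] FlateDecompress(byte[] data)
    {
        using (var input = new MemoryStream(data, 2, data.Length - 2))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }

    // Converts a hex string such as "25E2E3CFD30D0A" into its bytes.
    // Assumption: the string has an even number of valid hex digits.
    public static byte[] StringToByteArray(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }
}
```

With helpers like these, the length of a compressed stream can be taken from the byte array itself (for example firstFlate.Length) rather than typed in by hand.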
The files I've been using are here: https://drive.google.com/drive/folders/11HN9cB9Cs7uqBQdpZkNyNKt29sl_xJrL?usp=sharing
Here is what each file is:
decompressedimg-convertido.pdf -> The file I converted online.
decompressedimg.raw -> The image portion I extracted from the multi-page PDF. Dimensions are W: 1016, H: 1328.
fullmanual01.pdf -> The file I generated using my code.
PDfRjMultiplePages -> The multi-page PDF file I want to programmatically extract pages from.
Any input is appreciated. I've also referred to the question "Issue writing a PDF file from scratch", but unfortunately couldn't find a hint there for what I'm trying to do.
Thanks