I’m looking for a library and/or guide that would let me encode an image with DCT (discrete cosine transform) so I can place it in a basic PDF 1.0 file. (FYI, I’m using https://git.catseye.tc/pdf.lua/ to create the PDF.)

I’ve searched the internet but couldn’t find anything. Is anyone on SO aware of something that uses Lua to encode a JPEG with DCT?

Update:

Based on feedback, here’s some additional information on my ask

If you open up a PDF file, the stored JPEG data will appear in the XObject image. Here is an example.

14 0 obj
<<
/Intent/RelativeColorimetric
/Type/XObject
/ColorSpace/DeviceGray
/Subtype/Image
/Name/X
/Width 2988
/BitsPerComponent 8
/Length 134030
/Height 2286
/Filter/DCTDecode
>>
stream (binary data) endstream

The /Type and /Subtype entries show that this is an image XObject. The key entry is the /Filter value, DCTDecode, which indicates a JPEG (JPXDecode indicates JPEG 2000, which also works). The data I need goes between stream and endstream.

I’m looking for help with how to get an image converted into the DCT format needed.

nodecentral
  • Your question isn't very clear. Please create a minimal example that shows what you start with - a red rectangle on a white background is enough. Then show what you envisage the output will look like - maybe in hex, or however it can be simply understood. Thank you. – Mark Setchell Mar 11 '23 at 23:44
  • Hi @Mark, thanks for the feedback, update above made, hopefully that will help.. – nodecentral Mar 12 '23 at 00:34

1 Answer

The main point about DCT/JPEG in a PDF is that the embedded .jpeg should be "baseline", much as it was in 1992 (see https://ia801003.us.archive.org/5/items/pdf320002008/PDF32000_2008.pdf#page=42; progressive JPEG is only accepted from PDF 1.3 onwards). Baseline is what MS Paint (or any command-driven graphics app) saves as a "simple" .jpeg, not any exotic type. So here on the right is the everyday .jpeg that MS Paint produced from a PNG (or any other complex format), and on the left is the exact same /DCTDecode object once imported by a PDF writer.

[Screenshot: the /DCTDecode object in the PDF (left) beside the .jpeg saved from MS Paint (right)]

So if we export the image from the PDF we get the JPEG back (not the source PNG). To check that the two are identical, copy and paste the image out or use an extractor.
The image.jpg I wrapped as a PDF from the command line is 5,757 bytes and the image extracted from the PDF is also 5,757 bytes, so we can expect a match.

Check that they are identical binary files (what goes in comes out, which is very rare for a PDF):

C:\Apps\Programming\pdf demo>fc /B input.jpg extracted.jpg
Comparing files input.jpg and EXTRACTED.JPG
FC: no differences encountered
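
For anyone not on Windows, the same byte-for-byte check can be sketched in a few lines of Python. This is only an illustration of what `fc /B` does above; the helper name `same_bytes` and the two file names in the usage note are assumptions, not part of the original workflow.

```python
import filecmp


def same_bytes(path_a, path_b):
    """Return True when both files hold exactly the same bytes."""
    # shallow=False forces a real content comparison instead of
    # trusting matching size and modification time from os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Calling `same_bytes("input.jpg", "extracted.jpg")` should return `True` when the JPEG pulled back out of the PDF is the file that went in.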

So to make a one-page PDF from an image you simply need a header:

%PDF-1.7
%ANSI

1 0 obj <</Type/Catalog/Pages 2 0 R>> endobj
2 0 obj <</Type/Pages/Count 1/Kids [ 3 0 R ]>> endobj
3 0 obj <</Type/Page/MediaBox [ 0 0 841.5 594.75 ]/Rotate 0/Resources 4 0 R/Contents 5 0 R/Parent 2 0 R>> endobj
4 0 obj <</XObject <</Img1 6 0 R>>>> endobj
5 0 obj <</Length 61>>
stream
1 0 0 -1 -0 594.75 cm 841.5 0 0 -594.75 0 594.75 cm /Img1 Do
endstream
endobj
6 0 obj <</Type/XObject/Subtype/Image/ColorSpace/DeviceRGB/BitsPerComponent 8/Filter/DCTDecode
/Width 1123/Height 794/Length 202537 >>stream

where a Windows command line, or any other scripting language, can write that last line with the correct values. You also need a trailer, which is where it can get messy, so as much of the tail as possible was moved to the head to keep the trailer writing minimal. I have done similar command-line embedding for video and audio, so DCT (JPEG) images should not be a problem (except that I prefer lossless, pixel-perfect PNG, and that's way harder).

Here is a matching trailer for the header above:
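
As a rough sketch of how a script could fill in that last header line, the Python below reads the width, height and component count from the JPEG's own SOF (start of frame) segment and formats the object 6 dictionary. Python is used purely for illustration (the same byte-level steps should carry over to Lua); `jpeg_info` and `image_object_header` are hypothetical names, and only 8-bit grey or RGB JPEGs are handled.

```python
import struct

# SOFn markers carry the frame size; 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) do not.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
COLOR_SPACES = {1: "/DeviceGray", 3: "/DeviceRGB"}  # assumption: no CMYK handling


def jpeg_info(data):
    """Return (width, height, components, precision) from the first SOF segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:                              # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:    # markers without a length
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            precision, height, width, ncomp = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, ncomp, precision
        pos += 2 + length                               # skip to the next segment
    raise ValueError("no SOF segment found")


def image_object_header(obj_num, data):
    """Build the text that precedes the raw JPEG bytes in the image XObject."""
    width, height, ncomp, precision = jpeg_info(data)
    if precision != 8 or ncomp not in COLOR_SPACES:
        raise ValueError("sketch only handles 8-bit grey or RGB JPEGs")
    text = ("%d 0 obj <</Type/XObject/Subtype/Image/ColorSpace%s"
            "/BitsPerComponent 8/Filter/DCTDecode\n"
            "/Width %d/Height %d/Length %d >>stream\n"
            % (obj_num, COLOR_SPACES[ncomp], width, height, len(data)))
    return text.encode("ascii")
```

The point of the sketch is that no re-encoding is involved: the unmodified bytes of the .jpg file are the DCT stream, and only the three numbers in the dictionary have to be read from it.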

endstream
endobj
xref
0 7
0000000000 65535 f 
0000000016 00000 n 
0000000061 00000 n 
0000000115 00000 n 
0000000228 00000 n 
0000000272 00000 n 
0000000380 00000 n 

trailer
<</Size 7/Info <</Producer (Cmd2PDF)>>/Root 1 0 R>>
startxref
203076
%%EOF

You simply need to ensure the startxref value is correct: it is the byte offset of the xref keyword from the start of the file.

So the working procedure is: first use any graphics app to get the width, height and length, apply those dimensions (and thus the offsets) to the end of the header and to the trailer, then simply:

copy /b 8bitHead.txt + 8bit.jpg + 8bitTail.txt 8bitColour.pdf

Since JPEG is a binary compressed encoding, you can't use plain-text copy and paste, as that destroys the highest (8th) bit of each byte and corrupts the JPEG; that makes it useless for building in a textual fashion. It needs a binary sandwich between the two text parts, hence copy /b.
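
Outside Windows, the equivalent of `copy /b` is simply to open every file in binary mode. A small Python sketch follows; `concat_binary` is a hypothetical helper name and the file names in the usage note are the ones from the command above.

```python
def concat_binary(out_path, *in_paths):
    """Append the input files to out_path byte for byte, like `copy /b a + b out`."""
    with open(out_path, "wb") as out:
        for path in in_paths:
            # "rb" disables newline and encoding translation, so bytes such as
            # 0x1A or 0xFF inside the JPEG pass through unchanged.
            with open(path, "rb") as part:
                out.write(part.read())
```

For example, `concat_binary("8bitColour.pdf", "8bitHead.txt", "8bit.jpg", "8bitTail.txt")` would produce the same sandwich as the `copy /b` line.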

[Later Edit]

I gave a fairly complex value above for object 5, and it can be simplified. Say we have an image to be scaled to 500 pt by 477 pt and we want it centred: we can offset it by half of the extra width and half of the extra height, so the matrix simplifies to W 0 0 H dx/2 dy/2, where dx is the width of the whitespace and dy likewise for the height.

5 0 obj <</Length 61>> stream
500.000 0 0 477.000 170.750 53.873 cm /Img1 Do               
endstream
endobj
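
The W 0 0 H dx/2 dy/2 arithmetic can be sketched as a tiny Python helper. The name `centred_image_stream` is an assumption, and the default page size is taken from the MediaBox in the header above. Exact centring of a 477 pt tall image on the 594.75 pt page gives 58.875 for the vertical offset, so the 53.873 typed above is presumably a slip or a deliberate nudge.

```python
def centred_image_stream(img_w, img_h, page_w=841.5, page_h=594.75, name="Img1"):
    """Return the content-stream text that scales and centres one image."""
    dx = page_w - img_w      # total horizontal whitespace
    dy = page_h - img_h      # total vertical whitespace
    # cm operands: a b c d e f, here scale by (W, H) and translate by (dx/2, dy/2)
    return "%.3f 0 0 %.3f %.3f %.3f cm /%s Do" % (img_w, img_h, dx / 2, dy / 2, name)


print(centred_image_stream(500, 477))
# prints: 500.000 0 0 477.000 170.750 58.875 cm /Img1 Do
```

Remember that /Length in object 5 has to match the byte count of whatever string is written, or the stream has to be padded to the old length.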

[Even LATER edit] For a different question I revisited the methods needed to use a simpler cmd file to automate adding a single pixel-perfect JPEG. It is not much different from the above and needs some spit and polish for production. However, it shows how to automate this for various source images, and it could be improved to handle a set of images in a loop, but it's a starting point.

@echo off
set "filename=%~f1"

REM cleanup any failed run !
if exist %temp%\output1.txt del %temp%\output1.txt
if exist %temp%\output2.txt del %temp%\output2.txt
if exist %temp%\output.pdf del %temp%\output.pdf

REM we could write a text header here but it's faster to copy one prepared earlier
copy header.txt %temp%\output1.txt

REM Write current image data
@echo fsObj = new ActiveXObject("Scripting.FileSystemObject");var ARGS = WScript.Arguments;var img=new ActiveXObject("WIA.ImageFile");var filename=ARGS.Item(0);img.LoadFile(filename);WScript.StdOut.Write("/Width "+img.Width+"/Height "+img.Height);>"%temp%\dimimg.js"
@cscript //nologo "%temp%\dimimg.js" "%filename%">>%temp%\output1.txt
for %%I in ("%filename%") do @echo /Length %%~zI^>^>>>%temp%\output1.txt
echo stream>>%temp%\output1.txt

REM append image (quoted so that paths containing spaces still work)
copy /b "%temp%\output1.txt"+"%filename%" "%temp%\output2.txt"
echo/>>%temp%\output2.txt
echo endstream>>%temp%\output2.txt
echo endobj>>%temp%\output2.txt

REM prep the trailer
for %%I in ("%temp%\output2.txt") do set "startxref=%%~zI"
copy /b %temp%\output2.txt+trailer.txt %temp%\output.pdf
echo %startxref%>>%temp%\output.pdf
echo %%%%EOF>>%temp%\output.pdf

REM remove the temporary files, then open the result
if exist %temp%\output1.txt del %temp%\output1.txt
if exist %temp%\output2.txt del %temp%\output2.txt
%temp%\output.pdf

A working demo set can be found here: https://github.com/GitHubRulesOK/MyNotes/blob/master/jpgTOpdf.zip

K J
  • `fc input.jpg extracted.jpg` compares the files in text mode, up to the first char `0x1A` – ESkri Mar 12 '23 at 10:29
  • Thanks so much @KJ, however I’m afraid the references to PNGs, and to using Windows Paint and the Windows command line, have confused me, sorry. You’ve given a lot of great stuff; is there anything non-Windows based (maybe Linux) I can use to get a DCT-encoded version of the JPEG image file I have? – nodecentral Mar 12 '23 at 14:06
  • Hi @KJ, unfortunately I get an "unable to create a secure connection" error message when I try that link. Are you able to post the steps/files, maybe on GitHub? I have to admit this is turning out to be a lot harder than I thought; I assumed it would be something similar to converting something into base64. – nodecentral Mar 12 '23 at 16:28
  • Thanks @KJ, I think I’m with you now. Let me play it back to you: I can take a JPEG (e.g. https://commons.wikimedia.org/wiki/File:JPEG_example_flower.jpg) and open it as a text file, and that will show me the image encoded in DCT. I would then copy and paste all that into the section between stream and endstream, making sure my PDF has the trailer and startxref references as its last lines. – nodecentral Mar 12 '23 at 17:54