
I am converting Type 3 glyph fonts from PDF to PostScript. The input file has inline images whose data streams use the FlateDecode filter with Predictor 15. Can anybody help me with how to take the image streams from the PDF to PostScript? This is how the input stream appears in the PDF:

32 0 obj 
    <<
    /Length 342
    >>
    stream
    37 0 4 -52 33 -1 d1
    0.01 0 0 0.01 0 0 concat
    gsave 2900 0 0 -5100 400 -100 concat
    BI
    /IM true
    /W 29
    /H 51
    /BPC 1
    /D[1
    0]
    /F/Fl
    /DP<</Predictor 15
    /Columns 29>>
    ID xœ=Ì¡
    Â`ÅñÿeÂLθ n`0>Ù`ñ
    f[¦DŒF_ÁhC1ì%Ä)¶o.¢Ÿ"†ßá†s®àì]^ÏŠÅS³tFËÂÚ3sç'Æi èÐÇ:j‹¹¨åìOTÿ ª•ÉÙÕÅŸ¨‡¹Ó$°ÆÎšWèÁ!¯Cê
    ÷0&f    µtðV ©Ë÷iôíØªÄ~Ø•Œöí&´« +ro#Ê‚ûÏÅùlßG'
    EI gRestore

    endstream 
    endobj 

And here is what I am trying to write in the PostScript output:

/g21 {
37 0 4 -52 33 -1 setcachedevice
q
[0.01 0 0 0.01 0 0] concat
q
[2900 0 0 -5100 400 -100] concat
[ xœ…ѱNÃ0à3©p'l` ¢abä*‰'@‚W`KP¡00öQ`d@ ¨CWž€u`‰štj4Ü]@ /ù¤œíÿ| ÂìÊüå7úЉV'‚ª¦zò¡9à*´º
m1Õ`ñ—íü‹­‡½Gù@ãÝAVxc¥Ž®"6oFܬJHÃB3(æod¾…xFP†o$!v±Ã»·0—gØY÷J$û„`´#zÊ
Oí¼œÑ¸é`Ê}ü…ñ.Z¯›cF4\¡*O¤ÑPÒYòî¦/éG‘qÑç¼2>öq<Üœ<
B˜5‚²¢ºÎ/èqUTUàoÓ9͔Π܉ä²z ‡S×ÛÙC(PA²š7è­T¾ŽCGÈRaLéåksnˆÃ0z<zø:ž=
]
0
<<
  /ImageType 1
  /Width 29
  /Height 51
  /ImageMatrix [29 0 0 -51 0 51]
  /BitsPerComponent 1
  /Decode [1 0]
  /DataSource { 2 copy get exch 1 add exch }
    <</Predictor 15
    /Columns 29 
    >>
    /FlateDecode filter
>>
imagemask
pop pop
gRestore
gRestore
} def
Question by Kbstar (edited by Cœur).

1 Answer


PostScript has mostly the same filters as PDF. You don't need to decompress the data; just use the FlateDecode filter in PostScript and leave the compressed data untouched.

Note you'll need Language Level 3 for Predictor 15 (or any other PNG predictor) but that shouldn't be a problem, level 3 has been the standard for 18 years.

Otherwise you'll need to implement a version of the FlateDecode filter which supports the PNG Predictor. I believe zlib is quite capable of this.
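If you do decode it yourself, note that zlib only performs the inflate step; undoing the PNG predictor is a separate pass over the inflated rows. Below is a minimal Python sketch of both steps (the function names are hypothetical), assuming the PDF /DecodeParms semantics: Predictor values 10 to 15 all mean "PNG", and with 15 ("optimum") every row carries its own tag byte choosing one of the five PNG filter types. Take Columns, Colors and BitsPerComponent from the /DP dictionary; where it omits them, the PDF defaults are Columns 1, Colors 1 and BitsPerComponent 8.

```python
import zlib


def _paeth(a: int, b: int, c: int) -> int:
    """PNG Paeth predictor: a = left, b = above, c = upper-left."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def flate_png_decode(data: bytes, columns: int = 1, colors: int = 1,
                     bits_per_component: int = 8) -> bytes:
    """Inflate `data`, then undo PNG prediction (PDF /Predictor 10-15).

    Each inflated row is one tag byte (0-4, the PNG filter type used for
    that row) followed by ceil(columns * colors * bpc / 8) sample bytes.
    """
    raw = zlib.decompress(data)
    row_len = (columns * colors * bits_per_component + 7) // 8
    bpp = (colors * bits_per_component + 7) // 8  # bytes per pixel, at least 1
    stride = row_len + 1
    if len(raw) % stride:
        raise ValueError("inflated length is not a whole number of rows")
    out = bytearray()
    prev = bytearray(row_len)  # the row above the first row counts as zeros
    for start in range(0, len(raw), stride):
        tag = raw[start]
        if tag > 4:
            raise ValueError(f"unknown PNG filter type {tag}")
        row = bytearray(raw[start + 1:start + stride])
        for i in range(row_len):
            left = row[i - bpp] if i >= bpp else 0  # already reconstructed
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            pred = (0, left, up, (left + up) // 2, _paeth(left, up, up_left))[tag]
            row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)
```

The result is the plain sample rows without tag bytes, which could then be written out uncompressed or re-encoded with another filter.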

[EDIT]

Your 'PostScript output' is incomplete: you are using PDF operators (q and Q) for which you have not provided definitions. Apart from anything else, this makes it impossible to run the code through an interpreter. Kindly supply a complete, simple example file, as requested. Not pasted code: I'm not inclined to go and create a file myself, and besides, binary doesn't cut and paste at all well.

Off the top of my head, from desk checking, I can't immediately see a problem, but since I can't run the code I could easily be missing something.

[EDIT 2]

And that file, unsurprisingly, works fine.

You haven't supplied the PostScript file that you are creating. It's rather hard for me to tell what's wrong with the PostScript you created by looking at the PDF file you started with.

You could, of course, use Ghostscript (and I see you've used it to create the PDF file) to create a PostScript file, and then look at what that contains. If you set -dCompressFonts=false then the output font won't even be compressed.
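A sketch of that Ghostscript invocation, built as an argument list in Python so a converter can call it. The helper name and paths are hypothetical; `gs` is the usual executable name on Linux and macOS (on Windows it is typically `gswin64c`).

```python
def ps2write_argv(pdf_path: str, ps_path: str, gs: str = "gs") -> list:
    """Hypothetical helper: Ghostscript command line converting a PDF to
    PostScript with the ps2write device, leaving embedded fonts uncompressed."""
    return [
        gs,
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=ps2write",
        "-dCompressFonts=false",  # keep the font data readable in the output
        f"-sOutputFile={ps_path}",
        pdf_path,
    ]
```

It could be run with `subprocess.run(ps2write_argv("in.pdf", "out.ps"), check=True)`.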

For example:

37 0 4 -52 33 -1 d1
0.01 0 0 0.01 0 0 cm
q 2900 0 0 -5100 400 -99.9998 cm
BI
/IM true
/W 29
/H 51
/BPC 1
/D[1
0]
/F[/A85
/CCF]
/DP[null
<</K -1
/Columns 29>>]
ID
-D=,M5m+t^0_>op8\HM"Du]KKrr2rthqG/5qU_ik]$f$TlUslD91qoN93j0%dckk:ld^*DV25!+
!WX>~>
EI Q

Of course you'll need to look at the prolog to see how all the procedures used there are defined, but you can do that yourself; you certainly don't need me to do it. Notice that the imagemask uses the CCITTFaxDecode and ASCII85Decode filters; it's trivial to add additional filters. Since the data is guaranteed to be monochrome (it's a mask), the CCITT filter generally gives better compression than Flate.
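Another way to avoid raw binary in the program is to write the Flate data as an ASCII85 string literal. The Python sketch below (hypothetical helper name) uses the standard `base64.a85encode` with `adobe=True`, which frames the output as `<~ ... ~>`. The PostScript scanner (LanguageLevel 2 and later) turns a `<~ ~>` literal back into a binary string by itself, so that string goes straight into FlateDecode; applying /ASCII85Decode on top of it as well would decode it twice. Because the data is part of the program text, this also works inside a procedure. The /DecodeParms entries are assumed to be copied unchanged from the PDF's /DP, since they describe how the data was encoded.

```python
import base64


def a85_flate_datasource(flate_data: bytes, decode_parms: dict) -> str:
    """Hypothetical helper: PostScript text for an image's DataSource value.

    The <~ ... ~> literal is decoded by the scanner, so the resulting string
    is fed directly to FlateDecode (no separate ASCII85Decode filter).
    """
    a85 = base64.a85encode(flate_data, adobe=True, wrapcol=72).decode("ascii")
    parms = " ".join(f"/{k} {v}" for k, v in decode_parms.items())
    return f"{a85} << {parms} >> /FlateDecode filter"
```

Usage: `"/DataSource " + a85_flate_datasource(data, {"Predictor": 15, "Columns": 29})` inside the image dictionary.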

Note that if you really are using Ghostscript 9.05 then you should upgrade; that version is 6 years old.

It might possibly help if you were to explain why you want to take an ugly, bitmapped, type 3 font from PDF and make an ugly, bitmapped type 3 PostScript font from it.

[EDIT 3]

Well, looking at your PostScript file, the definition of the glyphs does not match what you've put in your question. The actual content looks like this:

/g10135{
88  0  4  -70  82  8  setcachedevice 
q
[
0.01  0  0  0.01  0  0  ] M 
q
[7800  0  0  -7800  400  800  ]M 
<<
/ImageType 1
  /Width  78

  /Height  78

  /ImageMatrix [  78 0 0 -78 0 78]
  /BitsPerComponent  1

  /Decode [1
0]

  /DataSource ....binary data.....


<< /Predictor 15

 /Columns 78
/BitsPerComponent 1>>
/FlateDecode filter def
 >> imagemask
Q
Q
}bind def 

You have not supplied a file, procedure or string source as the value for the DataSource key in the dictionary. Essentially, the PostScript interpreter reads and tokenises the /DataSource key, and then proceeds to process the binary data as PostScript. Unsurprisingly, this causes the error 'syntaxerror in (binary token, type=156)' when processed with Ghostscript.

If you had got past that then you would have discovered that the filter operator takes a data source as well and you haven't supplied one for that either.

So you need to create a data source for your binary data. How you do that is up to you, but currentfile is one way, combined with readstring since you know the string length.

So something like:

<<
  /ImageType 1
  /Width 29
  /Height 51
  /ImageMatrix [29 0 0 -51 0 51]
  /BitsPerComponent 1
  /Decode [1 0]
  /DataSource
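  % <length> must be the exact byte count of the binary data, and only a
  % single space or newline may follow readstring on its line.
  <length> string
  currentfile exch readstring
.....binary data.....
  pop % discard readstring's boolean, leaving the filled string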
  <<
    /Predictor 15
    /Columns 29
  >> /FlateDecode filter
>> imagemask

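Obviously you'll have to fill in the string length yourself. Keep the dictionary argument to FlateDecode, though: FlateDecode's /Predictor defaults to 1 (no prediction), so without it the PNG tag byte at the start of each row would be passed through as image data.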
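A Python sketch of a generator for this top-level layout, assuming your converter writes its output as bytes (the helper name and parameters are hypothetical). It writes the exact byte count before the data, puts nothing after readstring except one newline, pops readstring's boolean only after the data, and copies /Decode [1 0] from the PDF's /D. This only works where the code is executed as it is scanned; inside a procedure such as a CharProc, the binary data would be scanned as part of the procedure body when the procedure is defined.

```python
def imagemask_readstring(width: int, height: int, flate_data: bytes,
                         decode_parms: dict) -> bytes:
    """Hypothetical helper: emit a top-level imagemask whose DataSource is
    read from currentfile with readstring and then passed to FlateDecode.

    Not valid inside a { } procedure such as a Type 3 CharProc.
    """
    parms = " ".join(f"/{k} {v}" for k, v in decode_parms.items())
    head = (
        "<<\n"
        "  /ImageType 1\n"
        f"  /Width {width}\n"
        f"  /Height {height}\n"
        f"  /ImageMatrix [{width} 0 0 {-height} 0 {height}]\n"
        "  /BitsPerComponent 1\n"
        "  /Decode [1 0]\n"
        "  /DataSource\n"
        # readstring starts reading right after the newline that ends its
        # token, so the exact byte count is written and nothing follows it.
        f"  {len(flate_data)} string currentfile exch readstring\n"
    ).encode("ascii")
    tail = (
        "\n"
        "  pop\n"  # discard readstring's boolean; the filled string remains
        f"  << {parms} >> /FlateDecode filter\n"
        ">> imagemask\n"
    ).encode("ascii")
    return head + flate_data + tail
```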

[Edit 4] I notice that this appears to be intended for commercial use. Nothing wrong with that, but I'm not going to do all your homework for you; if it's your job, it's up to you to learn the language well enough to do the job.

I'm skipping lightly over the actual implementation details below in an attempt to outline where you are going wrong. In practice things are a little more complex, I haven't discussed how the procedure stored in the CharStrings dictionary is created, or the difference with early name binding (which is an important concept in PostScript).

Your existing code is:

/g10135{
88  0  4  -70  82  8  setcachedevice 
q
[
0.01  0  0  0.01  0  0  ] M 
q
[7800  0  0  -7800  400  800  ]M 
<<
/ImageType 1
  /Width  78

  /Height  78

  /ImageMatrix [  78 0 0 -78 0 78]
  /BitsPerComponent  1

  /Decode [1
0]

  /DataSource   {417 string dup
 currentfile exch readstring}

...binary data....
<< /Predictor 15

 /Columns 78
>>/FlateDecode filter def
 >> imagemask
Q
Q
}bind def 

So, the PostScript interpreter reads those bytes one at a time, and converts them into tokens. This either results in an executable token, which is executed, or an operation on one of the stacks.

So /g10135 is terminated by the { character, because that's a reserved character. The / introduces a name object, so we end up with the name object g10135 which we push on to the operand stack. The { character introduces an executable array so we put a mark on the operand stack.

Next we read 88, terminated by a white space character. That's a number, so we push it on the operand stack, and likewise the other numbers. The operand stack now contains:

/g10135
mark
88
0
4
-70
82
8

We then read setcachedevice, which is terminated by a white space. That isn't a number or a literal name, so the interpreter looks through the dictionaries on the dictionary stack for a definition. Since it is a standard operator, we find it in systemdict and execute it. That consumes 6 operands from the operand stack and has no other effects (actually it does, but that is a bit special because we are executing inside a font, so we'll ignore it for now).

Next we encounter a q; again, this is looked up in every dictionary on the dictionary stack to find a definition. This is defined in your own prolog as gsave, so it takes no operands and returns no operands; it simply saves the graphics state, incrementing the gsave nesting depth by 1.

I'm not going to go through the rest, as it would be tedious. Eventually, however, we reach your /DataSource. This is a name, so we push it on the operand stack. The next thing we encounter is a {, which starts a procedure definition, so we push a mark on the operand stack. We then encounter 417, so we push that, followed by string, dup, currentfile, exch and readstring, so our stack looks like:

/DataSource
mark
417
string
dup
currentfile
exch
readstring

Then we get the character }. That is the closing mark for an executable array, so we create the array and push it onto the operand stack:

/DataSource
{....}

Then we return to the procedure and continue executing it. The next thing we find is some binary data so we try to execute that as PostScript binary tokens. Because it isn't valid the interpreter throws an error.

Just creating an executable array is not sufficient to actually execute it. If you look at the outline code I posted at the end of edit 3 above you will note that I did not put the readstring and so on in an executable array, I simply allowed the interpreter to execute that code immediately.
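The difference can be seen in a toy model, sketched in Python. It is not a PostScript interpreter: it only distinguishes integers, literal names, executable names and { } bodies, and only a `string` operator is modelled (the names `run` and `OPS` are hypothetical). Tokens between { and } are stored without being executed, whereas the same tokens at top level run immediately.

```python
def run(tokens, operators):
    """Toy scanner/executor illustrating deferred execution of { ... }.

    Integers and /literal names are pushed; '{' starts collecting a body whose
    tokens are stored, not executed; '}' pushes the finished body as a list;
    any other name is executed via the `operators` table.
    """
    stack = []
    bodies = []  # procedure bodies currently being collected (innermost last)
    for tok in tokens:
        if tok == "{":
            bodies.append([])
        elif tok == "}":
            body = bodies.pop()
            # A nested procedure becomes an element of its enclosing body.
            (bodies[-1] if bodies else stack).append(body)
        elif bodies:
            bodies[-1].append(tok)  # deferred: stored, not executed
        elif tok.startswith("/"):
            stack.append(tok)  # literal name
        else:
            try:
                stack.append(int(tok))  # number
            except ValueError:
                operators[tok](stack)  # executable name: run it now


    return stack


# Only 'string' is modelled: it pops a length and pushes a zero-filled buffer.
OPS = {"string": lambda st: st.append(bytearray(st.pop()))}

print(run("/DataSource { 4 string }".split(), OPS))  # ['/DataSource', ['4', 'string']]
print(run("/DataSource 4 string".split(), OPS))  # ['/DataSource', bytearray(b'\x00\x00\x00\x00')]
```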

By doing so, readstring acts on currentfile (the actual PostScript program in this case) and reads bytes of data from the current position in that file. That position is immediately after the single white space character which terminates the readstring token, i.e. the start of the actual binary data. The readstring operator reads enough bytes from the file to fill the string, leaving the string and a boolean (true if the string was filled) on the operand stack; the boolean has to be popped before the string can be used as a data source. The file pointer has moved on to the byte after the binary data, and the interpreter resumes token scanning at that point. So it then creates the parameter dictionary, puts the /FlateDecode name on the stack and executes the filter operator, which consumes the string, the dictionary and the name, returning a file object. That file object then becomes the value associated with the DataSource key in the image dictionary, which is passed to the imagemask operator.

While I haven't tested that code, it's basically correct. There are of course other ways to achieve the same aim.

That's basically about as far as I'm prepared to go with this; you need to go and look at what I've written and compare it with your own program.

Note that the simplest way to investigate this is to take the contents of the CharProc (excluding the setcachedevice) and just run that as a PostScript program.

KenS
  • The predictor *itself* is not a part of FlateDecode. PDF 32000-1:2008 says "[LZW and Flate] support optional transformation by a predictor function, which improves the compression of sampled image data." The key `/Predictor` is optional in the `LZWDecode` and `FlateDecode` stream descriptors. – Jongware Feb 01 '18 at 14:40
  • That's because the Predictor information is in the encoded data, so you don't need (for PNG predictors) to specify it. That's why I didn't say that it was required for the PostScript FlateDecode. But if you want to decompress it yourself, you'll need something which does understand the specific predictor function. And that's also why you need PostScript language level 3 for PNG predictors, earlier versions of the spec didn't allow for that. What the PDF reference says about it isn't entirely relevant, since we're discussing conversion into PostScript. – KenS Feb 01 '18 at 15:21
  • How can I write the inline image stream? I can write data as <~ ~> or < >, which are ASCII85 or hex strings, but my data is raw binary. How can I specify it in DataSource? The data looks like this in the PDF: /DP<> ID ...binary data... – Kbstar Feb 02 '18 at 04:06
  • Well, that's up to you, isn't it? You can write it as Ascii85, in which case you will need to add that filter to the chain, or you can write a hex string, or you can have plain binary. You can put the data in a string, or just read it from currentfile. The choice is entirely yours. If you want to create a PostScript representation then you are going to have to write some PostScript. – KenS Feb 02 '18 at 08:12
  • I converted the data to ASCII85 and then applied the ASCII85Decode filter and the FlateDecode filter, but nothing shows in the output. Any suggestions? – Kbstar Feb 04 '18 at 12:00
  • Well no, there is no information there for me to work on. If you want help you are going to have to show some working, post some example files (simple ones) which show your input, and output. – KenS Feb 04 '18 at 12:42
  • I have written in the answer above what I am trying to do. The only problem I am facing is how to represent the data, as the raw data contains characters like ( and /. Kindly review the answer posted above and give some suggestions. – Kbstar Feb 06 '18 at 05:39
  • http://www.filedropper.com/in-un This is the link to the input file containing the glyph definitions. – Kbstar Feb 08 '18 at 04:12
  • https://ufile.io/lc0wj This is the link to the PostScript file; the only difficulty in it is how to paste the data here. – Kbstar Feb 09 '18 at 05:06
  • https://ufile.io/c8jhk I have applied what you said, but it gives the same error. Have a look at it. Thanks. – Kbstar Feb 09 '18 at 12:49
  • You haven't done what I said. You've put the example code in a procedure, but you have not **executed** the procedure. The result is that after reading the procedure (and leaving it on the operand stack), the interpreter still carries on to encounter the binary data, which it proceeds to try and interpret as PostScript, which of course causes exactly the same problem. The code I posted executes readstring on the current file, which reads the binary data (and that's why the string length needs to be correct). Are you comparatively new to PostScript programming? – KenS Feb 09 '18 at 13:43
  • Yes, I am new to PostScript. Can you please explain more about how to execute this? But I think the BuildGlyph method automatically executes all the glyphs encountered. – Kbstar Feb 09 '18 at 14:06
  • If you are new to PostScript then I doubt I can explain this much further. Yes, BuildGlyph executes the CharProc, but that simply means that the PostScript interpreter executes the procedure stored in the CharStrings dictionary of the font. The procedure must still be valid, and yours isn't. The length of the string isn't the problem; it's the fact that you have created a procedure. I'll try and explain more in the answer. – KenS Feb 09 '18 at 15:08
  • I have also removed the curly brackets that represent the procedure, but the error is the same. I think there is a problem with reading binary tokens in PostScript. – Kbstar Feb 12 '18 at 09:09
  • I have simplified the file. Now it contains only one glyph definition. Take a look at it; I have one file in which the glyph works without the image and one in which there are errors. – Kbstar Feb 12 '18 at 10:30
  • Waiting for your precious suggestion – Kbstar Feb 14 '18 at 09:10
  • Don't do it that way, put the content in a string and process that. Using currentfile is too complicated for this, you'd be better off not using it. – KenS Feb 14 '18 at 10:53
  • Can you make the file I supplied to you valid? – Kbstar Feb 14 '18 at 11:16
  • I can, certainly, since I'm a reasonably experienced PostScript programmer, but I'm not going to. You need to do your own homework. I've outlined several approaches, and taken the time to try and explain why simply dumping raw binary in the middle of the program isn't going to work. As I've said several times, create a string and store the data in that. You can use a Hex string or whatever if you prefer. That's essentially how we create this kind of output in Ghostscript's ps2write. – KenS Feb 14 '18 at 11:41
  • I have also tried storing the data in a string, but it doesn't work. – Kbstar Feb 14 '18 at 11:47
  • Then look at the ps2write output from the file, that works, and has a type 3 PostScript font with bitmap glyphs embedded in the output. – KenS Feb 14 '18 at 14:35
  • 1. https://ufile.io/5s5df 2. https://ufile.io/yn6be 3. https://ufile.io/09ke1 I have made three output files. All are valid and compile, but there is no output; moreover, file number 2 shows an erroneous output file. I have converted the binary data to hex using an online hex converter tool. Any suggestions? – Kbstar Feb 15 '18 at 05:12
  • When I distill the input PDF file with Adobe Acrobat, the PostScript file produced contains different image data with the LZWDecode filter. What is the reason behind this? I am sending you the PostScript file generated by Adobe Acrobat: https://ufile.io/x4v8x. Kindly take a look at it. Thanks – Kbstar Feb 15 '18 at 05:16
  • Take a look at it for the last time. Sorry for bothering you again and again. – Kbstar Feb 15 '18 at 11:10
  • Brother waiting for your kind response @KenS – Kbstar Feb 19 '18 at 16:14
  • When converting with Ghostscript, it also removes the image predictor. – Kbstar Apr 17 '18 at 10:52
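Following the suggestion in the comments to store the data in a string, here is a Python sketch (hypothetical helper name and parameters, not run through a PostScript interpreter) that writes a whole glyph procedure with the Flate data as a hex string. The data is part of the procedure itself, so it is found every time BuildGlyph runs the procedure, and nothing is read from currentfile. It uses gsave/grestore and concat directly instead of q/Q, so it doesn't depend on a prolog, and it assumes the /DP entries are passed through unchanged from the PDF. Hex doubles the size of the data; ASCII85 is more compact if that matters.

```python
import binascii


def type3_charproc(name: str, d1: tuple, outer_cm: tuple, image_cm: tuple,
                   width: int, height: int, flate_data: bytes,
                   decode_parms: dict) -> str:
    """Hypothetical helper: emit a Type 3 glyph procedure whose image data
    is stored in a hex string inside the procedure."""

    def nums(values):
        return " ".join(str(v) for v in values)

    hexdata = binascii.hexlify(flate_data).decode("ascii")
    hex_lines = ["      " + hexdata[i:i + 64] for i in range(0, len(hexdata), 64)]
    parms = " ".join(f"/{k} {v}" for k, v in decode_parms.items())
    return "\n".join([
        f"/{name} {{",
        # setcachedevice takes the same six operands as the PDF d1 operator.
        f"  {nums(d1)} setcachedevice",
        "  gsave",
        f"  [{nums(outer_cm)}] concat",
        f"  [{nums(image_cm)}] concat",
        "  <<",
        "    /ImageType 1",
        f"    /Width {width}",
        f"    /Height {height}",
        f"    /ImageMatrix [{width} 0 0 {-height} 0 {height}]",
        "    /BitsPerComponent 1",
        "    /Decode [1 0]",
        "    /DataSource <",
        *hex_lines,
        f"    > << {parms} >> /FlateDecode filter",
        "  >> imagemask",
        "  grestore",
        "} def",
    ])
```

For the glyph at the top of the question, the call would be `type3_charproc("g21", (37, 0, 4, -52, 33, -1), (0.01, 0, 0, 0.01, 0, 0), (2900, 0, 0, -5100, 400, -100), 29, 51, data, {"Predictor": 15, "Columns": 29})`, where `data` is the raw bytes between ID and EI.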