
I'm trying to extract some MathML metadata from EPS or GIF files (the only two formats I can export equations to). I only know the basics of Python, so I was looking for a library with a method that gives me the metadata from the EPS just as you would see it if you opened the file in a text editor. Could somebody advise on what I can do to get that?

I've tried the PIL and exif Python packages, but so far I only seem to get binary information, which I don't know how to decode.
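The "text editor" view described above does not need an image library: reading the file in binary mode and decoding the bytes with a single-byte code page reproduces it. A minimal sketch, assuming cp1252 (the code page Notepad typically uses on Western-European Windows); `text_editor_view` and `find_mathml` are hypothetical helper names. The regex is naive: it only finds MathML that appears as one contiguous run of text, and any bytes the file format interleaves with the markup end up inside the match.

```python
import re
import sys
from pathlib import Path
from typing import List

# Naive pattern: everything from "<math" up to the next "</math>".
MATHML_PATTERN = re.compile(r"<math\b.*?</math>", re.DOTALL)


def text_editor_view(data: bytes, encoding: str = "cp1252") -> str:
    """Decode raw file bytes roughly the way a single-byte text editor shows them.

    cp1252 leaves a few byte values (e.g. 0x81) undefined, so errors="replace"
    turns them into U+FFFD instead of raising UnicodeDecodeError.
    """
    return data.decode(encoding, errors="replace")


def find_mathml(text: str) -> List[str]:
    """Return every <math ...>...</math> span found in the decoded text."""
    return MATHML_PATTERN.findall(text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python show_mathml.py myfile.gif  (script name is arbitrary)
    raw = Path(sys.argv[1]).read_bytes()
    for fragment in find_mathml(text_editor_view(raw)):
        print(fragment)
```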

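from exif import Image

file = "myfile.gif"
f = open(file, 'rb')  # open the exported equation in binary mode
myeq = Image(f)       # hand the file object to exif's Image class
myeq.get_file         # no parentheses, so this evaluates to the method object itself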

I got this:

<bound method Image.get_file of <exif._image.Image object at 0x033C3750>>

I would like something like this:

GIF89af . ð  ÿÿÿ   ,    f .  ñ„©Ëí£œ´Ú‹³Þ¼÷ †âH–æyz¨¶.žò¬
NLç˜Íð¼œüÃ`×úЦ0èDD8炤][©7aëR¹À®ØðcŠ.á2·˜º5ß
4¸ÝÌ*ícIÚ·‘öCxSB6×W3$˜Øsèˆ(Ù˜Á˜TCÇ'¹iqhÉÉ   &§IY7   ¸’·“ú×ÒÚf
£÷y‡zƒÆREk´›û+¼8\ÜiŒœ¬¼ÌܼŒ-=M]=⌭½ÍÝíý
n$K.óy¥^.…ÇîñéìLOô¥‰¿3ºË_ÉËP Å]7¯ Â…»  !ÿ
MathType001ÿ DSMT7 WinAllBasicCodePages Times New Roman 
Symbol Courier Prime MT Extra  !/ED/APôG_APòAPô
A ôEô%ôB_A ôC_A ôEô*_HôA ô@ôAHôA*_D_Eô_Eô_A  

     †"- ƒb †± ± 
  ƒb      ˆ2   
†"'- ˆ4  ƒa  ƒc    
  ˆ2  ƒa      !ÿMathType003ÿ<?xml version="1.0"?><!--
MathType@Translator@5@5@MathML2 (Clipboard).tdl@MathML 2.0
(Clipboard)@ --><math xmlns='http://www.w3.org/1998/Math/MathML'
<mrow><mfrac><mrow><mo>&#x2212;</mo><mi>b</mi><mo>&#x00B1;</mo><msqrt>
<mrow><msup><mi>b</mi><mn>2</mn></mšsup><mo>&#x2212;</mo><mn>4</mn>
<mi>a</mi><mi>c</mi></mrow></msqrt></mrow><mrow><mn>2</mn><mi>a</mi>
</mrow></mfrac></mrow></math><!-- MathType@End@5@5@ --> !ÿ
MathType002;
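The dump above hints at where the MathML lives. `!ÿ` is the byte pair 0x21 0xFF, which in GIF89a introduces an Application Extension, and the identifiers `MathType001`, `MathType003` and `MathType002` follow it. In GIF, the payload of such a block is split into sub-blocks of at most 255 bytes, each preceded by a length byte. That would explain the `ÿ` (0xFF = 255) right after `MathType003` and, most likely, the stray `š` (0x9A = 154 in cp1252) inside `</mšsup>`: it appears to be a sub-block length byte rather than part of the MathML, which is why searching the raw bytes can return slightly corrupted markup. A minimal sketch of walking the GIF blocks instead, assuming MathType keeps the MathML in an application extension whose identifier starts with `MathType` (inferred from this one dump, not from MathType documentation); all function names are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Iterator, List, Tuple

MATHML_PATTERN = re.compile(r"<math\b.*?</math>", re.DOTALL)


def _read_sub_blocks(data: bytes, pos: int) -> Tuple[List[bytes], int]:
    """Read GIF data sub-blocks (length byte + payload) up to the 0x00 terminator."""
    chunks = []
    while True:
        size = data[pos]
        pos += 1
        if size == 0:
            return chunks, pos
        chunks.append(data[pos:pos + size])
        pos += size


def _color_table_bytes(packed: int) -> int:
    """Size of the color table announced by a packed-fields byte (0 if absent)."""
    return 3 * 2 ** ((packed & 0x07) + 1) if packed & 0x80 else 0


def application_extensions(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (identifier + auth code, joined payload) for each Application Extension."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # 6-byte header + 7-byte logical screen descriptor; byte 10 holds its packed fields.
    pos = 13 + _color_table_bytes(data[10])
    while pos < len(data):
        introducer = data[pos]
        pos += 1
        if introducer == 0x3B:  # trailer: end of the GIF stream
            return
        if introducer == 0x21:  # extension: label byte, then sub-blocks
            label = data[pos]
            blocks, pos = _read_sub_blocks(data, pos + 1)
            if label == 0xFF and blocks:  # application extension
                yield blocks[0], b"".join(blocks[1:])
        elif introducer == 0x2C:  # image descriptor: 9 bytes, packed fields last
            pos += 9 + _color_table_bytes(data[pos + 8])
            pos += 1  # LZW minimum code size
            _, pos = _read_sub_blocks(data, pos)  # skip the image data
        else:
            raise ValueError(f"unexpected byte 0x{introducer:02X} at offset {pos - 1}")


def mathml_from_gif(data: bytes) -> List[str]:
    """Collect <math>...</math> spans from application extensions named MathType*."""
    found = []
    for ident, payload in application_extensions(data):
        if ident.startswith(b"MathType"):
            text = payload.decode("utf-8", errors="replace")
            found.extend(MATHML_PATTERN.findall(text))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    for mathml in mathml_from_gif(Path(sys.argv[1]).read_bytes()):
        print(mathml)
```

Because the payload is reassembled from its sub-blocks before searching, the length bytes should not end up inside the returned MathML. Truncated or malformed files raise `IndexError` or `ValueError` rather than being repaired.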

Here is the EPS in case it is useful:

https://drive.google.com/open?id=1Y7WrJK1gmpvqeboFY3OmPYGJqkRptKTY

  • I don't know what metadata you think is in the EPS file. Whatever is there, it will be in PostScript comments: lines beginning with a '%' and ending with a line terminator. Note that binary data (e.g. bitmaps) can contain lines that look like that. The only reliable way to parse comments out of EPS is to use a full PostScript interpreter (e.g. Ghostscript). However, I've no idea what you are expecting to find; perhaps you could post an example. – KenS Sep 10 '19 at 15:39
  • Thank you for the comment @KenS. Basically, when I open my equation exported as an EPS in a text editor (like Notepad), I get the last block of code I posted, the one with the MathML information at the end. Perhaps this is saved as a PostScript comment as you said, I don't know, but that is what I mean by metadata. I want to extract the MathML. I would like to do it using Python, but maybe it is not possible. What do you think? – Pablo Guerrero Sep 10 '19 at 17:19
  • You can certainly use Python (or any other programming language) to read the PostScript. It's **much** easier to find comments at the start of the file; the end is harder. Probably the easiest solution is to seek to the end of the file, then read backwards until you find an EOL (CR or LF), then check the next character to see if it's a comment. That's assuming it's a single-line comment. Comments are introduced by a '%' character and terminated by EOL, but since PostScript can contain binary sequences, the only guaranteed method is to use a PostScript interpreter. – KenS Sep 15 '19 at 13:56
  • If the file is DSC-compliant (and an EPS should be), you may be able to use Ghostscript to read the comments for you and discard the ones you don't want. If you must use pure Python, it should be OK as long as you always get the EPS from the same package (it'll be machine-generated, so you can rely on its format). In that case, just use the method outlined above. – KenS Sep 15 '19 at 13:57
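The approaches described in these comments can be sketched in plain Python. `comment_lines` collects every line that begins with `%` (simple, but binary sections can produce false positives, as noted above), and `last_line_comment` follows the seek-to-the-end suggestion for a single-line comment at the end of the file. Both names are hypothetical, lines are decoded as Latin-1 only so that any byte value survives, and neither replaces a PostScript interpreter. One caveat: a DSC-conforming EPS usually ends with `%%EOF`, so if the MathML comment sits just before it, the backwards reader would return `%%EOF` and would need to step back one more line.

```python
import io
import sys
from typing import BinaryIO, List, Optional


def comment_lines(data: bytes) -> List[str]:
    """Return every line that starts with '%', decoded as Latin-1.

    Lines are split on CR, LF or CRLF. Binary sections (e.g. a bitmap) may
    contain bytes that look like comment lines, so expect false positives.
    """
    lines = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n").split(b"\n")
    return [line.decode("latin-1") for line in lines if line.startswith(b"%")]


def last_line_comment(f: BinaryIO, chunk_size: int = 4096) -> Optional[str]:
    """Read backwards from the end of a binary stream to the previous EOL.

    Returns the last non-empty line if it is a '%' comment, otherwise None.
    Assumes the comment fits on a single line.
    """
    pos = f.seek(0, io.SEEK_END)
    tail = b""
    while pos > 0:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        tail = f.read(step) + tail
        body = tail.rstrip(b"\r\n")
        if b"\n" in body or b"\r" in body:
            break  # the whole last line is now in `tail`
    last = tail.rstrip(b"\r\n")
    start = max(last.rfind(b"\n"), last.rfind(b"\r")) + 1
    line = last[start:]
    return line.decode("latin-1") if line.startswith(b"%") else None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as eps:
        for comment in comment_lines(eps.read()):
            print(comment)
        print("last line comment:", last_line_comment(eps))
```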
