
I need to change all the fonts in about 100 PowerPoint files without opening them. I am looking for an API that I can call from Python or C++ code to make the change from the command line on Linux. Each slide contains several shape types, and each shape may use a different font. I have tried three approaches:

1- Using python-pptx: I used the python-pptx package and wrote some code to change the fonts of all text in a PowerPoint presentation. However, it turned out that it does not work for all languages (see "change all fonts in powerpoint without opening the file" for more details).

2- I also converted the PowerPoint file into its underlying XML directory format by manually changing the file extension from pptx to zip, as instructed on the Microsoft page (https://support.microsoft.com/en-us/office/extract-files-or-objects-from-a-powerpoint-file-85511e6f-9e76-41ad-8424-eab8a5bbc517). I unzipped the file, edited the XML files to change all the fonts, and saved it back as pptx, but the result is a corrupted file. It seems there are other files that I need to edit so that they match the edited ones, but I do not know which files those are.

3- I exported a single file to XML format using Save As in PowerPoint, modified the fonts, and then converted it back to pptx, and it works OK. However, I am not sure how to convert all the files to XML format without opening them one by one.

I would appreciate any help or comments.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I tried installing the latest version of opc-diag, as suggested in the answer by @scanny. It resulted in:

pip install https://github.com/python-openxml/opc-diag/archive/develop.zip
Collecting https://github.com/python-openxml/opc-diag/archive/develop.zip
  Downloading https://github.com/python-openxml/opc-diag/archive/develop.zip
     - 149 kB 4.6 MB/s
    ERROR: Command errored out with exit status 1:
     command: /bigdisk/lax/rl/anaconda3/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-iegl1onm/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-iegl1onm/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-lid34ggv
         cwd: /tmp/pip-req-build-iegl1onm/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-iegl1onm/setup.py", line 72, in <module>
        VERSION = re.search("__version__ = '([^']+)'", init_py).group(1)
    AttributeError: 'NoneType' object has no attribute 'group'
    ----------------------------------------
WARNING: Discarding https://github.com/python-openxml/opc-diag/archive/develop.zip. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Afshin Oroojlooy

1 Answer


opc-diag is your friend for safely extracting and repackaging OPC (PPTX, DOCX, XLSX) packages. Try installing from the develop branch which has been updated to work with Python 3:

pip install https://github.com/python-openxml/opc-diag/archive/develop.zip

And use the extract subcommand to extract and the repackage subcommand to put it back into proper form. More details are in the documentation here: https://opc-diag.readthedocs.io/en/latest/. There are some subtleties in repackaging that simply "re-zipping" might not accomplish.

Start by just extracting and immediately repackaging without making any edits. That will tell you whether the problem is in the repackaging or some edit you made.
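If opc-diag cannot be installed, the same no-edit check can be approximated with Python's standard zipfile module. This is only a sketch of the idea, not what opc-diag does internally; the function names `repack_unchanged` and `same_parts` are hypothetical.

```python
import zipfile


def repack_unchanged(src_path, dst_path):
    """Copy every member of src_path into a new archive without editing it."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        # infolist() preserves the original member order and names,
        # so no extra top-level folder is introduced.
        for info in src.infolist():
            dst.writestr(info, src.read(info.filename))


def same_parts(path_a, path_b):
    """Return True if both archives hold the same members with the same bytes."""
    with zipfile.ZipFile(path_a) as a, zipfile.ZipFile(path_b) as b:
        if a.namelist() != b.namelist():
            return False
        return all(a.read(name) == b.read(name) for name in a.namelist())
```

If the repacked copy opens in PowerPoint, the repackaging step is sound and any later corruption comes from the edit itself.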

If no-edit repackaging works and you're getting errors after edits, then you must be making an edit that produces invalid XML. Note that the sequence of child elements within most XML elements in a PPTX file is significant. Typically, the fastest way to work out that sequence is to create an example file by hand (using PowerPoint) and then inspect the XML it produces when that file is saved. A more rigorous approach is to consult the XML schema for PowerPoint.
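An edit that only changes attribute values leaves the element order untouched, so it is less likely to produce invalid XML. The sketch below is a standard-library illustration of that kind of edit, not part of opc-diag: it assumes fonts appear as `typeface="..."` attributes written with double quotes, that the font names contain no characters needing XML escaping, and the function name `replace_fonts` is hypothetical.

```python
import re
import zipfile


def replace_fonts(src_path, dst_path, old_font, new_font):
    """Write a copy of src_path with typeface="old_font" replaced in XML parts.

    Returns the number of attributes that were rewritten.
    """
    pattern = re.compile(
        rb'typeface="' + re.escape(old_font.encode("utf-8")) + rb'"'
    )
    replacement = b'typeface="' + new_font.encode("utf-8") + b'"'
    changed = 0
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            # Only touch XML parts; images and other media are copied as-is.
            if info.filename.endswith(".xml"):
                # A callable replacement avoids backslash interpretation.
                data, count = pattern.subn(lambda m: replacement, data)
                changed += count
            dst.writestr(info, data)
    return changed
```

Because the match is on the attribute name rather than on a particular element, it covers the Latin, East Asian, and complex-script font elements alike, while theme references such as `+mn-lt` are left alone unless they are named explicitly. Looping this function over a directory of files would handle the whole batch from the command line.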

scanny
  • I tried to install the version that you mentioned, but it resulted in the error that I added to the question. – Afshin Oroojlooy Mar 28 '21 at 13:35
  • That appears to have been an error with your edit, not with `opc-diag`'s extraction or repackaging. – scanny Mar 28 '21 at 20:57
  • I only replaced some font names. When I do the same replacement on the single `xml` file (obtained via Save As `xml` in PowerPoint), it works OK. So I think the modification itself is not the problem. Besides, when I perform the same modification by editing the zip file with `7-zip`, it works and the fonts are replaced in the `pptx` file. – Afshin Oroojlooy Mar 28 '21 at 23:53