I need to convert a bunch of PowerPoint files into XML format.

I have to change all the fonts in about 100 PowerPoint files without opening them. Each slide contains several shape types, and each may use a different font. I used the python-pptx package and wrote some code to change the fonts of all text in a presentation, but it turned out not to work for all languages (see "change all fonts in powerpoint without opening the file" for more details).

I saved a PowerPoint file as XML and changed all the fonts there. When I then open the file in PowerPoint it works fine: all fonts are changed. Now I am trying to save all those PowerPoint files as .xml from Python code. I searched for whether python-pptx provides such functionality and could not find anything.

Update: I used opc extract and I got:

opc extract .\f10.pptx
usage: opc extract [-h] PKG_PATH DIRPATH
opc extract: error: the following arguments are required: DIRPATH
(base) PS C:\Users\a_oro\Downloads\pptxfile> opc extract f10.pptx .
Traceback (most recent call last):
  File "c:\users\a_oro\miniconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\a_oro\miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\a_oro\Miniconda3\Scripts\opc.exe\__main__.py", line 7, in <module>
  File "c:\users\a_oro\miniconda3\lib\site-packages\opcdiag\cli.py", line 304, in main
    command_controller.execute(argv)
  File "c:\users\a_oro\miniconda3\lib\site-packages\opcdiag\cli.py", line 53, in execute
    command.execute(args, self._app_controller)
  File "c:\users\a_oro\miniconda3\lib\site-packages\opcdiag\cli.py", line 228, in execute
    app_controller.extract_package(args.pkg_path, args.dirpath)
  File "c:\users\a_oro\miniconda3\lib\site-packages\opcdiag\controller.py", line 66, in extract_package
    package.prettify_xml()
  File "c:\users\a_oro\miniconda3\lib\site-packages\opcdiag\model.py", line 54, in prettify_xml
    for pkg_item in self._pkg_items.itervalues():
AttributeError: 'dict' object has no attribute 'itervalues' 

I appreciate any help or comment.

Afshin Oroojlooy

2 Answers

There is a companion package called opc-diag which can help with parts of this. You install it with:

pip install opc-diag

And then from the command-line you can:

opc extract PPTXFILE DIRECTORY

This "unpackages" the .pptx "package" into its component parts (files), most of which are XML. It also reformats them for easy editing, instead of leaving them in the "whole file on one line" format in which PowerPoint stores them.

You can then make a global edit, with sed for example, or with whatever tool you prefer.

Then you can do:

opc repackage DIRECTORY PPTX-FILE

and you get a loadable .pptx file again, after your changes.

So putting it all together:

opc extract my.pptx working_dir
# --- run editing script or edit files by hand ---
opc repackage working_dir my_edited.pptx
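
The editing step in the middle could also be scripted in Python rather than done with sed. The following is only a sketch. It assumes the fonts to change are written as explicit `typeface` attributes on `<a:latin>`, `<a:ea>`, `<a:cs>` or `<a:sym>` elements. `replace_fonts` and `replace_fonts_in_dir` are hypothetical helper names for illustration, not part of opc-diag or python-pptx. Values beginning with `+` (such as `+mn-lt`) refer to theme fonts and are left alone, as are empty values.

```python
import re
from pathlib import Path
from xml.sax.saxutils import escape

# Matches typeface="..." on <a:latin>, <a:ea>, <a:cs> and <a:sym> elements.
_TYPEFACE = re.compile(r'(<a:(?:latin|ea|cs|sym)\b[^>]*?\btypeface=")([^"]*)(")')


def replace_fonts(xml_text, new_font):
    """Return xml_text with every explicit typeface replaced by new_font."""
    safe_font = escape(new_font, {'"': "&quot;"})

    def _sub(match):
        current = match.group(2)
        # "+mn-lt" and similar refer to theme fonts; empty values are
        # placeholders. Both are left untouched in this sketch.
        if current == "" or current.startswith("+"):
            return match.group(0)
        return match.group(1) + safe_font + match.group(3)

    return _TYPEFACE.sub(_sub, xml_text)


def replace_fonts_in_dir(working_dir, new_font):
    """Rewrite every .xml file under working_dir in place.

    Returns the number of files that were changed.
    """
    changed = 0
    for path in Path(working_dir).rglob("*.xml"):
        original = path.read_text(encoding="utf-8")
        updated = replace_fonts(original, new_font)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Something like `replace_fonts_in_dir("working_dir", "Calibri")` would then run between the extract and repackage commands. Whether a regex is good enough depends on the files; parsing each part with an XML library would be the more careful route.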
scanny
  • Thanks! I'll give it a try and let you know if it works. – Afshin Oroojlooy Mar 14 '21 at 03:33
  • It did not work for me. I updated the question with the error. – Afshin Oroojlooy Mar 14 '21 at 15:58
  • Anyway, I was able to extract the details by following the instructions at https://support.microsoft.com/en-us/office/extract-files-or-objects-from-a-powerpoint-file-85511e6f-9e76-41ad-8424-eab8a5bbc517 It results in a bunch of directories with all the elements of the `pptx` file. Does `opc` provide the same files? – Afshin Oroojlooy Mar 14 '21 at 15:59
  • Ah, it looks like `opc-diag` has a Python 3 incompatibility. I'll have to see about fixing that. In any case, yes, that command extracts the very same files, it just also reformats them so they are more suitable for editing by hand rather than machine. – scanny Mar 15 '21 at 17:15
  • Is `opc` Windows-only? Can I call it on Linux? – Afshin Oroojlooy Mar 22 '21 at 02:28
  • No, it works anywhere Python does. You can definitely use it on Linux. – scanny Mar 22 '21 at 05:28
  • I was trying to use it in Ubuntu for Windows, and it seems that it gets installed at `/home/username/.local/bin/opc`, and `opc` is not recognized by default. – Afshin Oroojlooy Mar 28 '21 at 14:29
  • I still have the issue with `python3`. I installed python2 and installed `opc-diag` on `python2`, followed the instructions to extract the files, edited the `theme.xml` files, repackaged with `opc`, and still get the error `powerpoint found a problem with content in "\address-of-file" powerpoint can attempt to repair the presentation. if you trust the source of this presentation click repair`. Any idea? – Afshin Oroojlooy Mar 28 '21 at 14:58
  • Does it work if you don't edit the `theme.xml` files but just extract and repackage? If so, then you made an invalid edit to `theme.xml`. – scanny Mar 28 '21 at 20:54
  • With no edit it is fine, just as renaming `pptx` to `zip` and `zip` back to `pptx` was fine. – Afshin Oroojlooy Mar 28 '21 at 23:11

Adding to @scanny's answer above, to make it work in Python 3.10:

pip install opc-diag

# In your script you can do the following:
import subprocess

ppt_file = 'my_powerpoint.pptx'
save_dir = '/yoursavefolder'
# Pass the command as a list so it also works on Linux/macOS without shell=True.
subprocess.run(['opc', 'extract', ppt_file, save_dir], check=True)
# Do stuff with the extracted files. The slide data is under save_dir/ppt/slides/, e.g. slide1.xml.

new_ppt_name = 'new.pptx'
subprocess.run(['opc', 'repackage', save_dir, new_ppt_name], check=True)

If you run this under Python 3, it will break with the same error as in the question:

File "c:\users\a_oro\miniconda3\lib\site-packages\opcdiag\model.py", line 54, in prettify_xml
    for pkg_item in self._pkg_items.itervalues():
AttributeError: 'dict' object has no attribute 'itervalues'

To fix it, open the file named in the traceback (here c:\users\a_oro\miniconda3\lib\site-packages\opcdiag\model.py) and change line 54 from:

for pkg_item in self._pkg_items.itervalues():

to

for pkg_item in self._pkg_items.values():

This should solve the error and allow it to run in Python 3.10: `dict.itervalues()` exists only in Python 2, and `values()` is its Python 3 replacement. I've come across this issue before in other packages.
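
If patching an installed package is not appealing, the extract and repackage round trip could also be sketched with only the standard library. As the comments on the first answer note, a .pptx is a zip archive. This is an assumption-laden sketch, not a replacement for opc-diag: `extract_pptx` and `repackage_pptx` are hypothetical names, and the XML is left exactly as stored (one long line per file) rather than prettified.

```python
import zipfile
from pathlib import Path


def extract_pptx(pptx_path, out_dir):
    """Unzip the .pptx package into out_dir, leaving the XML as stored."""
    with zipfile.ZipFile(pptx_path) as package:
        package.extractall(out_dir)


def repackage_pptx(src_dir, pptx_path):
    """Zip the contents of src_dir back into a .pptx file."""
    src = Path(src_dir)
    files = sorted(p for p in src.rglob("*") if p.is_file())
    # Put [Content_Types].xml first. This is a precaution assumed here,
    # not a verified requirement of PowerPoint.
    files.sort(key=lambda p: p.name != "[Content_Types].xml")
    with zipfile.ZipFile(pptx_path, "w", zipfile.ZIP_DEFLATED) as package:
        for path in files:
            # Archive names use forward slashes relative to src_dir.
            package.write(path, path.relative_to(src).as_posix())
```

Looping `extract_pptx` over the roughly 100 files, editing the XML, and calling `repackage_pptx` on each directory would cover the batch case from the question.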