I've just stumbled across python-pptx and it looks extremely useful. I found some examples for manipulating slides and the like. My particular use case involves replacing a number of images within a given presentation with different image files, while retaining most of the metadata such as position and size.

I suppose the question is a little more generic in nature, more like "what's the logical flow of this within the python-pptx framework?" Simply replacing the file pointer with the new one misses the mark for sure, but it's not obvious to me whether pictures have attributes that could easily be stored and re-applied, or which other approach would make the code easiest to work with down the road.

Any suggestions appreciated ;-)

Update: I attempted the following assignment to _blob, but it doesn't appear to work. Maybe I'm missing something easy?

#!/usr/bin/env python3

import pptx
import hashlib

prs = pptx.Presentation('hack.pptx')
newImgFilename = "gray.jpg"
img2 = pptx.parts.image.Image.from_file(newImgFilename)

print(hashlib.sha224(prs.slides[0].shapes[2].image._blob).hexdigest())
print(hashlib.sha224(img2._blob).hexdigest())
### these two should be different


prs.slides[0].shapes[2].image._blob = img2._blob
print(hashlib.sha224(prs.slides[0].shapes[2].image._blob).hexdigest())
### now this should be the value from img2, but it's not... 

Update Jan 2023 (working code):

#!/usr/bin/python3

import pptx

smallfile = "small.jpg"

# open presentation
prs = pptx.Presentation('test.pptx')

# create new image part from new image file
new_pptx_img = pptx.parts.image.Image.from_file(smallfile)

# obviously have to figure out what image you're actually changing...
img_shape = prs.slides[0].shapes[0]  

# get part and rId from shape we need to change
slide_part, rId = img_shape.part, img_shape._element.blip_rId
image_part = slide_part.related_part(rId)

# overwrite old blob info with new blob info
image_part.blob = new_pptx_img._blob

# save it
prs.save('changed.pptx')
ljwobker

1 Answer

A .pptx file is a zip archive. Its format is specified by the Open Packaging Conventions (OPC), as is the format of .docx and .xlsx files. In OPC parlance, the zip archive is known as a "package".

The bytes of an image "file" that appears in a presentation are stored in the .pptx package as a distinct "member" of the zip-archive, probably at a path like ppt/media/image1.png. It shouldn't take too much snooping around to find it in there.
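As a rough illustration of that snooping, the sketch below lists the members under ppt/media/ using only Python's standard zipfile module. The helper name list_media_members is made up for this example, and the demo archive is a tiny in-memory stand-in rather than a valid presentation; with a real file you would pass its path instead.

```python
import io
import zipfile


def list_media_members(pptx_file):
    """Return the names of the zip members stored under ppt/media/."""
    with zipfile.ZipFile(pptx_file) as package:
        return sorted(
            name for name in package.namelist() if name.startswith("ppt/media/")
        )


# Demo on a tiny in-memory stand-in for a .pptx (NOT a valid presentation).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr("ppt/slides/slide1.xml", "<placeholder/>")
    package.writestr("ppt/media/image1.png", b"fake image bytes")

print(list_media_members(demo))
```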

The rest of the information used to display the image, like position and size, is stored elsewhere, in the XML of the slide that contains the picture. So you can get a certain way down the road just by replacing the existing image bytes with new image bytes.

There are a few challenges you can anticipate.

  1. You need to identify which image member (e.g. ppt/media/image42.png) goes with which picture shape on which slide.

  2. If the aspect ratio is not exactly the same, the resulting picture will appear "stretched" in one dimension or the other.
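For the first challenge, one possible approach is to read the slide's relationships file, which maps each relationship id (the same rId that python-pptx exposes as blip_rId on a picture element) to a target member. This is only a sketch using the standard library: the function name image_members_for_slide is hypothetical, and the demo archive is a hand-built stand-in that mimics the usual OPC layout (ppt/slides/_rels/slideN.xml.rels).

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IMAGE_REL_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
)


def image_members_for_slide(pptx_file, slide_member):
    """Map each image rId used by a slide to the zip member it points at."""
    slide_dir, slide_name = posixpath.split(slide_member)
    rels_member = posixpath.join(slide_dir, "_rels", slide_name + ".rels")
    with zipfile.ZipFile(pptx_file) as package:
        rels_root = ET.fromstring(package.read(rels_member))
    mapping = {}
    for rel in rels_root.iter(RELS_NS + "Relationship"):
        if rel.get("Type") != IMAGE_REL_TYPE:
            continue
        if rel.get("TargetMode") == "External":
            continue  # linked image, not stored inside the package
        # Targets are normally relative to the slide's own directory.
        target = posixpath.normpath(posixpath.join(slide_dir, rel.get("Target")))
        mapping[rel.get("Id")] = target.lstrip("/")
    return mapping


# Demo archive: a hand-written rels file, not a complete presentation.
RELS_XML = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" '
    'Target="../slideLayouts/slideLayout1.xml"/>'
    '<Relationship Id="rId2" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" '
    'Target="../media/image1.png"/>'
    "</Relationships>"
)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr("ppt/slides/slide1.xml", "<placeholder/>")
    package.writestr("ppt/slides/_rels/slide1.xml.rels", RELS_XML)
    package.writestr("ppt/media/image1.png", b"fake image bytes")

mapping = image_members_for_slide(demo, "ppt/slides/slide1.xml")
print(mapping)
```

Matching an rId to a particular picture shape still requires looking at the slide XML (or at the shape through python-pptx), so this covers only half of the lookup.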

In general, you can attack the problem by manipulating the zip archive or by letting python-pptx take you as far as it can and then delving into internals to go the rest of the way.
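The zip-archive route might look something like the sketch below, which copies every member to a new archive and substitutes new bytes for one of them. It uses only the standard library; replace_member is a name invented for this example, and the demo archive is an in-memory stand-in. It assumes the replacement image is the same format as the original, since the member name and its extension are left unchanged.

```python
import io
import zipfile


def replace_member(src_file, dst_file, member_name, new_bytes):
    """Copy a zip archive, substituting new_bytes for one member's content."""
    with zipfile.ZipFile(src_file) as src:
        if member_name not in src.namelist():
            raise KeyError(f"no member named {member_name!r} in archive")
        with zipfile.ZipFile(dst_file, "w", zipfile.ZIP_DEFLATED) as dst:
            # Keep the original member order; only the chosen member changes.
            for name in src.namelist():
                data = new_bytes if name == member_name else src.read(name)
                dst.writestr(name, data)


# Demo on in-memory stand-ins for the source and destination files.
source = io.BytesIO()
with zipfile.ZipFile(source, "w") as package:
    package.writestr("ppt/slides/slide1.xml", "<placeholder/>")
    package.writestr("ppt/media/image1.jpg", b"old image bytes")

result = io.BytesIO()
replace_member(source, result, "ppt/media/image1.jpg", b"new image bytes")

with zipfile.ZipFile(result) as package:
    print(package.read("ppt/media/image1.jpg"))
```

Writing through zipfile means the archive's per-member checksums are recomputed, which hand-editing the file contents would not do.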

If you use python-pptx to get a reference to the picture shape, picture.image will give you the Image object for the image it contains. The code for that class is here: https://github.com/scanny/python-pptx/blob/master/pptx/parts/image.py#L139

I would try assigning the new image bytes to Image._blob, then saving, and see what happens. The size and position of the picture shape you used to get there can be adjusted to suit the new aspect ratio if necessary. Going this route lets python-pptx take care of all the packaging details, like which image file in the package is changed.
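Adjusting the shape to the new aspect ratio is plain arithmetic. The sketch below is one way to do it: it computes the largest size that fits inside the old picture's box while keeping the new image's proportions, plus the offsets needed to center it. The function name fit_within and the sample numbers are illustrative only; the box values are whatever units the shape uses (EMU in python-pptx).

```python
def fit_within(box_width, box_height, image_width, image_height):
    """Scale an image to fit inside a box without changing its aspect ratio.

    Returns (width, height, left_offset, top_offset), where the offsets
    center the scaled image within the box.
    """
    scale = min(box_width / image_width, box_height / image_height)
    new_width = round(image_width * scale)
    new_height = round(image_height * scale)
    left_offset = (box_width - new_width) // 2
    top_offset = (box_height - new_height) // 2
    return new_width, new_height, left_offset, top_offset


# A 4:3 box (in EMU) receiving a 16:9 image (in pixels).
print(fit_within(4_000_000, 3_000_000, 1920, 1080))
```

With python-pptx, the box would come from the picture shape's width and height, and the result would be written back to its left, top, width and height, adding the offsets to the original left and top.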

After that, you'll need to address any additional challenges by understanding how the existing code works and seeing what you can do from there. You can ask new questions as they come up if you go that route.


UPDATE: Okay, it looks like the Image._blob item doesn't get written; it needs to be ImagePart._blob (Image._blob is just a read-only "copy", roughly speaking).

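# shape is the picture shape of interest, e.g. prs.slides[0].shapes[0];
# new_blob holds the bytes of the replacement image file.
slide_part, rId = shape.part, shape._element.blip_rId
# Current python-pptx uses related_part(rId); older releases used related_parts[rId].
image_part = slide_part.related_part(rId)
image_part.blob = new_blob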
scanny
  • This is great. I'm familiar with the XML/ZIP file structure, searching that makes it easy enough to find the big image files I'm interested in. Directly manipulating the XML file failed, as it appears there are checksums and all sorts of other things that got calculated somewhere. Hopefully the part about switching the image bytes and then re-saving will keep the metadata intact. – ljwobker Jul 05 '21 at 03:06
  • @ljwobker try update I added at end of answer. – scanny Jul 13 '21 at 23:33
  • Yes, thanks!!! ... current python-pptx now requires small change described here: https://stackoverflow.com/questions/70159390/python-pptx-library-issue-replacing-image-in-slides-slidepart-object-has-no – ljwobker Jan 14 '23 at 19:55