Using PyMuPDF, I want to extract all images from pdf and save them separately and replace all images in pdf with just their image names at the same image place and save as another document. I can save all images with following code.
import fitz
#This creates the Document object doc
doc = fitz.open("Article_Example_1_2.pdf")
html_text=""
for i in range(len(doc)):
print(doc[i]._getContents())
for img in doc.getPageImageList(i):
xref = img[0]
pix = fitz.Pixmap(doc, xref)
if pix.n - pix.alpha < 4: # this is GRAY or RGB or pix.n < 5
pix.writePNG("p%s-%s.png" % (i, xref))
else: # CMYK: convert to RGB first
pix1 = fitz.Pixmap(fitz.csRGB, pix)
pix1.writePNG("p%s-%s.png" % (i, xref))
pix1 = None
pix = None
doc.save(filename=r"new.pdf")
doc.close()
but not sure how to replace them all in pdf with their stored images names. Would greatly appreciate if anyone can help me out here.