I built a PDF extraction program with Tkinter, PyPDF2, and PIL by following a tutorial. This is the image extraction code:
    from PIL import Image

    def extract_images(page):
        images = []
        if '/XObject' in page['/Resources']:
            xObject = page['/Resources']['/XObject'].getObject()
            for obj in xObject:
                # Only image XObjects are of interest here
                if xObject[obj]['/Subtype'] == '/Image':
                    size = (xObject[obj]['/Width'], xObject[obj]['/Height'])
                    data = xObject[obj].getData()
                    mode = ""
                    if xObject[obj]['/ColorSpace'] == '/DeviceRGB':
                        mode = "RGB"
                    else:
                        mode = "CMYK"
                    img = Image.frombytes(mode, size, data)
                    images.append(img)
        else:
            # No XObjects on this page: fall back to a blank white placeholder
            img = Image.new("RGB", (100, 100), (255, 255, 255))
            images.append(img)
        return images
It worked with the provided test files, but not with any other PDF. It usually fails with this error:
    raise NotImplementedError("unsupported filter %s" % filterType)
    NotImplementedError: unsupported filter /DCTDecode
I've tried changing the code, but I simply cannot find a solution.
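
For context on the error: /DCTDecode is the PDF filter name for JPEG-compressed streams, so for those images the stream bytes are already a JPEG file rather than raw pixels for Image.frombytes. Below is a minimal sketch of how the /Filter entry could be checked before calling getData(). The names plan_image_extraction, normalize_filters, and PASSTHROUGH_FILTERS are hypothetical helpers made up for illustration and are not part of PyPDF2 or PIL. The sketch is an assumption about one way to handle this, not a confirmed fix for this program.

```python
# Hypothetical helpers (not part of PyPDF2 or PIL).
# Filters whose encoded stream bytes are already a complete image file.
PASSTHROUGH_FILTERS = {
    "/DCTDecode": ".jpg",  # JPEG
    "/JPXDecode": ".jp2",  # JPEG 2000
}


def normalize_filters(filter_entry):
    """Return the /Filter entry as a list of names.

    In a PDF the entry may be absent, a single name, or an array of names.
    """
    if filter_entry is None:
        return []
    if isinstance(filter_entry, (list, tuple)):
        return [str(name) for name in filter_entry]
    return [str(filter_entry)]


def plan_image_extraction(filter_entry):
    """Decide how an image XObject could be read.

    ("raw", ext)          the undecoded stream bytes are an image file
    ("pixels", None)      try getData() followed by Image.frombytes
    ("unsupported", None) an image-file filter is combined with other
                          filters, which this sketch does not handle
    """
    filters = normalize_filters(filter_entry)
    if len(filters) == 1 and filters[0] in PASSTHROUGH_FILTERS:
        return ("raw", PASSTHROUGH_FILTERS[filters[0]])
    if any(name in PASSTHROUGH_FILTERS for name in filters):
        return ("unsupported", None)
    return ("pixels", None)


print(plan_image_extraction("/DCTDecode"))
print(plan_image_extraction("/FlateDecode"))
```

Inside the loop, the result could be used with xObject[obj].get('/Filter'). For the "raw" case, one commonly used approach with older PyPDF2 versions is to read the undecoded bytes from the private attribute xObject[obj]._data and pass them to Image.open(io.BytesIO(...)). Because _data is a private attribute, treat its availability as an assumption that depends on the installed PyPDF2 version.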