I found that there are some libraries for extracting images from PDF or Word files, such as docx2txt and pdfimages. But how can I get the content around each image (for example, a title or caption below it)? And how can I get the page number of each image?
Some other tools, like PyPDF2 and minecart, can extract images page by page. However, I could not get that code to run successfully.
Is there a good way to get this kind of information about the images, either for the images already extracted by docx2txt or pdfimages, or through another method that extracts the images together with their info?
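
To make the goal concrete for the Word case, here is a rough sketch of the kind of output I am after, using only the Python standard library. It relies on a .docx file being a zip archive whose body is in `word/document.xml` and whose image references are resolved through `word/_rels/document.xml.rels`. The function name `images_with_context` and the returned dictionary keys are made up for illustration, and treating the paragraphs directly before and after an image as its "context" is only an assumption about where a caption usually sits. It does not give page numbers, since a .docx does not store them, and it does not cover PDF at all.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside a .docx package
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def paragraph_text(paragraph):
    """Concatenate the text of all w:t runs inside one w:p element."""
    return "".join(t.text or "" for t in paragraph.iter("{%s}t" % W_NS))


def images_with_context(docx):
    """Hypothetical helper: list each embedded image with the text of the
    paragraph before and after the paragraph that contains it.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        rels_root = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
        doc_root = ET.fromstring(archive.read("word/document.xml"))

    # Map relationship ids (e.g. "rId5") to targets (e.g. "media/image1.png")
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels_root.iter("{%s}Relationship" % PKG_REL_NS)
    }

    # All paragraphs in document order (nested ones, e.g. in tables, included)
    paragraphs = list(doc_root.iter("{%s}p" % W_NS))

    results = []
    for index, paragraph in enumerate(paragraphs):
        # An embedded picture shows up as an a:blip element with an r:embed id
        for blip in paragraph.iter("{%s}blip" % A_NS):
            rel_id = blip.get("{%s}embed" % R_NS)
            before = paragraph_text(paragraphs[index - 1]) if index > 0 else ""
            after = (
                paragraph_text(paragraphs[index + 1])
                if index + 1 < len(paragraphs)
                else ""
            )
            results.append(
                {"image": targets.get(rel_id), "before": before, "after": after}
            )
    return results
```

Calling `images_with_context("report.docx")` (the file name is just a placeholder) would return one dictionary per image, where `image` is the path of the picture inside the archive, which should correspond to a file that docx2txt writes out. What I am missing is an equivalent for PDF that also reports the page number.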