0

How can I read images from an MS Office .doc file using Apache POI? I tried the following code, but it does not work.

try {
    POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("C:\\DATASTORE\\ImageDocument.doc"));
    Document document = new Document();
    OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/ImageDocumentPDF.pdf"));
    PdfWriter.getInstance(document, fileOutput);
    document.open();

    HWPFDocument hdocument=new HWPFDocument(fs);
    Range range=hdocument.getOverallRange();
    PdfPTable createTable;
    CharacterRun run;
    PicturesTable picture=hdocument.getPicturesTable();
    int picoffset=run.getPicOffset();
    for(int i=0;i<range.numParagraphs();i++) {
        run =range.getCharacterRun(i);
        if(picture.hasPicture(run)) {
            Picture pic=picture.extractPicture(run, true);
            byte[] picturearray=pic.getContent();
            com.itextpdf.text.Image image=com.itextpdf.text.Image.getInstance(picturearray);
            document.add(image);
        }
    }
}

When I execute the above code and print the picture offset, it is -1, and picture.hasPicture(run) returns false even though the input file contains an image.

Please help me find a solution. Thank you.

Bruno Lowagie
nagesh

2 Answers

2
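import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

// Returns the raw bytes of every picture in a .doc or .docx file,
// or null if the file does not exist.
public static List<byte[]> extractImagesFromWord(File file) {
    if (file.exists()) {
        // try-with-resources closes the stream even if POI throws.
        try (InputStream in = new FileInputStream(file)) {
            List<byte[]> result = new ArrayList<byte[]>();
            // getMimeType(...) is my own helper (not shown here); any check of
            // the file extension, e.g. on file.getName(), serves the same purpose.
            String extension = getMimeType(file).getExtension();
            if ("docx".equals(extension)) {
                // OOXML format: XWPF lists every embedded picture directly.
                XWPFDocument doc = new XWPFDocument(in);
                for (XWPFPictureData picture : doc.getAllPictures()) {
                    result.add(picture.getData());
                }
            } else if ("doc".equals(extension)) {
                // Binary format: HWPF exposes the pictures through the pictures table.
                HWPFDocument doc = new HWPFDocument(in);
                for (Picture picture : doc.getPicturesTable().getAllPictures()) {
                    result.add(picture.getContent());
                }
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    return null;
}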
speedee
0

It worked for me. If picOffset is -1, there is no image for the current CharacterRun.
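Note that the loop in the question is bounded by range.numParagraphs() while indexing with range.getCharacterRun(i), and it calls run.getPicOffset() before run has been assigned, so most character runs are never inspected. A minimal sketch of a loop that visits every character run follows. The class name DocImageExtractor and the method name extractPictures are made up for illustration, the code assumes the POI scratchpad (HWPF) jar is on the classpath, and I have not verified that it fixes the -1 offset for the asker's particular file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.Range;

// Hypothetical helper class, named here only for illustration.
public class DocImageExtractor {

    // Collects the bytes of every picture anchored to a character run of a .doc file.
    public static List<byte[]> extractPictures(File docFile) throws IOException {
        List<byte[]> images = new ArrayList<>();
        try (InputStream in = new FileInputStream(docFile)) {
            HWPFDocument document = new HWPFDocument(in);
            PicturesTable pictures = document.getPicturesTable();
            Range range = document.getOverallRange();

            // Bound the loop by the number of character runs, not paragraphs,
            // because the index is passed to getCharacterRun(i).
            for (int i = 0; i < range.numCharacterRuns(); i++) {
                CharacterRun run = range.getCharacterRun(i);
                if (pictures.hasPicture(run)) {
                    // 'true' asks POI to load the picture bytes immediately.
                    Picture picture = pictures.extractPicture(run, true);
                    if (picture != null) {
                        images.add(picture.getContent());
                    }
                }
            }
        }
        return images;
    }
}
```

Each returned byte array can then be passed to com.itextpdf.text.Image.getInstance(...) as in the question. If only the images matter and not their position in the text, getPicturesTable().getAllPictures() from the other answer is the simpler route.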

Robert
Nurlan