
Current problem: I have a GCS (Google Cloud Storage) setup to which I upload files such as doc, docx, and pdf. Each file's default metadata is uploaded along with it. The files are stored as Blobs. When I access a file, I get an InputStream, from which I cannot delete the metadata directly.

What do I want? I want to delete the default metadata (which may reveal personal information about the users who uploaded the files) while uploading files to or downloading them from the GCS server.

What problem am I facing? When I download a file, it comes back as a Blob, or I get it as an InputStream, from which I cannot delete the metadata directly.

What steps do I need to follow to remove the metadata from the files while downloading and uploading?

How can I read the file metadata from an InputStream and delete it?

Tools and programming languages used: Kotlin, http4k, Apache POI, PDFBox

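        import org.apache.poi.openxml4j.opc.OPCPackage

        // Opening by file path gives read/write access to the package
        val opc = OPCPackage.open("demoDox.docx")
        val pp = opc.packageProperties

        println(pp.creatorProperty)
        pp.setCreatorProperty("Shubham") // core properties can be updated like this
        println(pp.creatorProperty)
        opc.close() // saves the changes back to the file and releases it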

I can remove the docx metadata only when I know the file path. But right now I am getting an InputStream from GCS.

  • `OPCPackage` provides opening from `InputStream` too: [OPCPackage.open(java.io.InputStream in)](https://poi.apache.org/apidocs/dev/org/apache/poi/openxml4j/opc/OPCPackage.html#open-java.io.InputStream-). – Axel Richter Jun 21 '23 at 06:19
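
To illustrate the suggestion in this comment, here is a minimal sketch of cleaning a docx without a file path. It assumes Apache POI's `poi-ooxml` is on the classpath and that the whole file fits in memory. The function name `stripOoxmlCoreProperties` is hypothetical, and only two core properties are cleared as an example.

```kotlin
import org.apache.poi.openxml4j.opc.OPCPackage
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Hypothetical helper (not part of POI): reads an OOXML file (.docx, .xlsx, .pptx)
// from a stream, blanks some core properties, and returns the cleaned bytes.
fun stripOoxmlCoreProperties(input: InputStream): ByteArray {
    val out = ByteArrayOutputStream()
    // open(InputStream) loads the whole package into memory
    OPCPackage.open(input).use { pkg ->
        val props = pkg.packageProperties
        // An empty string clears the property
        props.setCreatorProperty("")
        props.setLastModifiedByProperty("")
        // Write the modified package to the in-memory buffer
        pkg.save(out)
    }
    return out.toByteArray()
}
```

The returned bytes can then be uploaded to GCS or sent back to the client in place of the original stream. Note that this only touches the core properties; extended properties such as the company name live in a separate part of the package and would need separate handling.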

1 Answer


Solved:

I was able to solve the problem using the code below:

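        val doc = HWPFDocument(response.body.stream)
        println("Current author = ${doc.summaryInformation.author}")

        // removeAuthor() returns nothing, so there is no result to assign
        doc.summaryInformation.removeAuthor()

        println("Removed author = ${doc.summaryInformation.author}")

        // The change is only in memory; call doc.write(outputStream) before closing to keep it
        doc.close()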
ahuemmer
  • If you just want to edit the metadata on OLE2 documents, you don't even need to load at the `HWPF` level. Just do it at the `POIFS` and `HPSF` level, and it'll be simpler and less likely to make other unexpected changes. – Gagravarr Jun 21 '23 at 10:43
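
A minimal sketch of what this comment describes, assuming Apache POI is on the classpath and the input is an OLE2 file (for example .doc, .xls, or .ppt). The function name `stripOle2Author` is hypothetical, and the set of removed properties is only an example.

```kotlin
import org.apache.poi.hpsf.HPSFPropertiesOnlyDocument
import org.apache.poi.poifs.filesystem.POIFSFileSystem
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Hypothetical helper (not part of POI): edits only the property streams of an
// OLE2 file and copies the remaining entries as they are.
fun stripOle2Author(input: InputStream): ByteArray {
    val out = ByteArrayOutputStream()
    POIFSFileSystem(input).use { fs ->
        val doc = HPSFPropertiesOnlyDocument(fs)
        // Either property set may be missing from the file, hence the null-safe calls
        doc.summaryInformation?.apply {
            removeAuthor()
            removeLastAuthor()
        }
        doc.documentSummaryInformation?.apply {
            removeCompany()
            removeManager()
        }
        // Writes the updated properties plus the other entries to the buffer
        doc.write(out)
    }
    return out.toByteArray()
}
```

Because the document content itself is never parsed at the `HWPF` level, this approach should work the same way for any OLE2-based format, not just Word files.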