I have to get core properties like author and keywords from Microsoft Office (Open XML) documents (docx, pptx, xlsx) through C++ code. I know there is the Open XML SDK for that, but it is designed solely for .NET. I found multiple links describing the solution in .NET, but I am looking for a solution in C++. Is there any way to read those core properties in C++ code?

Jerry Coffin
dev

1 Answer


The core properties are actually a pretty easy part to extract.

The first point to understand is that a docx, xlsx (etc.) file is really a zip file with a different extension. If you rename one to foo.zip, you can open it as a zip archive and see that it's composed of a number of constituent files. At the top level, you'll find something along these lines:

[Screenshot: top-level contents of the archive, including the docProps and xl directories]

[This particular one is an xlsx file, thus the xl directory.]
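To see the "it's really a zip file" point from C++ without any third-party library, here is a minimal sketch that walks the zip central directory and lists the entry names; for a typical docx/xlsx you would expect docProps/core.xml among them. The names `ZipEntry`, `readFile` and `listZipEntries` are hypothetical helpers, and the sketch deliberately ignores ZIP64 and multi-disk archives. It does not decompress anything: Office parts are usually stored with method 8 (deflate), so actually extracting core.xml still needs an inflater such as zlib (raw deflate, i.e. `inflateInit2` with a negative `windowBits`) or a zip library.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// One entry from the zip central directory (hypothetical helper type).
struct ZipEntry {
    std::string name;               // e.g. "docProps/core.xml"
    std::uint16_t method;           // 0 = stored, 8 = deflate
    std::uint32_t compressedSize;
    std::uint32_t uncompressedSize;
};

// Zip fields are little-endian; .at() throws std::out_of_range on truncated input.
static std::uint32_t le16(const std::vector<unsigned char>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b.at(off) | (b.at(off + 1) << 8));
}
static std::uint32_t le32(const std::vector<unsigned char>& b, std::size_t off) {
    return le16(b, off) | (le16(b, off + 2) << 16);
}

// Read a whole file into memory.
std::vector<unsigned char> readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// List entries via the central directory. Sketch only: no ZIP64, no multi-disk
// archives, and no decompression of the entry data.
std::vector<ZipEntry> listZipEntries(const std::vector<unsigned char>& zip) {
    const std::size_t eocdSize = 22;  // fixed part of the End Of Central Directory record
    if (zip.size() < eocdSize) throw std::runtime_error("too small to be a zip file");

    // The EOCD record is at the end, optionally followed by a comment of up to
    // 65535 bytes, so scan backwards for its signature.
    std::size_t eocd = std::string::npos;
    for (std::size_t i = zip.size() - eocdSize + 1; i-- > 0;) {
        if (zip.size() - i > eocdSize + 0xFFFF) break;
        if (le32(zip, i) == 0x06054b50) { eocd = i; break; }
    }
    if (eocd == std::string::npos) throw std::runtime_error("end of central directory not found");

    const std::uint32_t count = le16(zip, eocd + 10);  // total number of entries
    std::size_t pos = le32(zip, eocd + 16);            // offset of first central directory header

    std::vector<ZipEntry> entries;
    for (std::uint32_t n = 0; n < count; ++n) {
        if (le32(zip, pos) != 0x02014b50) throw std::runtime_error("bad central directory header");
        ZipEntry e{};
        e.method = static_cast<std::uint16_t>(le16(zip, pos + 10));
        e.compressedSize = le32(zip, pos + 20);
        e.uncompressedSize = le32(zip, pos + 24);
        const std::size_t nameLen = le16(zip, pos + 28);
        const std::size_t extraLen = le16(zip, pos + 30);
        const std::size_t commentLen = le16(zip, pos + 32);
        if (pos + 46 + nameLen > zip.size()) throw std::runtime_error("truncated entry name");
        e.name.assign(reinterpret_cast<const char*>(zip.data()) + pos + 46, nameLen);
        entries.push_back(e);
        pos += 46 + nameLen + extraLen + commentLen;  // 46 = fixed header size
    }
    return entries;
}
```

Usage would be along the lines of `listZipEntries(readFile("report.docx"))`, then searching the returned names (the path is just a placeholder). Reading the central directory rather than scanning local file headers is the more reliable choice, because a local header may leave its size fields zero when the data-descriptor flag is set.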

If you look in the docProps directory, you'll find two files: app.xml and core.xml. It's probably not a huge surprise that core.xml contains the core properties, something like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <dcterms:created xsi:type="dcterms:W3CDTF">2022-10-05T14:48:30Z</dcterms:created>
    <dc:creator></dc:creator>
    <dc:description></dc:description>
    <dc:language>en-US</dc:language>
    <cp:lastModifiedBy></cp:lastModifiedBy>
    <dcterms:modified xsi:type="dcterms:W3CDTF">2022-10-05T14:49:24Z</dcterms:modified>
    <cp:revision>1</cp:revision>
    <dc:subject></dc:subject>
    <dc:title></dc:title>
</cp:coreProperties>

So, basically, you'll want to use a library that knows how to read zip files. Extract <your_file>/docProps/core.xml, then parse the XML you extracted to get your document's core properties (with the proviso that some fields may be empty or absent entirely; the example above, for instance, has no cp:keywords element at all).
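Once you have the text of docProps/core.xml, the parsing step can be quite small. The sketch below, with hypothetical helper names `coreProperty` and `decodeXmlEntities`, pulls the text of one element by its qualified name and uses `std::optional` (C++17) to distinguish an empty element from a missing one. It is a string search, not an XML parser: it assumes the conventional `cp:`, `dc:` and `dcterms:` prefixes seen in the example above (XML namespaces technically allow any prefix), decodes only the five predefined entities, and ignores CDATA, so a namespace-aware XML parser is the safer choice for production code.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Replace the five predefined XML entities (numeric character references are not handled).
std::string decodeXmlEntities(const std::string& s) {
    static const std::pair<const char*, char> entities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'}, {"&quot;", '"'}, {"&apos;", '\''}};
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        bool matched = false;
        if (s[i] == '&') {
            for (const auto& [ent, ch] : entities) {
                const std::string e(ent);
                if (s.compare(i, e.size(), e) == 0) {
                    out += ch;
                    i += e.size() - 1;  // skip the rest of the entity
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out += s[i];
    }
    return out;
}

// Text of the first <qname ...>text</qname> element; "" for an empty or
// self-closing element; std::nullopt if the element is absent.
// Assumes the conventional prefixes (cp:, dc:, dcterms:) used in core.xml.
std::optional<std::string> coreProperty(const std::string& xml, const std::string& qname) {
    const std::string open = "<" + qname;
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::size_t after = pos + open.size();
        // Require a whole-name match: "<dc:title" must not match "<dc:titleX".
        if (after < xml.size() && (xml[after] == '>' || xml[after] == '/' ||
                                   std::isspace(static_cast<unsigned char>(xml[after])))) {
            const std::size_t tagEnd = xml.find('>', after);
            if (tagEnd == std::string::npos) return std::nullopt;
            if (xml[tagEnd - 1] == '/') return std::string();  // e.g. <dc:subject/>
            const std::size_t close = xml.find("</" + qname + ">", tagEnd + 1);
            if (close == std::string::npos) return std::nullopt;
            return decodeXmlEntities(xml.substr(tagEnd + 1, close - tagEnd - 1));
        }
        pos = after;
    }
    return std::nullopt;
}
```

For the properties in the question, the author is `dc:creator` and the keywords are `cp:keywords`, e.g. `coreProperty(xml, "cp:keywords")`. Run against the example file above, that call would return `std::nullopt`, since the element isn't present, while `coreProperty(xml, "dc:creator")` would return an empty string.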

Jerry Coffin
  • I know this way, but MS does not recommend manipulating Open XML files directly. They have provided the Open XML library, but it is designed solely for .NET. So I am looking for an alternative to this Open XML library in C++. If there is no such alternative, then the only option is to unzip and parse the XML, and again I need to find the respective libraries and include them in my project. – dev Oct 06 '22 at 03:00
  • @dev: Questions asking for library recommendations are off-topic on SO. – Jerry Coffin Oct 06 '22 at 08:21
  • Thanks for all the input. I think I will write a .NET module to get the core properties using the Open XML SDK, and the C++ code will invoke that module as and when required. – dev Oct 06 '22 at 09:25