I need to read core properties such as author and keywords from Microsoft Office (Open XML) documents (docx, pptx, xlsx) in C++ code. I know there is an Open XML SDK for that, but it is designed solely for .NET. I found multiple links describing solutions in .NET, but I am looking for a solution in C++. Is there any way to read those core properties in C++?
1 Answer
The core properties are actually a pretty easy part to extract.
The first point to understand is that a docx, xlsx (etc.) file is really a zip file, with the extension changed. If you rename one to foo.zip, you can open the zip file and see that it's composed of a number of constituent files. At the top level, you'll find something on this general order:
[This particular one is an xlsx file, thus the xl directory.]
If you look in the docProps directory, you'll find two files: app.xml and core.xml. It's probably not a huge surprise that core.xml contains the core properties, something like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dcterms:created xsi:type="dcterms:W3CDTF">2022-10-05T14:48:30Z</dcterms:created>
<dc:creator></dc:creator>
<dc:description></dc:description>
<dc:language>en-US</dc:language>
<cp:lastModifiedBy></cp:lastModifiedBy>
<dcterms:modified xsi:type="dcterms:W3CDTF">2022-10-05T14:49:24Z</dcterms:modified>
<cp:revision>1</cp:revision>
<dc:subject></dc:subject>
<dc:title></dc:title>
</cp:coreProperties>
So, basically, you'll want to use a library that knows how to read zip files. Extract <your_file>/docProps/core.xml and parse the XML you extracted to get your document's core properties (with the proviso that, as in the example above, some of the fields may well be empty or missing).
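For the parsing half, here is a rough sketch using only the C++17 standard library. It assumes you have already pulled docProps/core.xml out of the archive into a string with whatever zip library you choose. The function name core_property is invented for this example, and the naive string search is a stand-in for a real XML parser: it does no entity decoding or CDATA handling, and it assumes attribute values contain no '>' character. That is good enough for the flat layout shown above, but not for XML in general.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns the text of the first <tag ...>...</tag>
// element in xml, an empty string for an empty or self-closing element,
// or std::nullopt if the element is not present at all.
std::optional<std::string> core_property(std::string_view xml,
                                         std::string_view tag) {
    const std::string open = "<" + std::string(tag);
    const std::string close = "</" + std::string(tag) + ">";

    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string_view::npos) {
        const std::size_t after = pos + open.size();
        if (after >= xml.size()) return std::nullopt;

        // Accept only a whole-name match, so "<dc:title" does not match
        // a longer element name that merely starts with "dc:title".
        const char c = xml[after];
        if (c == '>' || c == '/' || c == ' ' || c == '\t' ||
            c == '\r' || c == '\n') {
            const std::size_t gt = xml.find('>', after);
            if (gt == std::string_view::npos) return std::nullopt;
            if (xml[gt - 1] == '/') return std::string{};  // <tag/>

            const std::size_t end = xml.find(close, gt + 1);
            if (end == std::string_view::npos) return std::nullopt;
            return std::string(xml.substr(gt + 1, end - gt - 1));
        }
        pos = after;
    }
    return std::nullopt;
}
```

Called on the sample above, core_property(xml, "dc:language") yields "en-US", core_property(xml, "dc:creator") yields an empty string, and a request for an element that is not there, such as "cp:keywords", yields std::nullopt. Returning std::optional keeps "present but empty" distinct from "missing", which matters given the proviso above.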

-
I know this approach, but MS does not recommend manipulating Open XML files directly. They have provided the Open XML library, but it is designed solely for .NET. So I am looking for an alternative to this Open XML library in C++. If there is no such alternative, then the only option is to unzip the file and parse the XML. Again, I would need to find the respective libraries and include them in my project. – dev Oct 06 '22 at 03:00
-
@dev: Questions asking for library recommendations are off-topic on SO. – Jerry Coffin Oct 06 '22 at 08:21
-
Thanks for all the input. I think I will write a .NET module to get the core properties using the Open XML SDK, and the C++ code will invoke that module as and when required. – dev Oct 06 '22 at 09:25