When a Word document contains an embedded Office document, Word stores the embedded document itself and also creates a media file (an image showing the document's icon and name) that is displayed in its place. Using OpenXML, I cannot work out how to relate each media file to its embedded document.
I can get the embedded documents and the media files with the following code, but nothing in the members of the part classes links the two.
```vb
Private Shared Function ExtractStream(source As Document, Stream As IO.Stream, format As DocumentDataFormat) As DocumentList
    Dim Documents As New DocumentList()
    Const embeddingPartString As String = "/word/embeddings/"
    Const mediaPartString As String = "/word/media/"
    Using WordDoc = WordprocessingDocument.Open(Stream, False)
        Dim intDocumentIndex As Int32 = 1
        ' EmbeddedPackagePart - These are the Office 2007+ type documents
        For Each pkgPart In WordDoc.MainDocumentPart.GetPartsOfType(Of EmbeddedPackagePart)
            If pkgPart.Uri.ToString.StartsWith(embeddingPartString) Then
                Dim fileName1 As String
                fileName1 = pkgPart.Uri.ToString.Remove(0, embeddingPartString.Length)
                Dim Doc As Document = ReadOffice(source, pkgPart, format, intDocumentIndex)
                If (Doc IsNot Nothing) Then
                    Documents.Add(Doc)
                    intDocumentIndex += 1
                End If
            End If
        Next
        ' ImagePart - Media files
        For Each pkgPart In WordDoc.MainDocumentPart.GetPartsOfType(Of ImagePart)
            If pkgPart.Uri.ToString.StartsWith(mediaPartString) Then
                Dim fileName1 As String
                fileName1 = pkgPart.Uri.ToString.Remove(0, mediaPartString.Length)
                Dim Doc As Document = ReadMedia(source, pkgPart, format, intDocumentIndex)
                If (Doc IsNot Nothing) Then
                    Documents.Add(Doc)
                    intDocumentIndex += 1
                End If
            End If
        Next
    End Using
    Return Documents
End Function
```
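The parts this loop enumerates are not free-floating: in the package, each one is the target of a relationship listed in the main document part's relationship file, `word/_rels/document.xml.rels`, keyed by an Id such as `rId5`. A minimal sketch of reading that table follows. It uses Python's standard library rather than the Open XML SDK so it runs on its own, it assumes Word's usual relative targets (`media/...`, `embeddings/...`), and the sample package built by the hypothetical `make_demo_package` helper, including its file names and Ids, is made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship parts (*.rels files).
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Prefix shared by the relationship Type URIs Word writes.
OFFICE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def read_document_rels(docx):
    """Map relationship Id -> (Type, Target) for word/document.xml.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
    return {
        rel.get("Id"): (rel.get("Type"), rel.get("Target"))
        for rel in root.findall(f"{{{PKG_REL_NS}}}Relationship")
    }


def split_embeddings_and_media(rels):
    """Split targets into the two folders the VB code filters on.

    Assumes targets are relative to word/, as Word writes them.
    """
    embeddings = {rid: t for rid, (_, t) in rels.items() if t.startswith("embeddings/")}
    media = {rid: t for rid, (_, t) in rels.items() if t.startswith("media/")}
    return embeddings, media


def make_demo_package():
    """Hypothetical in-memory package holding only the rels part."""
    rels_xml = (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId4" Type="{OFFICE_REL}/image" Target="media/image1.emf"/>'
        f'<Relationship Id="rId5" Type="{OFFICE_REL}/package"'
        ' Target="embeddings/Microsoft_Excel_Worksheet.xlsx"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/_rels/document.xml.rels", rels_xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    embeddings, media = split_embeddings_and_media(read_document_rels(make_demo_package()))
    print(embeddings)  # {'rId5': 'embeddings/Microsoft_Excel_Worksheet.xlsx'}
    print(media)  # {'rId4': 'media/image1.emf'}
```

The point of the sketch is that both kinds of part are reached through an Id in the same table, so whatever refers to those Ids in `document.xml` is what ties a media file to an embedded document.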
I can relate a media file to an embedded OLE object by walking the child elements of `WordDoc.MainDocumentPart.Document`. When I find an `OleObject`, I look for a sibling `Shape` element in the parent's `ChildElements` collection and take that shape's `ImageData`.
This works, except that I cannot figure out how to get from these element classes to the embedded files themselves.
```vb
' Start with the Document class.
LookForOleObjects(WordDoc.MainDocumentPart.Document)

Private Shared Sub LookForOleObjects(elem As DocumentFormat.OpenXml.OpenXmlElement)
    If elem Is Nothing Then Return
    Dim ole = TryCast(elem, DocumentFormat.OpenXml.Vml.Office.OleObject)
    If (ole IsNot Nothing) Then
        ' Found one.
        Dim img = GetImageFile(ole)
        If img IsNot Nothing Then
            ' Found the image for the OLE object.
        End If
    End If
    If (elem.ChildElements IsNot Nothing) Then
        For Each child In elem.ChildElements
            LookForOleObjects(child)
        Next
    End If
End Sub

Private Shared Function GetImageFile(ole As DocumentFormat.OpenXml.Vml.Office.OleObject) As DocumentFormat.OpenXml.Vml.ImageData
    Dim p As DocumentFormat.OpenXml.OpenXmlElement = ole.Parent
    For Each child In p.ChildElements
        Dim shape As DocumentFormat.OpenXml.Vml.Shape = TryCast(child, DocumentFormat.OpenXml.Vml.Shape)
        If shape IsNot Nothing Then
            ' GetFirstChild finds the ImageData even when it is not the shape's first child.
            Dim Img As DocumentFormat.OpenXml.Vml.ImageData = shape.GetFirstChild(Of DocumentFormat.OpenXml.Vml.ImageData)()
            If Img IsNot Nothing Then Return Img
        End If
    Next
    Return Nothing
End Function
```
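The elements this walk visits carry relationship Ids too. In Word's markup a `w:object` typically holds a `v:shape`, whose `v:imagedata` has an `r:id` pointing at the media part, and an `o:OLEObject`, whose own `r:id` points at the embedded part; the OLE object's `ShapeID` attribute names the `v:shape` it belongs to. The sketch below, again in standard-library Python and assuming that layout, pairs each OLE object with its image by resolving both `r:id` values through `document.xml.rels`. Matching on `ShapeID` with a fallback to the first sibling shape is an assumption of this sketch, not something the code above does, and `make_demo_package` is a hypothetical helper that builds a stripped-down sample.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
V_NS = "urn:schemas-microsoft-com:vml"
O_NS = "urn:schemas-microsoft-com:office:office"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def pair_ole_with_images(docx):
    """Return (embedded_target, media_target) for each o:OLEObject in document.xml."""
    with zipfile.ZipFile(docx) as zf:
        doc = ET.fromstring(zf.read("word/document.xml"))
        rels_root = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
    # Relationship Id -> Target (relative to word/).
    rels = {r.get("Id"): r.get("Target") for r in rels_root.findall(f"{{{PKG_NS}}}Relationship")}

    pairs = []
    # ElementTree keeps no parent pointers, so look for OLE objects among each element's children.
    for parent in doc.iter():
        shapes = parent.findall(f"{{{V_NS}}}shape")
        for ole in parent.findall(f"{{{O_NS}}}OLEObject"):
            # Prefer the sibling v:shape whose id equals ShapeID; fall back to the first one.
            shape = next(
                (s for s in shapes if s.get("id") == ole.get("ShapeID")),
                shapes[0] if shapes else None,
            )
            image = shape.find(f"{{{V_NS}}}imagedata") if shape is not None else None
            ole_target = rels.get(ole.get(f"{{{R_NS}}}id"))
            image_target = rels.get(image.get(f"{{{R_NS}}}id")) if image is not None else None
            pairs.append((ole_target, image_target))
    return pairs


def make_demo_package():
    """Hypothetical, stripped-down package with one embedded workbook."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}" xmlns:v="{V_NS}" xmlns:o="{O_NS}" xmlns:r="{R_NS}">'
        "<w:body><w:p><w:r><w:object>"
        '<v:shape id="_x0000_i1025"><v:imagedata r:id="rId4"/></v:shape>'
        '<o:OLEObject Type="Embed" ProgID="Excel.Sheet.12" ShapeID="_x0000_i1025" r:id="rId5"/>'
        "</w:object></w:r></w:p></w:body></w:document>"
    )
    rels_xml = (
        f'<Relationships xmlns="{PKG_NS}">'
        f'<Relationship Id="rId4" Type="{R_NS}/image" Target="media/image1.emf"/>'
        f'<Relationship Id="rId5" Type="{R_NS}/package"'
        ' Target="embeddings/Microsoft_Excel_Worksheet.xlsx"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document_xml)
        zf.writestr("word/_rels/document.xml.rels", rels_xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for embedded, image in pair_ole_with_images(make_demo_package()):
        print(embedded, "->", image)
    # embeddings/Microsoft_Excel_Worksheet.xlsx -> media/image1.emf
```

In the Open XML SDK these two `r:id` attributes appear to be exposed as `OleObject.Id` and `ImageData.RelationshipId`, and `MainDocumentPart.GetPartById` takes such an Id and returns the related part; that mapping is worth confirming against the SDK reference before relying on it.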
I have two approaches, but each falls short: the first gets the embedded documents but not the relationships, and the second gets the relationships but not the embedded documents. What am I missing? How can I get an embedded document together with its related media file using OpenXML?