When a Word document contains an embedded Office document, it stores two things: the embedded document itself and a media file that displays the document's name and logo. I cannot work out how to relate the media file to the embedded document using OpenXML.

I can get the embedded documents and media files using the following code, but I cannot see any relationship between the two from the class members.

Private Shared Function ExtractStream(source As Document, Stream As IO.Stream, format As DocumentDataFormat) As DocumentList
    Dim Documents As New DocumentList()
    Const embeddingPartString As String = "/word/embeddings/"
    Const mediaPartString As String = "/word/media/"
    Using WordDoc = WordprocessingDocument.Open(Stream, False)

        Dim intDocumentIndex As Int32 = 1
        ' EmbeddedPackagePart - These are the Office 2007+ type documents
        For Each pkgPart In WordDoc.MainDocumentPart.GetPartsOfType(Of EmbeddedPackagePart)
            If pkgPart.Uri.ToString.StartsWith(embeddingPartString) Then
                Dim fileName1 As String
                fileName1 = pkgPart.Uri.ToString.Remove(0, embeddingPartString.Length)
                Dim Doc As Document = ReadOffice(source, pkgPart, format, intDocumentIndex)
                If (Doc IsNot Nothing) Then
                    Documents.Add(Doc)
                    intDocumentIndex += 1

                End If
            End If
        Next

        For Each pkgPart In WordDoc.MainDocumentPart.GetPartsOfType(Of ImagePart)
            ' Media files
            If pkgPart.Uri.ToString.StartsWith(mediaPartString) Then
                Dim fileName1 As String
                fileName1 = pkgPart.Uri.ToString.Remove(0, mediaPartString.Length)
                Dim Doc As Document = ReadMedia(source, pkgPart, format, intDocumentIndex)
                If (Doc IsNot Nothing) Then
                    Documents.Add(Doc)
                    intDocumentIndex += 1
                End If

            End If
        Next
    End Using
    Return Documents

End Function

I can get the relationship between a media file and an embedded OLE object if I walk the child elements of WordDoc.MainDocumentPart.Document. When I find an OleObject, I look for a sibling Shape element in the parent’s ChildElements collection.

This is fine, except I cannot figure out how to get the embedded files from these classes.

' Start with the Document class.
LookForOleObjects(WordDoc.MainDocumentPart.Document)


Private Shared Sub LookForOleObjects(elem As DocumentFormat.OpenXml.OpenXmlElement)
    If elem Is Nothing Then Return

    Dim ole = TryCast(elem, DocumentFormat.OpenXml.Vml.Office.OleObject)
    If (ole IsNot Nothing) Then
        ' found one.
        Dim img = GetImageFile(ole)
        If img IsNot Nothing Then
            ' found the image for the ole object
        End If
    End If
    If (elem.ChildElements IsNot Nothing) Then
        For Each child In elem.ChildElements
            LookForOleObjects(child)
        Next
    End If

End Sub

Private Shared Function GetImageFile(ole As DocumentFormat.OpenXml.Vml.Office.OleObject) As DocumentFormat.OpenXml.Vml.ImageData
    Dim p As DocumentFormat.OpenXml.OpenXmlElement = ole.Parent
    If p Is Nothing Then Return Nothing
    ' Look for a sibling Shape that contains an ImageData element.
    For Each child In p.ChildElements
        Dim shape As DocumentFormat.OpenXml.Vml.Shape = TryCast(child, DocumentFormat.OpenXml.Vml.Shape)
        If shape IsNot Nothing Then
            ' GetFirstChild avoids assuming ImageData is the first child of the shape.
            Dim Img As DocumentFormat.OpenXml.Vml.ImageData = shape.GetFirstChild(Of DocumentFormat.OpenXml.Vml.ImageData)()
            If Img IsNot Nothing Then Return Img
        End If
    Next
    Return Nothing
End Function
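
One possible bridge between the two approaches, sketched here without testing: the OleObject element and the ImageData element each carry a relationship ID (exposed by the SDK as OleObject.Id and ImageData.RelationshipId), and MainDocumentPart.GetPartById may resolve those IDs to the same parts that the first approach enumerates. The module and method names below are hypothetical, and the sketch assumes the preview image sits under the same parent element as the OLE object and that both live in the main document part rather than in a header or footer.

```vbnet
Imports System
Imports System.Linq
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Vml
Imports DocumentFormat.OpenXml.Vml.Office

Module OleRelationshipSketch
    ' Hypothetical helper (not part of the code above): for each OLE object in the
    ' body, resolve the embedded part and its preview image by relationship ID.
    Public Sub ListOlePairs(mainPart As MainDocumentPart)
        For Each ole As OleObject In mainPart.Document.Descendants(Of OleObject)()
            ' Assumption: the r:id on o:OLEObject points at the embedded part.
            ' GetPartById throws if the ID does not resolve to a part.
            Dim embedded As OpenXmlPart = Nothing
            If ole.Id IsNot Nothing AndAlso ole.Id.HasValue Then
                embedded = mainPart.GetPartById(ole.Id.Value)
            End If

            ' Assumption: the preview image is a v:imagedata under the same parent.
            Dim image As OpenXmlPart = Nothing
            Dim imgData = ole.Parent.Descendants(Of ImageData)().FirstOrDefault()
            If imgData IsNot Nothing AndAlso imgData.RelationshipId IsNot Nothing Then
                image = mainPart.GetPartById(imgData.RelationshipId.Value)
            End If

            ' Print the two part URIs side by side so the pairing is visible.
            Console.WriteLine("{0} <-> {1}",
                              If(embedded Is Nothing, "(none)", embedded.Uri.ToString()),
                              If(image Is Nothing, "(none)", image.Uri.ToString()))
        Next
    End Sub
End Module
```

If this holds, the URIs printed on each line should match the pkgPart.Uri values seen in ExtractStream, which would allow the embedded document and its media file to be paired.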

I have two approaches, but each one is lacking. The first gets the embedded documents but not the relationships, and the second gets the relationships but not the embedded documents. What am I missing? How can I get an embedded document and its related media file using OpenXML?

Gridly