Updated: I am retrieving data from a large number of Excel workbooks using C#. Some important PDF documents are embedded in the workbooks, and I need to save each of them as an individual file for further processing.

I am able to loop through every OLE object in every worksheet and find all of the PDFs.

I use the `progID` property to identify the PDFs (see DocumentFormat.OpenXml.Spreadsheet: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.oleobjects?view=openxml-2.8.1)

    foreach (Worksheet ws in xlWb.Worksheets)
    {
        foreach (OLEObject ole in ws.OLEObjects())
        {
            // identify whether the oleObject is of AcroExch class type
            if (ole.progID == "AcroExch.Document.DC")
            {
                // 2. Cast oleObject to AcroExch and save it as a pdf separately
            }
        }
    }

From what I have gathered online, using the Acrobat DC SDK seems to be the only option. Is there any other way to achieve what I want?

Thanks

  • I believe the `OLEObject.Creator` property can be used to identify Acrobat PDF files. I don't know what the 32-bit value is for Acrobat specifically, however. – Dai Aug 12 '19 at 00:52
  • https://stackoverflow.com/questions/52778729/download-embedded-pdf-file-in-excel there is some code in this thread, perhaps that will be helpful – Allen King Aug 12 '19 at 01:32
  • @Dai `OLEObject.Creator` does not work. I imported DocumentFormat.OpenXml.Spreadsheet and used `if (ole.progID == "AcroExch.Document.DC")` to identify the PDFs. – mulder89520 Aug 14 '19 at 06:56
  • @AllenKing thanks. I ended up using the code from this thread https://stackoverflow.com/questions/22358982/how-to-download-embedded-pdf-files-in-an-excel-worksheet – mulder89520 Aug 14 '19 at 06:57

1 Answer

To extract an embedded PDF and save it as a separate PDF file, please refer to this solution provided by the GemBox Dev Team:
How to download embedded PDF files in an excel worksheet?
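
For .xlsx workbooks, one possible route that needs neither the Acrobat SDK nor Excel automation is to treat the workbook as a ZIP package. The sketch below is not the code from the linked thread; it is a heuristic built on assumptions. It assumes that Excel stored each embedded object as a part under `xl/embeddings/` (the usual layout, though the package relationships are what actually define it) and that the PDF bytes sit contiguously inside that part, so that everything from the first `%PDF-` marker to the last `%%EOF` marker is the document. Neither assumption is guaranteed: an OLE compound file can fragment a stream across sectors, and legacy .xls files are not ZIP packages at all. The class and method names are hypothetical. On .NET Framework, `ZipFile` also needs a reference to `System.IO.Compression.FileSystem`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: heuristic extraction of embedded PDFs from an .xlsx file.
static class EmbeddedPdfExtractor
{
    static readonly byte[] PdfHeader = Encoding.ASCII.GetBytes("%PDF-");
    static readonly byte[] PdfTrailer = Encoding.ASCII.GetBytes("%%EOF");

    // Writes every PDF found under xl/embeddings/ to outputDir and
    // returns the paths of the files that were written.
    public static List<string> ExtractPdfs(string xlsxPath, string outputDir)
    {
        var written = new List<string>();
        Directory.CreateDirectory(outputDir);

        using (ZipArchive archive = ZipFile.OpenRead(xlsxPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Assumption: embedded objects live under xl/embeddings/.
                if (!entry.FullName.StartsWith("xl/embeddings/", StringComparison.OrdinalIgnoreCase))
                    continue;

                byte[] data;
                using (Stream source = entry.Open())
                using (var buffer = new MemoryStream())
                {
                    source.CopyTo(buffer);
                    data = buffer.ToArray();
                }

                // Assumption: the PDF is stored contiguously in the part.
                int start = IndexOf(data, PdfHeader);
                int end = LastIndexOf(data, PdfTrailer);
                if (start < 0 || end < start)
                    continue; // no PDF markers found in this part

                string name = Path.GetFileNameWithoutExtension(xlsxPath) + "_"
                            + Path.GetFileNameWithoutExtension(entry.Name) + ".pdf";
                string target = Path.Combine(outputDir, name);

                using (FileStream output = File.Create(target))
                    output.Write(data, start, end + PdfTrailer.Length - start);

                written.Add(target);
            }
        }
        return written;
    }

    // First offset at which needle occurs in haystack, or -1.
    static int IndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
            if (Matches(haystack, needle, i)) return i;
        return -1;
    }

    // Last offset at which needle occurs in haystack, or -1.
    static int LastIndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = haystack.Length - needle.Length; i >= 0; i--)
            if (Matches(haystack, needle, i)) return i;
        return -1;
    }

    static bool Matches(byte[] haystack, byte[] needle, int offset)
    {
        for (int j = 0; j < needle.Length; j++)
            if (haystack[offset + j] != needle[j]) return false;
        return true;
    }
}
```

A call such as `EmbeddedPdfExtractor.ExtractPdfs(@"C:\data\book.xlsx", @"C:\data\pdfs")` (both paths are placeholders) would write one file per embedded part that contains PDF markers. The last `%%EOF` is used rather than the first because an incrementally updated PDF contains several. If an extracted file does not open, the stream was probably fragmented, and a proper compound-file reader is the safer choice.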

Mario Z