6

END GOAL: Efficiently (in one pass) read all CellRecords on a huge (30,000+ row), protected Worksheet.

Problem: Using the HSSF.EventUserModel, how can I read all Records (including CellRecords) for an XLS file with both Workbook and Worksheet protection?

Create Input Spreadsheet (in Excel 2010):

  1. Create new Blank workbook.
  2. Set value of A1 to number: 50
  3. Set value of A2 to string: fifty
  4. Set value of A3 to formula: =25*2
  5. Review (ribbon) -> Protect Sheet -> Password: pass1
  6. Review (ribbon) -> Protect Workbook -> Password: pass1
  7. File (ribbon) ->Save As... -> Save as type: Excel 97-2003 Workbook

Progress thus far:

  • The XLS file opens without a password in Excel. Therefore, you shouldn't need the password to open it in POI.
  • The XLS file opens successfully with new HSSFWorkbook(Stream fs). However, I need the efficiency of EventUserModel for my actual spreadsheet.
  • Setting NPOI.HSSF.Record.Crypto.Biff8EncryptionKey.CurrentUserPassword = "pass1"; did not work.
  • The ProcessRecord( ) function catches a PasswordRecord, but I can't find any documentation on how to properly handle it.
  • Perhaps, the EncryptionInfo or Decryptor classes may be of some use.

Note:
I'm using NPOI. However, I can translate any java examples to C#.

Code:
I use the following code to capture Record events. My Book1-unprotected.xls (without protection) shows all Record events (including cell values). My Book1-protected.xls displays some records and throws an exception.

I just view processedEvents in the debugger.

using System;
using System.Collections.Generic;
using System.IO;

using NPOI.HSSF.Record;
using NPOI.HSSF.Model;
using NPOI.HSSF.UserModel;
using NPOI.HSSF.EventUserModel;
using NPOI.POIFS;
using NPOI.POIFS.FileSystem;

namespace NPOI_small {
    class myListener : IHSSFListener {
        List<Record> processedRecords;

        private Stream fs;

        public myListener(Stream fs) {
            processedRecords = new List<Record>();
            this.fs = fs;

            HSSFEventFactory factory = new HSSFEventFactory();
            HSSFRequest request = new HSSFRequest();

            MissingRecordAwareHSSFListener mraListener;
            FormatTrackingHSSFListener fmtListener;
            EventWorkbookBuilder.SheetRecordCollectingListener recListener;
            mraListener = new MissingRecordAwareHSSFListener(this);
            fmtListener = new FormatTrackingHSSFListener(mraListener);
            recListener = new EventWorkbookBuilder.SheetRecordCollectingListener(fmtListener);
            request.AddListenerForAllRecords(recListener);

            POIFSFileSystem poifs = new POIFSFileSystem(this.fs);

            factory.ProcessWorkbookEvents(request, poifs);
        }

        public void ProcessRecord(Record record) {
            processedRecords.Add(record);
        }
    }
    class Program {
        static void Main(string[] args) {
            Stream fs = File.OpenRead(@"c:\users\me\desktop\xx\Book1-protected.xls");

            myListener testListener = new myListener(fs); // Use EventModel 
            //HSSFWorkbook book = new HSSFWorkbook(fs); // Use UserModel

            Console.Read();
        }
    }
}

UPDATE (for Juan Mellado): Below is the exception. My best guess right now (in the answer by Victor Petrykin) is that the HSSFEventFactory uses RecordInputStream which cannot natively decrypt protected records. Upon receiving the exception, processedRecords contains 22 records including the following potentially significant ones:

  • processedRecords[5] is a WriteAccessRecord with a garbled (probably encrypted) value for .name
  • processedRecords[22] is a RefreshAllRecord and is the last Record in the list

Exception:

NPOI.Util.RecordFormatException was unhandled
  HResult=-2146233088
  Message=Unable to construct record instance
  Source=NPOI
  StackTrace:
       at NPOI.HSSF.Record.RecordFactory.ReflectionConstructorRecordCreator.Create(RecordInputStream in1)
       at NPOI.HSSF.Record.RecordFactory.CreateSingleRecord(RecordInputStream in1)
       at NPOI.HSSF.Record.RecordFactory.CreateRecord(RecordInputStream in1)
       at NPOI.HSSF.EventUserModel.HSSFRecordStream.GetNextRecord()
       at NPOI.HSSF.EventUserModel.HSSFRecordStream.NextRecord()
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.GenericProcessEvents(HSSFRequest req, RecordInputStream in1)
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessEvents(HSSFRequest req, Stream in1)
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessWorkbookEvents(HSSFRequest req, POIFSFileSystem fs)
       at NPOI_small.myListener..ctor(Stream fs) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 35
       at NPOI_small.Program.Main(String[] args) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 80
       at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException: NPOI.Util.RecordFormatException
       HResult=-2146233088
       Message=Expected to find a ContinueRecord in order to read remaining 137 of 144 chars
       Source=NPOI
       StackTrace:
            at NPOI.HSSF.Record.RecordInputStream.ReadStringCommon(Int32 requestedLength, Boolean pIsCompressedEncoding)
            at NPOI.HSSF.Record.RecordInputStream.ReadUnicodeLEString(Int32 requestedLength)
            at NPOI.HSSF.Record.FontRecord..ctor(RecordInputStream in1)
Steven
  • 13,501
  • 27
  • 102
  • 146
  • I have tested a Java equivalent version of your code, and it works as expected. I mean, it's not necessary to specify any password to read protected cells. The `processRecord` callback method receive all the `CellRecord` objects, such as `NumberRecord` or `FormulaRecord`, so I could detect them and access their attributes (Ex: `((NumberRecord) record).getValue()`). A `ProtectRecord` object just has an flag attribute to say if the current record is protected. Can you add some more information about what means "how to properly handle it"? And, of course, the stack trace of the exception you got. – Juan Mellado Apr 09 '13 at 10:42
  • @JuanMellado: See main post update for details. – Steven Apr 09 '13 at 15:57

1 Answers1

4

I think it's the bug in the NPOI library code. As far as I understood they use incorrect stream type for HSSFEventFactory: it uses RecordInputStream instead of RecordFactoryInputStream with decryption function like in the original POI library or in the UserModel (that's why HSSFWorkbook is working)

This code is working too but it's not a event logic:

POIFSFileSystem poifs = new POIFSFileSystem(fs);
Entry document = poifs.Root.GetEntry("Workbook");
DocumentInputStream docStream = new DocumentInputStream((DocumentEntry)document);
//RecordFactory factory = new RecordFactory();
//List<Record> records = RecordFactory.CreateRecords(docStream);
RecordFactoryInputStream recFacStream = new RecordFactoryInputStream(docStream, true);
Record currRecord;
while ((currRecord = recFacStream.NextRecord()) != null) 
   ProcessRecord(currRecord);
Steven
  • 13,501
  • 27
  • 102
  • 146
Victor Petrykin
  • 719
  • 3
  • 7
  • This is promising! I now see all events for the `Workbook`. How do I get records within a `Worksheet`, specifically `NumberRecord` (50), `LabelSSTRecord` ("fifty"), and `StringRecord` (=25*2)? – Steven Apr 09 '13 at 14:10
  • Finally!! A `RecordFactory.CreateRecords(Stream in1)` ignores each encrypted `Record` rather than throwing an error. As you stated, a `RecordFactoryInputStream` does include a built-in decryption function. See updated code in your answer. Thank you! – Steven Apr 10 '13 at 00:08