Using the BizTalk Flat File Disassembler to split an incoming file into chunks of more than 1 record?

Question

I have an incoming flat file that I wish to receive and break into discrete chunks for more efficient processing. There is a nice sample post for BT2010 on getting the flat file disassembler to help with this here:

http://msdn.microsoft.com/en-us/library/aa560774(v=bts.70).aspx

Near the bottom of that post, you will see that they set the max occurs of the body record to 1, which neatly splits the file into one message per record. I would like to split my file into chunks of 1000 records instead. When I set the max occurs to 1000, the pipeline reads fine until the last chunk, which does not contain a full 1000 records, and then we get an unexpected end of stream error.

Is there a way to get the stock FF disassembler to play nice here, or do we need to write a custom disassembler? Or is there some other good way to get the chunking behavior we desire?

Thanks.

score 2 · Answer 1 · answered Feb 14 '12 at 13:50

The max occurs is used to debatch messages from the incoming message, not to determine how many records should be in the output message. So you will have to create a custom flat file disassembler component which reads the incoming file in a batched fashion: read some data from the stream (e.g. based on the number of lines) and pass it on.
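To illustrate the "read some lines and pass them on" idea on its own, here is a minimal sketch in plain C#. It is not BizTalk code: `RecordChunker` and `ReadChunks` are hypothetical names, and it assumes one record per line with no header or trailer records. The point is that the final chunk is simply allowed to be shorter than the chunk size.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the BizTalk API.
public static class RecordChunker
{
    // Lazily yields lists of up to chunkSize lines read from the reader.
    // Assumes one record per line; the last list may hold fewer lines.
    public static IEnumerable<List<string>> ReadChunks(TextReader reader, int chunkSize)
    {
        if (reader == null) throw new ArgumentNullException("reader");
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        List<string> chunk = new List<string>(chunkSize);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            chunk.Add(line);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<string>(chunkSize);
            }
        }

        // Emit the remainder instead of failing on a short final chunk.
        if (chunk.Count > 0)
        {
            yield return chunk;
        }
    }
}
```

In a custom component, each yielded chunk would be the data you hand to the parser for one output message, so only one chunk needs to be held in memory at a time.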

There seems to be a problem with how the GetNext method reads the data in larger files, which could result in excessive memory usage (I had a scenario where this happened with a 10 MB file containing about 800 000 line items). So all you need to do is re-implement the GetNext method to cater for your scenario of outputting a certain number of records per message, while also processing larger messages more efficiently.

Here are the important parts of the decompiled code of the original GetNext method:

private IBaseMessage GetNext2(IPipelineContext pc)
{
  ...
  baseMessage = this.CreateOutputMessage(pc);
  ...
  baseMessage = this.CreateOutputMessage(pc);
  ...
  return baseMessage;
}

The "CreateOutputMessage" method ends up calling the "CreateNonrecoverableOutputMessage" method which is where the problem seems to lie when processing larger messages:

internal IBaseMessage CreateNonrecoverableOutputMessage(IPipelineContext pc)
{
  ...
  XmlReader reader1 = this.m_docspec.Parse(this.m_inputData);
  ...
  return message;
}

The "m_inputData" variable is created by calling the "FFDasmComp.DataReaderFunction" delegate that is passed into the constructor of the flat file disassembler component. You might be able to control how the data is read by passing your own data reader method into the constructor of your custom implementation of the flat file disassembler component.

There are a couple of articles out there, but the implementations they give have some serious caveats when dealing with larger messages:

Debatching Large Messages and Extending Flatfile Pipeline Disassembler Component in Biztalk 2006

Processing 10 MB Flat File in BizTalk
