Single pass EDI parsing without an XML schema - Possible?

Question

Before you sigh and hold your head in your hands, please understand that I'm working with a pretty old system on a rather tight timeline.

We have a single pass EDI parser written in a business language. Currently, the data definitions including the loop level, area, and name of each segment are stored in a database table. This table also assigns each segment within an area an incremental sequence number. E.g., the 004010 810 Header area:

Segment              Sequence
BIG                  5
NTE                  10
CUR                  15
REF                  20
YNQ                  25
PER                  30
N1 (start of loop)   35
N2                   40

etc. etc.

So, if you read the segments in the order in which they appear in the standard, each one can be assigned a sequence number, a "depth" (how many loops "down" it appears), and a name (2-3 characters).
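The definition table described above can be modelled as a flat list of rows, from which the "search bounds" used by the parser are derived. This is a minimal Python sketch for illustration only: the `SegmentDef` type and the helper names are hypothetical, and the depth values are inferred from "N1 (start of loop)" (everything before N1 is assumed to sit at depth 0). Because the excerpt stops at N2, the computed N1 loop bounds cover only the rows shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SegmentDef:
    area: int                 # 1 = header area in this excerpt
    sequence: int             # incremental number within the area
    name: str                 # 2-3 character segment ID
    depth: int                # how many loops "down" the segment sits
    loop_start: bool = False  # True if the segment opens a loop

# The 004010 810 header excerpt from the table above (truncated at N2).
HEADER_810 = [
    SegmentDef(1, 5, "BIG", 0),
    SegmentDef(1, 10, "NTE", 0),
    SegmentDef(1, 15, "CUR", 0),
    SegmentDef(1, 20, "REF", 0),
    SegmentDef(1, 25, "YNQ", 0),
    SegmentDef(1, 30, "PER", 0),
    SegmentDef(1, 35, "N1", 1, loop_start=True),
    SegmentDef(1, 40, "N2", 1),
]

def area_bounds(table, area):
    """MIN/MAX sequence of an area: the first 'search bounds' record."""
    seqs = [d.sequence for d in table if d.area == area]
    return min(seqs), max(seqs)

def loop_bounds(table, area, start_seq):
    """MIN/MAX sequence of the loop opened at start_seq in the given area."""
    rows = sorted((d for d in table if d.area == area), key=lambda d: d.sequence)
    i = next(k for k, d in enumerate(rows) if d.sequence == start_seq)
    head, hi = rows[i], rows[i].sequence
    for d in rows[i + 1:]:
        # a shallower segment or a sibling loop header ends the loop
        if d.depth < head.depth or (d.depth == head.depth and d.loop_start):
            break
        hi = d.sequence
    return head.sequence, hi
```

For the excerpt, `area_bounds(HEADER_810, 1)` gives `(5, 40)` and `loop_bounds(HEADER_810, 1, 35)` gives `(35, 40)`.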

The algorithm followed by the parser at present is as follows:

Reset currentArea to 1
For each segment in the document
{
   Search for the segment's name in the table restricting the area >= currentArea.
   If not found, we have an error.
   else
   {
      If the area changed
      {
         empty the temporary "search bounds" table.  Create a single record with upper bound equal to MAX(sequence in current area) and lower bound equal to MIN(sequence in current area).
      }
      If the area did not change
      {
           Search for the next segment with a matching name, but only within the bounds of the last "bounds" record created.
          If the segment is found and the loop level changed as a result
          {
             Create a new bounds record with lower bound = MIN(sequence in current loop) and upper bound = MAX(sequence in current loop).
          }
          If the segment is not found within the searched bounds
          {
             "Pop" a bounds record out of the table to widen the search, repeat recursively until a segment having the same name is found.
          }
      }
   }
}

Unfortunately, I'm not sure that I have the time or the means to implement an XML-based solution using an actual document schema. I am currently researching several such parsers, and they seem to be able to magically arrange EDI according to the schema, no matter how it looks.

The problem I'm facing is this:

In the 945 document, the Detail area looks like this (excerpt):

<DETAIL>
   <LX> 
   <MAN>
   <PAL>
   <N9>
      <W12 (loop header)>
      <G69>
      ...
      <miscellaneous other segments>
      ...
      <LS>
         <LX (loop header)>
         ...
         <miscellaneous other segments in LX loop>
         ...
      <LE>
      ...
</DETAIL>

In my raw data, we have:

LX*1~ MAN*GM*0000803225000421444452~ N9*2I*12150-1~ W12*CC*2*2*0*EA*101199007289*VN*10007~ N9*LI*1~ LX*5~ MAN*GM*0000803225000421444453~ N9*2I*12150-2~ ... (other segments)

Based on the algorithm above, when the second LX segment is hit, there is currently a "loop bounding" record from the first segment in the W12 loop (W12) to the last possible segment within the W12 loop (FA.FA2). Thus, when the search is performed on the document's standard table, the next LX found in the definition is the LX that opens its own loop within the W12. This is wrong: the detail area is actually resetting here, and the LX is the first segment in the area, not the start of the W12.LX loop. Because the parser is naive, it cannot distinguish the two cases, since it performs a bottom-up search on the standards table based on loops.
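To make the failure concrete, here is a minimal Python sketch of the bottom-up bounds search as the pseudocode above reads, run against a simplified version of the 945 excerpt. The `Seg` type, the function names and the sequence numbers are hypothetical, FA1/FA2 and other segments are left out, only a single area is modelled, and a bounds record is pushed whenever a loop header is matched (this sketch's interpretation of "the loop level changed").

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Seg:
    seq: int                  # sequence number within the area (hypothetical)
    name: str                 # segment ID
    depth: int                # loop level
    loop_start: bool = False  # segment opens a loop

# Simplified 945 detail area following the excerpt in the question.
DETAIL_945 = [
    Seg(10, "LX", 0), Seg(20, "MAN", 0), Seg(30, "PAL", 0), Seg(40, "N9", 0),
    Seg(50, "W12", 1, True), Seg(60, "G69", 1), Seg(70, "N9", 1),
    Seg(80, "LS", 1), Seg(90, "LX", 2, True), Seg(100, "LE", 1),
]

def loop_bounds(table, i):
    """(MIN, MAX) sequence of the loop opened by table[i]."""
    head, hi = table[i], table[i].seq
    for d in table[i + 1:]:
        if d.depth < head.depth or (d.depth == head.depth and d.loop_start):
            break
        hi = d.seq
    return head.seq, hi

def parse_bottom_up(table, segment_ids):
    """Search the innermost bounds first; pop bounds records to widen."""
    bounds = [(table[0].seq, table[-1].seq)]   # area-level bounds record
    result = []
    for sid in segment_ids:
        while True:
            lo, hi = bounds[-1]
            # first definition with this name inside the current bounds
            hit = next((i for i, d in enumerate(table)
                        if lo <= d.seq <= hi and d.name == sid), None)
            if hit is not None:
                break
            if len(bounds) == 1:
                raise ValueError(f"segment {sid} not found")
            bounds.pop()                          # widen the search
        if table[hit].loop_start:                 # entered (or repeated) a loop
            new = loop_bounds(table, hit)
            if new != bounds[-1]:
                bounds.append(new)
        result.append(table[hit].seq)
    return result

raw_ids = ["LX", "MAN", "N9", "W12", "N9", "LX", "MAN", "N9"]
print(parse_bottom_up(DETAIL_945, raw_ids))
```

With the sample data this returns `[10, 20, 40, 50, 70, 90, 20, 40]`: the second LX resolves to sequence 90 (the W12.LX header) instead of 10, and the parser only recovers at the following MAN by popping back to the area bounds.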

Changing the parser to look at the start of the area (top down) rather than the current model creates the opposite problem. If the trading partner actually intended to open the inner W12.LX loop, the parser would interpret it as the start of a new detail area.

Is solving this case possible with a single pass parser that's using the standards as defined in the table I've described? Is finding some way to hack an XML solution into our rather old system the only approach here? Since EDI does not have "end tags", the only way I can be sure that a loop is actually over is by "looking ahead" in the document for scenarios that would be impossible, like a MAN segment appearing after the inner W12.LX (since the detail area MUST reset for the MAN segment to be used again).
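The lookahead idea can be expressed inside a single pass by peeking one segment ahead whenever a segment name is ambiguous. This is a minimal Python sketch under loud assumptions: the two LX positions and their follow sets (`FOLLOW`, listing which segment IDs may come directly after each LX) are hypothetical and hand-written for a simplified 945 detail area, whereas a real parser would have to derive them from the definition table.

```python
# Hypothetical follow sets for a simplified 945 detail area: which segment IDs
# may come directly after an LX at each of its two possible positions.
# The inner LX is assumed to sit inside LS/LE, so only another inner LX or
# the closing LE can follow it.
FOLLOW = {
    "DETAIL.LX": {"MAN", "PAL", "N9", "W12"},
    "W12.LX": {"LX", "LE"},
}

def resolve_lx(candidates, next_id):
    """Return the first candidate position whose follow set admits next_id."""
    for cand in candidates:
        if next_id in FOLLOW[cand]:
            return cand
    raise ValueError(f"no LX position can be followed by {next_id!r}")

def label_lx(segment_ids):
    """One pass over the segment IDs, peeking one segment ahead at each LX."""
    labels = []
    for i, sid in enumerate(segment_ids):
        if sid != "LX":
            labels.append(sid)
            continue
        nxt = segment_ids[i + 1] if i + 1 < len(segment_ids) else None
        # try the inner loop first, like the current bottom-up search does
        labels.append(resolve_lx(["W12.LX", "DETAIL.LX"], nxt))
    return labels

print(label_lx(["LX", "MAN", "N9", "W12", "N9", "LX", "MAN", "N9"]))
```

Both LX segments in the sample are labelled `DETAIL.LX` because a MAN follows them, while `["W12", "LS", "LX", "LE"]` labels the LX as `W12.LX`. One segment of lookahead only suffices when the candidates' follow sets do not overlap; otherwise more lookahead, or the mandatory LE rule discussed in the answer, is needed.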

I'm at the end of my rope, and any ideas would be welcome.

Lots of questions here. EDI has a few "end tags" - the segment terminator, the SE, GE, and IEA segments all act as ends as well. Seems like your focus is on "single pass" which I'm assuming takes the delimiters into account. What is your source? Why are we going from the 810 to the 945? Would an EDI to EDI "map" solve your problem? This is why I recommend commercial translators. You can define your source and your target and then do all kinds of stuff in the middle. What about a preprocess that reads the file (single pass) and counts all the segments / loops? — Andrew, Apr 01 '15 at 15:30
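A single-pass preprocess of the kind suggested in this comment can be as small as splitting on the delimiters and counting segment IDs. This Python sketch assumes `~` as the segment terminator and `*` as the element separator, as in the sample data; in a real interchange these characters are defined by the ISA segment, and the `split_segments` helper is hypothetical.

```python
from collections import Counter

def split_segments(raw, seg_term="~", elem_sep="*"):
    """Split raw X12 text into lists of elements, one list per segment."""
    segments = []
    for chunk in raw.split(seg_term):
        chunk = chunk.strip()      # tolerate spaces/line breaks between segments
        if chunk:
            segments.append(chunk.split(elem_sep))
    return segments

# The segments shown in the question's raw data.
raw = ("LX*1~ MAN*GM*0000803225000421444452~ N9*2I*12150-1~ "
       "W12*CC*2*2*0*EA*101199007289*VN*10007~ N9*LI*1~ LX*5~ "
       "MAN*GM*0000803225000421444453~ N9*2I*12150-2~")
segs = split_segments(raw)
counts = Counter(s[0] for s in segs)   # segment ID -> number of occurrences
print(counts)
```

For the sample this yields 8 segments: LX 2, MAN 2, N9 3 and W12 1.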
Is the parser written in Progress 4GL? In that case, post some code; there are some of us here who might be able to look into it then. — Jensd, Apr 02 '15 at 05:54
@Andrew the 810 was an example of the assignment of sequence numbers. We already have a mapper where data can be moved to tables from segments for inbound documents by drawing lines. Under the hood the segments are represented by the same sequence numbers, so we parse our document to determine what sequence numbers we have so that the map can run correctly. — j a s t i u m, Apr 02 '15 at 14:39
@JenSD I can try to get some source code for the section of the parser I was struggling with uploaded when I have some time. Thanks! — j a s t i u m, Apr 02 '15 at 14:40

eppye · Accepted Answer · 2015-04-02T18:48:25.110


Yes, this can be done in a single-pass parser (I did that).
As you indicate, the correct loop level should be recorded in the table.
The trick is to keep track of where you are in the table.
Read a new segment and look it up in the table.
Start that lookup from the table position of the last segment that was read.
The lookup in the table is somewhat complicated; the new segment:
1. can be in the same loop level;
2. can be a repeat of the same loop;
3. can come after the loop has ended, so you need to look further on in the message.
If it is not found, give an error.
If it is found, go on to the next segment in the incoming file, etc.

AFAIK it is absolutely necessary for the table to record whether segments/loops are mandatory or conditional, if you want a general tool that can parse all message/transaction types.

Actually, the problem you run into is the LS/LE loop. The second LX loop is embedded in LS/LE segments; LS/LE was invented to solve exactly that 'collision' problem. Once you are inside an LS loop, it must be terminated by an LE segment.
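The table walk described in this answer can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Bots implementation: the `Seg` type and all function names are hypothetical, the definition is a simplified 945 detail area in which the repeating detail area is modelled as a loop headed by LX, and LS is modelled as a loop header whose mandatory last segment LE closes it. `match` follows the three cases in order, starting from the last matched table position.

```python
from dataclasses import dataclass

ROOT = -1  # virtual loop enclosing the whole area

@dataclass(frozen=True)
class Seg:
    name: str
    depth: int                # loop nesting level (0 = not inside a loop)
    loop_start: bool = False  # segment is the header of a loop
    mandatory: bool = False   # must appear once its loop is entered

# Simplified, hypothetical 945 detail definition (see assumptions above).
DETAIL = [
    Seg("LX", 1, True), Seg("MAN", 1), Seg("PAL", 1), Seg("N9", 1),
    Seg("W12", 2, True), Seg("G69", 2), Seg("N9", 2),
    Seg("LS", 3, True), Seg("LX", 4, True), Seg("LE", 3, mandatory=True),
]

def owner(t, i):
    """Head index of the loop that entry i belongs to (a header owns itself)."""
    if t[i].loop_start:
        return i
    return next((h for h in range(i - 1, -1, -1)
                 if t[h].loop_start and t[h].depth == t[i].depth), ROOT)

def parent(t, h):
    """Head index of the loop enclosing loop h."""
    return next((p for p in range(h - 1, -1, -1)
                 if t[p].loop_start and t[p].depth == t[h].depth - 1), ROOT)

def end(t, h):
    """Index of the last table entry that belongs to loop h."""
    if h == ROOT:
        return len(t) - 1
    last = h
    for j in range(h + 1, len(t)):
        if t[j].depth < t[h].depth or (t[j].depth == t[h].depth and t[j].loop_start):
            break
        last = j
    return last

def directly_in(t, j, loop):
    """Entry j is a member of `loop`, or the header of a loop nested in it."""
    return parent(t, j) == loop if t[j].loop_start else owner(t, j) == loop

def skips_mandatory(t, lo, hi, loop):
    """True if a mandatory entry of `loop` lies in t[lo:hi] (would be skipped)."""
    return any(t[k].mandatory and directly_in(t, k, loop) for k in range(lo, hi))

def match(t, pos, seg_id):
    """Find seg_id starting from the last matched table position `pos`."""
    loop = ROOT if pos == ROOT else owner(t, pos)
    while True:
        last = end(t, loop)
        # case 1: a later segment in the same loop (or a nested loop's header)
        for j in range(pos + 1, last + 1):
            if directly_in(t, j, loop) and t[j].name == seg_id:
                if not skips_mandatory(t, pos + 1, j, loop):
                    return j
                break
        if skips_mandatory(t, pos + 1, last + 1, loop):
            raise ValueError(f"{seg_id}: a mandatory segment is missing")
        # case 2: a repeat of the same loop
        if loop != ROOT and t[loop].name == seg_id:
            return loop
        if loop == ROOT:
            raise ValueError(f"{seg_id}: not allowed here")
        # case 3: the loop has ended; continue after it in the enclosing loop
        pos, loop = last, parent(t, loop)

def parse(t, seg_ids):
    """Single pass: returns the matched table index for every segment."""
    pos, out = ROOT, []
    for sid in seg_ids:
        pos = match(t, pos, sid)
        out.append(pos)
    return out
```

With the question's data, `parse(DETAIL, ["LX", "MAN", "N9", "W12", "N9", "LX", "MAN", "N9"])` returns `[0, 1, 3, 4, 6, 0, 1, 3]`: the inner LX loop can only be entered through LS, so the second LX closes the W12 loop and repeats the detail loop. If the partner really sends LS followed by LX, the inner loop is chosen, and a MAN arriving before the mandatory LE raises an error.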

edited Apr 02 '15 at 18:48

answered Apr 02 '15 at 16:52

eppye


Thanks for the detailed answer. Regarding the LX, are you saying that the LS/LE is required for the 2nd LX loop, so if our parser actually took into account mandatory/conditional, we may have been able to avoid it? As I described, my parser would only search within the segments of the current loop, starting from the first segment in the loop. Then if it was not found it would widen the search to the entire previous loop. I believe if what you are saying is correct, we would indeed find the LX within the current loop, but due to its conditional nature, invalidate it. That about right? – j a s t i u m Apr 03 '15 at 18:15
AFAIK the most accurate way to put it is that once you have entered the LS/LE loop, the LE is required. You can get the same effect by making LS a loop header segment and letting LE be the last segment of the LS loop, marked mandatory. – eppye Apr 04 '15 at 16:19
hi Jastium, it is hard for me to judge whether your implementation is correct, so I tried to indicate what is actually needed in the algorithm. I used this myself for Bots, the open source EDI translator; it is a general tool for all EDIFACT messages/X12 transactions, and I know this works. I think there are not that many solutions for the algorithm; the information in the table is quite limited. The LS/LE loop is a special case. Maybe this link is useful for more information: https://www.mail-archive.com/edi-l@yahoogroups.com/msg07336.html – eppye Apr 04 '15 at 16:28
