I'm working on an EDI file parser, and I'm having considerable difficulty implementing an escape for the 'segment terminator'. For anyone fortunate enough not to work with EDI, the segment terminator (usually an apostrophe) is the delimiter between segments, which are like cells.

The desired behaviour, with ? as the escape character, looks something like this:

ABC+123'DEF+567'  -> ["ABC+123", "DEF+567"]
ABC+123?'DEF+567' -> ["ABC+123?'DEF+567"]

Using FParsec, without escaping the apostrophe (and, for simplicity, ignoring parameterisation), the parser looks something like this:

let pSegment = //logic to parse the contents of a segment
let pAllSegments = sepEndBy pSegment (str "'")

With the second (escaped) example above, this approach yields ["ABC+123?", "DEF+567"], because the apostrophe after the ? is still treated as a terminator.

My next consideration was to use a regex:

let pAllSegments = sepEndBy pSegment (regex @"[^\?]'")

The problem here is that the character before the apostrophe is consumed as part of the separator, so it is dropped from the segment, leading to incomplete messages.

I'm fairly certain I just don't understand FParsec well enough here. Does anyone have any pointers?

ddek
    It looks to me like you need to put this into your "logic to parse contents", as it's part of the contents, not part of the separator. Contents should be a choice of char or escaped char. – Abel Jun 08 '20 at 09:10
    Yes, that was it. I thought I was parsing the escaped terminator in the contents, but a stupid mistake had prevented that. Thanks for the tip. – ddek Jun 08 '20 at 09:34

1 Answer

The issue is in the parse contents step.

The parser is working 'bottom up'. It finds the contents of the segments, which are not permitted to contain the terminator, then finds that all these segments are separated by the terminator, and constructs the list.

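My error was in the pSegment step, which was using a parameterised version of (?:[A-Za-z0-9 \\.]|\?[\?\+:\?])*. See the second \? inside the character class that follows the escape? That should have been a ', i.e. \?[\?\+:'], so an escaped terminator was never accepted as part of the segment contents.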
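
For illustration, here is a minimal FParsec sketch of the idea from the comments: the segment contents are a choice between a plain character and an escaped character, so the terminator parser never sees an escaped apostrophe. This is not the parser from the question. The names pPlain, pEscaped and parseSegments are made up for the example, the element/component structure inside a segment is ignored, and it assumes that ? may escape any single character rather than only the characters allowed by the regex above.

```fsharp
#r "nuget: FParsec"   // script-style reference; in a project, reference the FParsec package instead
open FParsec

// Assumption: '?' is the escape character and may precede any single character.
// The escape pair is kept verbatim (e.g. "?'"), matching the desired output in the question.
let pEscaped : Parser<string, unit> =
    pchar '?' >>. anyChar |>> (fun c -> "?" + string c)

// Any character that is neither the terminator nor the escape character.
let pPlain : Parser<string, unit> =
    noneOf "'?" |>> string

// Segment contents: one or more plain or escaped characters, concatenated.
let pSegment : Parser<string, unit> =
    many1Strings (pEscaped <|> pPlain)

// Segments separated (and optionally ended) by an unescaped terminator.
let pAllSegments : Parser<string list, unit> =
    sepEndBy pSegment (pstring "'") .>> eof

// Hypothetical helper: run the parser and return the segments, or raise on failure.
let parseSegments (input: string) : string list =
    match run pAllSegments input with
    | Success (segments, _, _) -> segments
    | Failure (message, _, _) -> failwith message

printfn "%A" (parseSegments "ABC+123'DEF+567'")
printfn "%A" (parseSegments "ABC+123?'DEF+567'")
```

Because pSegment consumes the ? together with the character after it, the apostrophe in ?' can never be matched by the separator, which is the same effect the corrected regex achieves.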

ddek
    Great that you found it yourself. Indeed, recursive descent parsers like this one process data "bottom up" (deepest leaf first). – Abel Jun 08 '20 at 13:27