I am working with ANSI 835 plain-text files and want to capture every segment that starts with “BPR” and ends with “TRN”, including those markers. Each file is a single line, and within that line the segment may occur once or several times. I process multiple files at a time and would ideally record the name of the file in which each segment occurs. Here is what I have so far, based on an answer to another question:

#!/bin/sed -nf
/BPR.*TRN/ {
   s/.*\(BPR.*TRN\).*/\1/p
   d
 }
 /from/ {
     : next
     N
     /BPR/ {
        s/^[^\n]*\(BPR.*TRN\)[^n]*/\1/p
        d
      }
      $! b next
}

I run all files I have through this and write the results to a file which looks like this:

BPR*I*393.46*C*ACH*CCP*01*011900445*DA*0000009046*1066033492**01*071923909*DA*72
34692932*20150120~TRN
BPR*I*1611.07*C*ACH*CCP*01*031100209*DA*0000009108*1066033492**01*071923909*DA*7
234692932*20150122~TRN
BPR*I*1415.25*C*CHK************20150108~TRN
BPR*H*0*C*NON************20150113~TRN
BPR*I*127.13*C*CHK************20150114~TRN
BPR*I*22431.28*C*ACH*CCP*01*071000152*DA*99643*1361236610**01*071923909*DA*72346
92932*20150112~TRN
BPR*I*182.62*C*ACH*CCP*01*071000152*DA*99643*1361236610**01*071923909*DA*7234692
932*20150115~TRN

Ideally each line would be prepended with the file name like this:

IDI.Aetna.011415.64539531.rmt:BPR*I*393.46*C*ACH*CCP*01*011900445*DA*0000009046*1066033492**01*071923909*DA*72
34692932*20150120~TRN
IDI.BCBSIL.010915.6434438.rmt:BPR*I*1611.07*C*ACH*CCP*01*031100209*DA*0000009108*1066033492**01*071923909*DA*7
234692932*20150122~TRN
IDI.CIGNA.010215.64058847.rmt:BPR*I*1415.25*C*CHK************20150108~TRN
IDI.GLDRULE.011715.646719.rmt:BPR*H*0*C*NON************20150113~TRN
IDI.MCREIN.011915.6471442.rmt:BPR*I*127.13*C*CHK************20150114~TRN
IDI.UHC.011915.64714417.rmt:BPR*I*22431.28*C*ACH*CCP*01*071000152*DA*99643*1361236610**01*071923909*DA*72346
92932*20150112~TRN
IDI.UHC.011915.64714417.rmt:BPR*I*182.62*C*ACH*CCP*01*071000152*DA*99643*1361236610**01*071923909*DA*7234692
932*20150115~TRN

The last two lines would be an example of a file where the segment pattern repeats.

Again, prepending each line with the file name is ideal. What I really need is to be able to process a given single-line file which has the “BPR…TRN” segment repeating and write all segments in that file to my output file.

rcfrank
  • Would you show some sample input? In particular, the question states "A given file is a single line" yet your sample code goes to lengths to remove newline characters. Also, the sample code looks for lines containing `from`, yet your description makes no mention of why `from` is important. Some sample input might help clarify. – John1024 Jan 22 '15 at 00:59
  • sed is 100% the wrong tool for this job, so throw that sed script away, as most of the constructs it's using became obsolete in the mid-1970s when awk was invented, and start again by posting some sample input and expected output. – Ed Morton Jan 22 '15 at 01:21
  • Can you use COBOL for this? I think that language is popular for this problem domain. – John Zwinck Jan 22 '15 at 01:58
  • John Zwinck: Unfortunately COBOL is a no. – rcfrank Jan 22 '15 at 14:36

1 Answer

Try with awk:

awk '
    /BPR/ { sub(".*BPR","BPR") }
    /TRN/ { sub("TRN.*","TRN") }
    /BPR/,/TRN/ { print FILENAME ":" $0 }
' *.rmt
Danny Daglas
  • awk does prepend the file name. It still doesn't write any segments past the first. A given file is a single line with no CRLF. Additional comments above. – rcfrank Jan 22 '15 at 14:32
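
The comment above reports that only the first segment per file is written. One possible way around that, offered as a sketch rather than a tested fix for the real files, is to loop over each line with awk's `match()` and print every occurrence. It assumes that `~` is the segment terminator (as the sample output suggests) and that `TRN` immediately follows the `BPR` segment. The function name `extract_bpr_trn`, the file `sample.rmt`, and its contents are made up for illustration.

```shell
# Hypothetical helper: print every BPR...TRN occurrence, prefixed with the file name.
extract_bpr_trn() {
    awk '
    {
        rest = $0
        # match() finds the leftmost occurrence and sets RSTART and RLENGTH.
        # [^~]* cannot cross a "~", so each match stays within one BPR segment.
        while (match(rest, /BPR[^~]*~TRN/)) {
            print FILENAME ":" substr(rest, RSTART, RLENGTH)
            # Continue searching in the text after this match.
            rest = substr(rest, RSTART + RLENGTH)
        }
    }' "$@"
}

# Made-up single-line sample with two BPR...TRN segments and no trailing newline.
printf '%s' 'ISA*00~BPR*I*1.00*C*CHK~TRN*1*111~CLP*x~BPR*I*2.00*C*CHK~TRN*1*222~SE*1~' > sample.rmt

extract_bpr_trn sample.rmt
# sample.rmt:BPR*I*1.00*C*CHK~TRN
# sample.rmt:BPR*I*2.00*C*CHK~TRN
```

For the real files this would be run as `extract_bpr_trn *.rmt > output.txt`. If a file uses a segment terminator other than `~`, the bracket expression in the pattern would need to change accordingly.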