Reading a Cobol generated file

Question

I’m currently writing a C# application that is going to sit between two existing apps. All I know about the second application is that it processes files generated by the first one. The first application is written in Cobol.

Steps: 1) The Cobol application writes some files and copies them to a directory. 2) The second application picks these files up and processes them.

My C# app would sit between 1) and 2). It would have to pick up the file generated by 1), read it, modify it and save it, so that application 2) wouldn’t know I have even been there.

I have a few problems.

First of all, if I open a file generated by 1) in Notepad, most of it is unreadable, while other parts are readable.
If I read the file, modify it and save it, I must save the file with the same notation used by the Cobol application, so that app 2) doesn't know I've been there.

I've tried reading the file this way, but it's still unreadable:

Code:

        string ss = @"filename";

        using (FileStream fs = new FileStream(ss, FileMode.Open))
        {
            StreamReader sr = new StreamReader(fs);
            string gg = sr.ReadToEnd();
        }

Also, if I find a way of making it readable (using some sort of encoding technique), I'm afraid that when I save the file again, I may change its original format.

Any thoughts? Suggestions?

You need to find out the vendor of the COBOL, then find out what the file format is. There's no single "COBOL" format. — John Saunders, Feb 02 '11 at 17:02
It would be interesting to know what the format is *supposed* to be. Is it a CSV? Can you ask whoever generates the file what it should contain? — Felice Pollano, Feb 02 '11 at 17:03
Hi, thanks for the replies. I'll find out the Cobol version and the file type (the files are generated without an extension). I know that the platform is Windows, and I guess it's not a CSV (comma-separated values) file. — Rauland, Feb 02 '11 at 17:29
@rauland: not only the version of COBOL, but the vendor. Microfocus is one vendor, but I believe there is more than one (Fujitsu?) — John Saunders, Feb 02 '11 at 18:48
@Everybody, thanks for the interest and help so far. This is starting to look like a very difficult task. This is my first post on Stack Overflow and I am overwhelmed by the answers I've received so far, some way, way out of my league (like Nicholas Carey's answer! Dear, what an answer... let's see if I understand everything). — Rauland, Feb 02 '11 at 19:48
@rauland You will find if you post good questions, you will generally get good answers on SO. Have a go at the problem and try and break your issues down into small chunks - don't try and eat the whole cow at once. Stick with it and good luck. :) — Tim Lloyd, Feb 02 '11 at 19:51
I’m not sure whether this can help, but what I know about the system that generates the files is: 1) An operator starts off by creating a batch file. A header is added to the file which details a few things like batch type, date, and I believe record format. 2) The operator continues to work on the batch and records are added to the file. (I understand that each record is no more than one line of text.) 3) The operator then decides to complete the batch, which adds a footer to the end of the file with a few details like the number of records added to the batch. — Rauland, Feb 02 '11 at 20:03
..If this is the process can we tell what type of file is it likely to be? — Rauland, Feb 02 '11 at 20:04
The idea now is that my C# app would be responsible for completing step 3): add the footer, close the batch and pass the file to the next system. I thought it would be as easy as opening a few closed batch files myself to see the footer contents and what information it had, then generating this information from C# and appending it to the file. It seems to be nowhere near as straightforward as I thought it would be. — Rauland, Feb 02 '11 at 20:26
@chibacity I've arranged to get a few sample files I can upload. I'll get around to doing this later this evening from home, as I am behind a proxy and can't access my SkyDrive... It would be great if somebody could have a look at them, if this is okay? I've been looking into all the great comments I've received so far. Most of them I find very complicated, but I'm doing my best to tackle them. Any advice on how to get up to speed (with the basics!?)? I've tried to mark all answers as useful, but I'm not allowed to as I'm a Stack Overflow newbie. :) Cheers! — Rauland, Feb 03 '11 at 15:34
Please don't post anything unless you and your employer are comfortable having it in the public domain for everyone/anyone to see. Do not post anything that might identify an individual, a credit card number, bank account, social insurance number or any other identifying information. It **will** come back to haunt you! — NealB, Feb 03 '11 at 19:51
Hi, thanks for the advice. Both files have dummy data and I've also asked for permission to post the files, so I can't see any harm in posting them, and all the best if somebody can help out and identify what file types they are. The files have a header called CDL, records of (in this case) the "etiquetas" type, and are missing the trailer section (which is what I'm supposed to complete from C#). At the moment I need a way to count how many records are in the file, so I must identify some sort of separator in the file. Thanks a lot! — Rauland, Feb 03 '11 at 20:10
Here's [File 1](http://www.4shared.com/file/iQ5me91j/test1.html) and here's [File 2](http://www.4shared.com/file/H3DvjgiB/test2.html). — Rauland, Feb 03 '11 at 20:11
Can you modify the Cobol program so it creates data that is just display data? And no implied decimal points in the data. Also, "record format" may be a clue. You may have more than 1 type of record being written to the file. That is bad practice. — Cathy Sullivan, Feb 05 '11 at 18:13
I looked at the files you provided. I even looked at it in binary (option in TextPad). You need the file layout from the original program no matter what language you were using. Don't you have anyone who can modify the Cobol program? That would be the better way to do it. — Cathy Sullivan, Feb 05 '11 at 18:26
@Cathy Sullivan Hi Cathy, thanks a lot for your time. I’m not sure whether I’m breaking some Stack Overflow rule here with such a long answer, so please let me know if so. Also, I’ve had to spread it across many comments as it wouldn’t fit in one. There is an existing COBOL program which adds the trailer or file closure, but it currently resides on a UNIX machine. I believe they want to replace this COBOL program running on UNIX with a WinForms app running on Windows Server. I’ll explain what I’ve done so far. Without understanding much, I started inspecting files with a hex editor. — Rauland, Feb 05 '11 at 22:01
(Just to recap the file structure starts with a header, then data records and then trailer.) From inspecting the file with the editor, I believe that each COBOL file is structured in a way that for each record there is a structure or type definition and then the actual data. This seems to happen for each record in the file. So going back to C# if I need to count these data records, I would need to: - Open the file. - Detect where the header begins and ends. — Rauland, Feb 05 '11 at 22:02
- Where the header ends, is where the data records begin, so from here onwards detect a separator. - Find out how many separators there are, and count the data records. - Write out the trailer. By the way I haven’t seen a file with a trailer yet! (I know it’s mad). All I’ve been told about the trailer is I need to write out the number of records. — Rauland, Feb 05 '11 at 22:03
So in my mind, and being really positive, I start to go on a little mission: - From C# I’d load the hex representation of the file into a string. - The header structure definition seems to be the same for each file, so I could look for this structure definition and detect where the header starts and ends. - Where it ends is where the data records begin. Each data record would begin with a structure definition; I could somehow extract this definition, and this could be my record separator. — Rauland, Feb 05 '11 at 22:05
- I’d look out for how many times this separator is repeated and count the records. - If the closure or trailer is simple enough, I could write out the count in hex and append it to the file. Then I start to come down to earth and realise that this may be way too complicated and an unreliable way to do this. — Rauland, Feb 05 '11 at 22:06
My project team manager never realised that this would be such a problem, he thought these COBOL files would be like normal text files. So here’s my question, could COBOL produce, instead of these difficult to process files, just normal .txt files? Anyway, I’m still trying to explain to him that processing these files from C# isn’t at all straight forward. And probably it could take me a long time to figure out how to process them if I ever do. — Rauland, Feb 05 '11 at 22:06
@rauland, What needs to be in the trailer record? Just a count of the number of detail records? Do you know what the format should be? There should be a CRLF at the end of each record that the Cobol program wrote. Do you know how long each record is supposed to be? They are probably all the same size. — Cathy Sullivan, Feb 06 '11 at 00:16
@Cathy Sullivan, I’m not getting much help my end. All I’ve been told about the trailer is that I need to count the records and add this count to the trailer. Also I found out that the COBOL program which is currently adding the trailers makes you choose a specific length. The options you have are: 80, 160, 240, and 360. My understanding is that if you choose for instance 80, it will trim all records to that length. The format varies, depends on the batch type. Each batch type would have a different format. — Rauland, Feb 06 '11 at 09:28
@Everybody As I am not getting much help (from my side :) ), I’ve started to consider creating simpler versions of these COBOL files and then trying to access them from C#. This way I can start to get familiar with these files. I managed to install a “Microsoft COBOL Compiler Version 2.20 Microsoft Corp. 1982-87”. I’ve started off with a few hello worlds but now want to create similar COBOL files. I’ve looked and can’t find any tutorials or guides which could help me out. — Rauland, Feb 06 '11 at 09:46
It’s difficult because I don’t really know what file types I’m looking for. (I’ve already asked for the vendor, file type and COBOL version.) Once I get simpler versions of these files going, I can then start accessing them from C# and work directly with the hex representation. I know that the main application probably uses a different COBOL vendor, probably IBM, but the IBM COBOL compiler isn’t free. — Rauland, Feb 06 '11 at 09:47
See some more information [here](http://stackoverflow.com/questions/5109302/how-to-read-ebcdic-data-with-a-non-standard-codepage-and-not-mess-up-numbers). — GilShalit, Jul 17 '11 at 19:09

score 29 · Accepted Answer · answered Feb 02 '11 at 18:44

To read the COBOL-generated file, you'll need a few things.

First, you'll need the record layout (copybook) for the file. A COBOL record layout will look something like this:

01  PATIENT-TREATMENTS.
    05  PATIENT-NAME                PIC X(30).
    05  PATIENT-SS-NUMBER           PIC 9(9).
    05  NUMBER-OF-TREATMENTS        PIC 99 COMP-3.
    05  TREATMENT-HISTORY OCCURS 0 TO 50 TIMES
           DEPENDING ON NUMBER-OF-TREATMENTS
           INDEXED BY TREATMENT-POINTER.
        10  TREATMENT-DATE.
            15  TREATMENT-DAY        PIC 99.
            15  TREATMENT-MONTH      PIC 99.
            15  TREATMENT-YEAR       PIC 9(4).
        10  TREATING-PHYSICIAN       PIC X(30).
        10  TREATMENT-CODE           PIC 99.

You'll also need a copy of IBM's Principles of Operation (S/360, S/370, z/OS, doesn't really matter for our purposes). The latest is available from IBM at

http://www-01.ibm.com/support/docview.wss?uid=isg2b9de5f05a9d57819852571c500428f9a (but you'll need an IBM account).
An older edition is available, gratis, at http://www.hack.org/mc/texts/principles-of-operation.pdf

Chapters 8 (Decimal Instructions) and 9 (Floating Point Overview and Support Instructions) are the interesting bits for our purposes.

Without that, you're pretty much lost.

Then, you need to understand COBOL data types. For instance:

PIC defines an alphameric formatted field (PIC 9(4), for example, is 4 decimal digits, which might be filled with four space characters if missing). PIC 999V99 is 5 decimal digits, with an implied decimal point. And so on and so forth.
BINARY is [usually] a signed fixed point binary integer. Usual sizes are halfword (2 octets) and fullword (4 octets).
COMP-1 is single precision floating point.
COMP-2 is double precision floating point.

If the data source is an IBM mainframe, COMP-1 and COMP-2 likely won't be IEEE floating point: they will be in IBM's base-16 excess-64 floating point format. You'll need something like the S/370 Principles of Operation to help you understand it.
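As a rough illustration of that format, here is a minimal C# sketch that converts a 4-octet COMP-1 value in IBM hexadecimal floating point to a `double`. It assumes the short (single precision) layout of 1 sign bit, a 7-bit excess-64 exponent that is a power of 16, and a 24-bit fraction, stored big-endian. The class and method names are hypothetical, and it has not been checked against real mainframe output.

```csharp
using System;

public static class IbmHexFloat
{
    // Converts a 4-octet, big-endian IBM hexadecimal float to a double.
    // Assumed layout: sign bit, 7-bit excess-64 exponent (power of 16),
    // 24-bit fraction with the radix point to the left of the first hex digit.
    public static double ToDouble(byte[] data, int offset = 0)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (offset < 0 || data.Length - offset < 4)
            throw new ArgumentException("Need 4 octets starting at offset.");

        bool negative = (data[offset] & 0x80) != 0;
        int exponent = (data[offset] & 0x7F) - 64;
        int fraction = (data[offset + 1] << 16) | (data[offset + 2] << 8) | data[offset + 3];

        // 16777216 = 2^24, which scales the 24-bit fraction into [0, 1).
        double value = (fraction / 16777216.0) * Math.Pow(16, exponent);
        return negative ? -value : value;
    }
}
```

For example, the octets `42 64 00 00` have exponent 0x42 - 64 = 2 and fraction 0x640000 / 2^24 = 0.390625, which gives 0.390625 × 16² = 100.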

COMP-3 is 'packed decimal', of varying lengths. Packed decimal is a compact way of representing a decimal number. The declaration will look something like this: PIC S9999V99 COMP-3. This says that it is signed and consists of 6 decimal digits with an implied decimal point. Packed decimal represents each decimal digit as a nibble of an octet (hex values 0-9). The high-order digit is the upper nibble of the leftmost octet. The low nibble of the rightmost octet is a hex value A-F representing the sign. So the above PIC clause will require ceil( (6+1)/2 ) or 4 octets. The value -345.67, as represented by the above PIC clause, will look like 0x0034567D. The actual sign value may vary (the default is C/positive, D/negative, but A, C, E and F are treated as positive, while only B and D are treated as negative). Again, see the S/370 Principles of Operation for details on the representation.
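A conversion routine for this is short. The following is a minimal C# sketch, not a tested production routine: the `PackedDecimal` class and its `scale` parameter (the number of digits after the V in the PIC clause, which you must take from the copybook) are names made up for illustration.

```csharp
using System;

public static class PackedDecimal
{
    // Decodes a COMP-3 field into a decimal.
    // 'scale' is the number of digits after the implied decimal point.
    public static decimal Decode(byte[] field, int scale)
    {
        if (field == null || field.Length == 0)
            throw new ArgumentException("Field must contain at least one octet.");

        decimal value = 0m;
        for (int i = 0; i < field.Length; i++)
        {
            int high = field[i] >> 4;
            int low = field[i] & 0x0F;

            if (high > 9) throw new FormatException("Invalid digit nibble.");
            value = value * 10 + high;

            if (i < field.Length - 1)
            {
                if (low > 9) throw new FormatException("Invalid digit nibble.");
                value = value * 10 + low;
            }
            else
            {
                // Low nibble of the last octet is the sign:
                // B and D are negative; A, C, E and F are positive.
                if (low < 0x0A) throw new FormatException("Invalid sign nibble.");
                if (low == 0x0B || low == 0x0D) value = -value;
            }
        }

        // Apply the implied decimal point.
        for (int i = 0; i < scale; i++) value /= 10m;
        return value;
    }
}
```

With the example above, `PackedDecimal.Decode(new byte[] { 0x00, 0x34, 0x56, 0x7D }, 2)` yields -345.67. Note that very long packed fields can hold more digits than a .NET `decimal` can represent, so this sketch would overflow on those.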

Related to COMP-3 is zoned decimal. This might be declared as `PIC S9999V99` (signed, 6 decimal digits, with an implied decimal point). Decimal digits, in EBCDIC, are the hex values 0xF0 - 0xF9. 'Unpack' (a mainframe machine instruction) takes a packed decimal field and turns it into a character field. The process is:

Start with the rightmost octet. Swap its two nibbles, so the sign nibble is on top, and place it into the rightmost octet of the destination field.
Working from right to left (source and target both), strip off each remaining nibble of the packed decimal field and place it into the low nibble of the next available octet in the destination. Fill the high nibble with a hex F.
The operation ends when either the source or destination field is exhausted.
If the destination field is not exhausted, it is left-padded with zeroes by filling the remaining octets with decimal '0' (0xF0).

So our example value, -345.67, if stored with the default sign value (hex D) and unpacked into an 8-octet destination field, would come out as 0xF0F0F0F3F4F5F6D7 ('0003456P', in EBCDIC).
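The unpack steps can be sketched in C# as follows. This is only an illustration of the procedure as described here, not a reproduction of the hardware instruction in every detail, and the `ZonedDecimal.Unpack` name and its `destinationLength` parameter are hypothetical.

```csharp
using System;

public static class ZonedDecimal
{
    // Unpacks a packed decimal field into a zoned decimal field of the given length.
    public static byte[] Unpack(byte[] packed, int destinationLength)
    {
        if (packed == null || packed.Length == 0)
            throw new ArgumentException("Packed field must contain at least one octet.");
        if (destinationLength < 1)
            throw new ArgumentException("Destination must be at least one octet long.");

        var dest = new byte[destinationLength];
        int d = destinationLength - 1;

        // Rightmost octet: swap the nibbles so the sign nibble ends up on top.
        byte last = packed[packed.Length - 1];
        dest[d--] = (byte)(((last & 0x0F) << 4) | (last >> 4));

        // Remaining nibbles, right to left, each under a zone nibble of hex F.
        // Stops when either the source or the destination is exhausted.
        for (int n = packed.Length * 2 - 3; n >= 0 && d >= 0; n--)
        {
            byte octet = packed[n / 2];
            int digit = (n % 2 == 0) ? octet >> 4 : octet & 0x0F;
            dest[d--] = (byte)(0xF0 | digit);
        }

        // Left-pad whatever is left of the destination with EBCDIC '0'.
        while (d >= 0) dest[d--] = 0xF0;
        return dest;
    }
}
```

Calling `ZonedDecimal.Unpack(new byte[] { 0x00, 0x34, 0x56, 0x7D }, 8)` produces the octets F0 F0 F0 F3 F4 F5 F6 D7, matching the example value.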

[There you go. There's a quiz later]

If the COBOL app lives on an IBM mainframe, has the file been converted from its native EBCDIC to ASCII? If not, you'll have to do the mapping yourself. (Hint: it's not necessarily as straightforward as that might seem, since this might be a selective process: only character fields get converted; COMP-1, COMP-2, COMP-3 and BINARY get excluded since they are a sequence of binary octets.) Worse, there are multiple flavors of EBCDIC representations, due to the varying national implementations and varying print chains in use on different printers.
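To make the "selective" part concrete, here is a small C# sketch that decodes only a chosen character field from a record and leaves every other octet alone. It assumes US EBCDIC (code page 37) and a runtime where `CodePagesEncodingProvider` is available (.NET Core 3.0 or later); your file may well use a different code page. The `EbcdicText` class is a made-up name, and the offsets come from your copybook, not from this code.

```csharp
using System;
using System.Text;

public static class EbcdicText
{
    // Code page 37 is US EBCDIC; this is an assumption about the file.
    private static readonly Encoding Cp037 = CreateEncoding();

    private static Encoding CreateEncoding()
    {
        // Makes the legacy code pages visible to Encoding.GetEncoding.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(37);
    }

    // Decodes one character field of a record. Binary fields (COMP, COMP-3, ...)
    // must not be passed through this; decode those from the raw octets instead.
    public static string DecodeField(byte[] record, int offset, int length)
    {
        return Cp037.GetString(record, offset, length);
    }
}
```

For a record whose first five octets are `C8 C5 D3 D3 D6` followed by a packed field, `EbcdicText.DecodeField(record, 0, 5)` returns "HELLO" and the packed octets stay untouched for a separate numeric decode.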

Oh...one last thing. The mainframe hardware tends to like different things aligned on halfword, word or doubleword boundaries, so the record layout may not map directly to the octets in the file as there may be padding octets inserted between fields to maintain the needed word alignment.

Good Luck.

I will come back tomorrow and upvote - have run out for today. Excellent answer. Looks like quite a formidable task! :) — Tim Lloyd, Feb 02 '11 at 18:48
It's generally not as bad as it seems. Most COBOL apps write out straight character format records. You seldom see floating point out in the wild, but you might see packed decimal or fixed point binary. Fixed point binary is a straight 1:1 mapping to `short` or `int` (outside of big-/little-endian issues). Packed decimal is a bit of a hassle, but it's not that bad to write a conversion routine to convert to `decimal`. — Nicholas Carey, Feb 02 '11 at 18:51
not my copybook `B^)`. I've been out of the mainframe world for a long, long time, though recently, I did have to go through the process of building a process to periodically import a dump of a COBOL file containing a mixture of text, binary and packed decimal data and bring it into the .Net/SQL Server world. The conversion from EBCDIC is [mostly] pretty easy as the .Net TextInfo supports several EBCDIC code pages (037 and 500, for starters). Had to write a packed-decimal-to-`decimal` conversion routine and convert the fixed point from big-endian to little-endian. Easy! — Nicholas Carey, Feb 03 '11 at 19:49
Unless you specify SYNCHRONIZED, the Mainframe does nothing for alignment. — Bill Woodger, Jan 26 '13 at 00:44

Bruce Martin · Answer 2 · 2011-02-03T06:59:13.360

It would be useful to know which Cobol Dialect you are dealing with because there is no single Cobol Format. Some Cobol Compilers (Micro Focus) put a "File Description" at the front of files (For Micro Focus VB / Indexed files).
Have a look at the RecordEditor (http://record-editor.sourceforge.net/). It has a File Wizard which could be very useful for you.
- In the File Wizard set the file as Fixed-Width File (most common in Cobol). The program lets you try out different Record Lengths. When you get the correct record length, the Text fields should line up.
- Later on in the Wizard there is a field search which can look for Binary, Comp-3 and Text fields.
- There are some notes on using the RecordEditor's Wizard with an unknown file here: http://record-editor.sourceforge.net/Unkown.htm
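If you want to try candidate record lengths from C# as well, a minimal sketch could look like the one below. It assumes a plain fixed-width file with no record separators and no vendor-specific file description at the front, which may not hold for your files. The `FixedWidthRecords` class is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class FixedWidthRecords
{
    // Splits raw file octets into records of a fixed length.
    // Throws if the file size is not a whole multiple of the record length,
    // which is a hint that the guessed length (or the assumption) is wrong.
    public static List<byte[]> Split(byte[] fileBytes, int recordLength)
    {
        if (fileBytes == null) throw new ArgumentNullException(nameof(fileBytes));
        if (recordLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(recordLength));
        if (fileBytes.Length % recordLength != 0)
            throw new FormatException("File size is not a multiple of the record length.");

        var records = new List<byte[]>();
        for (int offset = 0; offset < fileBytes.Length; offset += recordLength)
        {
            var record = new byte[recordLength];
            Array.Copy(fileBytes, offset, record, 0, recordLength);
            records.Add(record);
        }
        return records;
    }
}
```

Reading the input with `File.ReadAllBytes` and writing it back with `File.WriteAllBytes` keeps every octet exactly as it was, whereas going through `StreamReader` and a `string` applies a text decoding that can alter binary fields.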
Unless the file is coming from a Mainframe / AS400 it is unlikely to use EBCDIC (cp037, Code Page 37, is US EBCDIC); any text is most likely in ASCII.
The file probably contains Packed-Decimal (Comp3) and Binary-Integer data. Most Cobols use Big-Endian (for Comp integers) even on Intel (little endian hardware).
One thing to remember with Cobol: PIC s9(6)V99 comp is stored as a Binary Integer, with x'0001' representing 0.01. So unless you have the Cobol definition you cannot tell whether a binary 1 is 1, 0.1, 0.01, etc.
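As a sketch of what that means in C#, the following reads a 4-octet big-endian binary integer and applies the implied decimal point. It assumes the field occupies 4 octets and is big-endian, both of which depend on the compiler and its options, and it needs `System.Buffers.Binary` (.NET Core 2.1 or later). The `CompBinary` class and the `scale` parameter are made-up names; the scale has to come from the Cobol definition.

```csharp
using System;
using System.Buffers.Binary;

public static class CompBinary
{
    // Reads a signed 4-octet big-endian integer at 'offset' and divides it by
    // 10^scale, where 'scale' is the number of digits after the V in the PIC clause.
    public static decimal DecodeInt32(byte[] record, int offset, int scale)
    {
        if (record == null) throw new ArgumentNullException(nameof(record));

        int raw = BinaryPrimitives.ReadInt32BigEndian(record.AsSpan(offset, 4));

        decimal value = raw;
        for (int i = 0; i < scale; i++) value /= 10m;
        return value;
    }
}
```

The same octets `00 00 00 01` decode to 1 with a scale of 0 and to 0.01 with a scale of 2, which is exactly why the definition is needed.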

score 2 · Answer 3 · answered Feb 03 '11 at 04:01

I see from comments attached to your question that you are dealing with the “classic” COBOL batch file structure: Header record, detail records and trailer record.

This is probably bad news if you are responsible for creating the trailer record! The typical “trailer” record is used to identify the end-of-file and provides control information such as the number of records that precede it and various check sums and/or grand totals for “detail” records. In other words, you may need to read and summarize the entire file in order to create the trailer. Add to this the possibility that much of the data in the file is in Packed Decimal, Zoned Decimal or other COBOLish numeric data types, and you could be in for a rough time.

You might want to question why you are adding trailer records to these files. Typically the “trailer” is produced by the same program or application that created the “detail” records. The trailer is supposed to act as a verification that the sending application/program wrote all of the data it was supposed to. The summary totals, counts etc. are used by the receiving application to verify that the detail records tally with the preceding details. This is supposed to serve as another verification that the sending application didn't muff up the data or that it was not corrupted en-route (no that wasn't a joke – but maybe it should be). When a "man in the middle" creates the trailers it kind of defeats the entire purpose of the exercise (no matter how flawed it might have been to begin with).

+1 for reminding me how bad things became when Total fields overflowed in the trailer :) — Dr. belisarius, Feb 03 '11 at 12:35
