COBOL: Simple file reading issue

Question

I have a very basic COBOL program that reads a file input.dat and simply outputs it in the console. The input.dat file looks like this:

John                Johnson             
Peter               Peterson            
Juliette            Julietteson         
Natasha             Natashason          
Justin              Justinson

It isn't correctly displayed here but I'm positive there are 20 chars for the first name and 20 chars for the last name.

This is my COBOL program:

    IDENTIFICATION DIVISION.
    PROGRAM-ID. ATEST4.
    ENVIRONMENT DIVISION.
    INPUT-OUTPUT SECTION.
    FILE-CONTROL.
        SELECT INPUTFILE ASSIGN TO "files/input.dat".
    DATA DIVISION.
    FILE SECTION.
    FD  INPUTFILE LABEL RECORDS ARE OMITTED.
    01  INPUTRECORD              PIC X(40).    
    WORKING-STORAGE SECTION.
    01  FILE-STATUS              PIC 9 VALUE 0.
    PROCEDURE DIVISION.
    001-MAIN.
        OPEN INPUT INPUTFILE.
        PERFORM 002-READ UNTIL FILE-STATUS = 1.
        CLOSE INPUTFILE.
        STOP RUN.
            
    002-READ.
        READ INPUTFILE
            AT END MOVE 1 TO FILE-STATUS
            NOT AT END DISPLAY INPUTRECORD
        END-READ.

Instead, the output looks like this:

John                Johnson             
Peter               Peterson            
Juliette            Julietteson         
Natasha             Natashason          
Justin              Justinson           
ustin              Justinson

The last line seems to be a copy of the previous one, with a missing first character and several less trailing spaces (it totals to 35 chars).

Why does this happen? It seems like a misconception of the AT END clause, but I can't get around it.

Edit: updated compiler as suggested. Still the same result. Here is a link for my input file, if it helps

I'm not familiar with opencobol, but the cobol WRITE FROM that I remember grabbed the FROM from Working Storage, not the input file description. — Gilbert Le Blanc, Oct 14 '15 at 18:20
@GilbertLeBlanc I used that approach too, but the output is the same — Alvaro, Oct 14 '15 at 18:22
@GilbertLeBlanc I also removed the write part, outputting to the console yields the same result.. maybe I need some EOF marker on the file?? — Alvaro, Oct 14 '15 at 18:34
If you actually have OpenCOBOL, you should upgrade to GnuCOBOL (the new name for OpenCOBOL). OpenCOBOL is no longer maintained as OpenCOBOL so you have an old compiler. If you are getting that result, you're not getting that from the code you have shown (latest edit, Revision 5) because that will not even compile. — Bill Woodger, Oct 14 '15 at 19:04
@BillWoodger I have cobc (OpenCOBOL) 1.1.0 Copyright (C) 2001-2009 Keisuke Nishida / Roger While. Care to elaborate on why it won't compile? Updating now. — Alvaro, Oct 14 '15 at 19:07
@BillWoodger I see there was an identifier that wasn't translated. I highly doubt an updated compiler will change this kind of behaviour (mine is from 2013) — Alvaro, Oct 14 '15 at 19:11
Please check your input data. The last line appears to be only 35 characters including trailing blanks. Using GnuCOBOL 2.0.0 (not released yet) your input data as pasted also gives odd output. Making the last input record 40 characters produces the expected output. You should check your file, and download at least 1.1 (current release) or 2.0.0 (basis of next release). — Bill Woodger, Oct 14 '15 at 19:16
If you correct your data (the 35 is too much of a coincidence) I'd agree it should work with OpenCOBOL 1.1.0. However, why continue with OpenCOBOL when GnuCOBOL 1.1 is the current release. It is, of course, up to you. — Bill Woodger, Oct 14 '15 at 19:20
@BillWoodger updating :) thanks for all the help! I'll have to check the input file, then. It's probably not correctly created — Alvaro, Oct 14 '15 at 19:21
@BillWoodger If you want, rephrase the comments as an answer and I'll accept it — Alvaro, Oct 14 '15 at 19:21
Ach. Now noticed you've got your actual file downloadable. Back to square one, as I can reproduce the output. — Bill Woodger, Oct 15 '15 at 00:27
Update again. With your actual data, rather than what I pasted from SO, your records are all too long, rather than one record too short (I may have pasted the output data, not the input), which gives all but the first record a new-line *before* the name. And some other data, which was "hidden" previously because it was blank (trailing part of name) and not enough records. I've got test data to ensure that the final bytes on each record have data. Compare the DISPLAY output when you run it to the text-view of the output file. — Bill Woodger, Oct 15 '15 at 01:15

Bill Woodger · Accepted Answer · 2015-10-15T01:12:44.767

OK, missed a trick. Or two. You are using fixed-length records of 40 bytes. When you use fixed-length records, unlike line-sequential, there is no single-trailing-null-stripping on READ and null-appending on WRITE.

I also pasted your data from the question, and it arrived to me as 40-byte records including the null record-delimiter.

Now that I have your real data...

Instead of five records of 40 bytes, you have five of 41. If you consider that a a lump of data, which the COBOL program is going to read 40 bytes at a time, that gives you five records of 40 bytes, and one of five.

Without nulls being appended to the records, I should see all the output data as one long line. But I don't. Why?

There are this time, with the "long" records, there are leading record-delimiters of null, on all but the first record.

Here's some data for you to test with:

John                Johnson   0123456789
Peter               Peterson  0123456789
Juliette            Julietteson123456789
Natasha             Natashason0123456789
Justin              Justinson 0123456789
1234511111111111111111111111111111111110

That is intended to be 40 bytes of data per record, followed by a null record-terminator.

Here's your modified program to compile and run the data through. Rather than fixing up the columns after pasting from your question (pasting from SO is not great for COBOL) I used `cobc -x -free prog.cob. And since it was busted, didn't pay much attention to where I jammed the new stuff in.

The point of the ">" and "<" in the DISPLAYs is to bound the field. Then you can identify the position of the nulls, as they cause a break.

IDENTIFICATION DIVISION.
PROGRAM-ID. FILE-TEST.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT INPUTFILE ASSIGN TO "files/input.dat"
file status is fs1.
SELECT OUTPUTFILE ASSIGN TO "files/output.dat"
file status is fs2.
DATA DIVISION.
FILE SECTION.
FD INPUTFILE
    LABEL RECORDS ARE OMITTED
    record is varying depending on record-length.
01 INPUTRECORD PIC X(40).
FD OUTPUTFILE
    LABEL RECORDS ARE OMITTED.
01 OUTPUTRECORD PIC X(40).
WORKING-STORAGE SECTION.
    01 EOF PIC 9 VALUE 0.
    01  fs1 pic xx.
        88  fs1-ok value zero.
        88  fs1-eof value "10".
    01  fs2 pic xx.
        88  fs2-ok value zero.
    01 CUSTOMER.
        02 FIRST-NAME PIC X(20).
        02 LAST-NAME PIC X(20).
    01  rec-count comp-3 pic 999 value zero.
PROCEDURE DIVISION.
    001-MAIN.
        OPEN INPUT INPUTFILE OUTPUT OUTPUTFILE.
        if not fs1-ok
            display "bad fs1 O>" fs1 "<"
        end-if
        if not fs2-ok
           display "bad fs2 O>" fs2 "<"
        end-if
        PERFORM 002-READWRITELOOP UNTIL EOF = 1.
        CLOSE INPUTFILE. 
       if not fs1-ok
           display "bad fs1 C>" fs1 "<"
       end-if
       CLOSE OUTPUTFILE.
       if not fs2-ok
           display "bad fs2 C>" fs2 "<"
       end-if
       STOP RUN.

   002-READWRITELOOP.
       READ INPUTFILE INTO CUSTOMER
           AT END MOVE 1 TO EOF
              display "at end"
       if not fs1-ok
           display "bad fs1 R>" fs1 "< " rec-count
       end-if
           NOT AT END WRITE OUTPUTRECORD FROM CUSTOMER
      DISPLAY ">" CUSTOMER "<"
      DISPLAY ">" inputrecord "<"
      add 1 to rec-count
       display rec-count
       if not fs1-ok
           display "bad fs1 R>" fs1 "< " rec-count
       end-if
       if not fs2-ok
           display "bad fs2 W>" fs2 "<"
       end-if
       END-READ
      .

I will try to understand why the last "line" appears like it does, as in your original, and take it up in the GnuCOBOL discussion area.

The fix: either use LINE SEQUENTIAL on the SELECT and leave your data as it is; or, remove all the null/new-lines from your data. Either of these will give your 40 bytes of correctly lined-up data for the (now) six records in the input.

OK, you're going to like this.

I changed your original program by adding the FILE STATUS clause to both the SELECT statements.

I tested the file-status fields I defined after each IO (OPEN, CLOSE, READ and WRITE).

The OPENs and CLOSEs give a file-status of "00". Expected.

The first four READs give a file-status of "00". Expected.

The five WRITEs give a file-status of "00". Expected.

The fifth READ gives a file-status of "04". Which means:

A READ statement was successfully executed, but the length of the record being processed did not conform to the fixed file attributes for that file.

So, expected. NeitherAT END nor NOT AT END are concerned about that.

Your program could have known you had read a short, or a long, record, if you had used the FILE STATUS.

How are there six output records if there are only five WRITE statements executed?

Well, because you have 35 bytes of data plus a "new line", because the new-line will only be stripped after the 40th byte, when you DISPLAY the data, you get two lines. On the file, there is one "record", but it has an embedded newline. Instead of using an editor which would show me the hex values, I used cat, so saw a "sixth record", and then a text editor, and again saw the sixth record.

Exactly why you saw an almost entire "sixth record" I don't know, but it is of historical interest. If you want to look at the OpenCOBOL source to attempt to discover why, you can find it in the Files section of the GnuCOBOL site.

With GnuCOBOL, you DISPLAY or WRITE the 40-byte field, with an embedded blank. The DISPLAY will always do a line-break on the embedded "null" value, which gave me the apparent 35-byte followed by four-byte records, the 40th byte (actually the 36th) is the "invisible" null.

The WRITE would not cause a line-break with the embedded null, until you use something to "look at" the file which is expecting the data to be text rather than binary.

The "problem" in GnuCOBOL is not a problem, it is the way DISPLAY works (which is expecting text data, not binary) or, if using WRITE, the way you then "view" the file.

The actual OpenCOBOL output you got is effectively a bug, but it cannot be reproduced in GnuCOBOL.

The GnuCOBOL output from your program with your data (last record 35 bytes of data) can be explained. The spaces I got we because of COBOL's "padding" when you move a shorter field to a longer field. READ ... INTO ... contains an implicit MOVE, so you get the padding.

If you had just used the record from the data-area under the FD, you'd have got more unpredictable results.

So it does make the question on-topic. I think. So the question should stay, as others will almost certainly have a similar problem at some time. The problem being the use of DISPLAY with non-text data, or viewing output files with text-only tools. Or does that last mean it would be off-topic with WRITE? :-)

Resolution is two-fold. Upgrade to GnuCOBOL. Use the FILE STATUS always, and always check the (correct) file-status field (one for each file is best) after each IO, and take some sensible action when something unexpected happens.

The file-status value for end-of-file is "10". I add 88s to the file-status fields, and always use the "10" for end-of-file checking. I never use the tangle of AT END/NOT AT END.

If you'd used Gilbert's suggestion, with the priming read, and without the INTO I think you'd have got different results, which would have assisted resolution. The priming read (always have a current record before entering your read loop, then read the next record (or get end-of-file) as the last logical thing in the loop) is a more "COBOL" way to do things. As is the FILE STATUS and the 88s on the file-status fields, and the checking each time.

You could also look at using LINE SEQUENTIAL on the SELECT. This is a more natural type of file for Linux/Unix/Windows, "records" are then known to be delimited, and you'd have got valid records of 40, 40, 40, 40 and 35 bytes.

Then you have variable-length records, and you would need to know how to deal with those.

This type of question, a problem with the data, is normally off-topic at Stack Overflow.

However, the behaviour is not correct, there is reason to expect that you wouldn't get the final five bytes of the last record (it looked like you did originally, because you used READ INTO). However, an error with the data like that should not get your program to think there is an extra record.

I will raise the issue on the GnuCOBOL discussion pages, where (disclosure) I am a Moderator.

I would suggest you upgrade to GnuCOBOL 1.1.0, which has a number of bug-fixes over OpenCOBOL 1.1, and which is actively developed (GnuCOBOL is the new name for OpenCOBOL, so OpenCOBOL itself is no longer developed).

A couple of points on your code.

The structure suggested in a previous answer by Gilbert LeBlanc,to this question, unjustly downvoted, is much better in a COBOL program.

Using an actual FILE STATUS on the SELECT statement for a file is much better than using AT END and its relatives. You test then test (with an 88 preferably) the file status after each IO, so you can identify a problem as soon as it has happened. I will test whether this would have assisted in this instance anyway.

Disagree, as problem with the data wasn't an initial condition of the question. — Alvaro, Oct 14 '15 at 21:21
Disagree about what? If the only problem was the data, the question would be off-topic. Because that was not the only problem, the question was on-topic. Or are you saying the data was OK, in which case the problem cannot be reproduced with GnuCOBOL. — Bill Woodger, Oct 14 '15 at 21:24
I'm saying that the correct structure of the data the program expects is on-topic. Even if the logic is right, I'm feeding an unexpected format. Explain why it doesn't belong in a programming site, if you must. — Alvaro, Oct 14 '15 at 21:25
If you screw up your test-date, you should be able to sort that out yourself. It is not going to help anyone else (they won't have your data). Fix the data, no programming problem. It is the same as a typo. If you want confirmation of this (I can be wrong, of course) you can ask a question on the Meta site (from your profile, select the Meta icon). I've voted to close questions on data-typos before. I would hope that if your data had showed up totally scrambled, you'd have looked at your file and fixed an error. It's just that in this case, there seems to be something wrong happening. — Bill Woodger, Oct 14 '15 at 21:40
We think differently, that's all. On a side note: I downvoted the other answer because it was of poor quality, nothing else. Just posting code isn't going to help anyone, especially if the changes seem trivial and you don't explain why you reprashed it (as an example, your comment about the 88 values contains way more info than his answer). — Alvaro, Oct 14 '15 at 21:45
OK, it's an answer now. I do think you were too quick to dismiss and downvote Gilbert. C'est la vie. — Bill Woodger, Oct 15 '15 at 00:07

score 0 · Answer 2 · answered Jul 30 '21 at 18:41

0

In 002-Read, the File Status is set at the end of file. It is not checked until the end of the paragraph so whatever is in the buffer will be printed thus the askew last record twice.

if you modify your code to READ INPUTFILE AT END MOVE 1 TO FILE-STATUS ELSE DISPLAY INPUTRECORD END-READ.

answered Jul 30 '21 at 18:41

Jay

1

`ELSE` requires an `IF`. Where is the `IF` in the code you posted? – Rick Smith Jul 31 '21 at 01:59

COBOL: Simple file reading issue

2 Answers2