COBOL .csv File IO into Table Not Working

Question

I am trying to learn COBOL, as I have heard of it and thought it would be fun to take a look at. I came across Micro Focus COBOL (not really sure if that is pertinent to this post), and since I like to write in Visual Studio, it was enough incentive to try and learn it.

I've been reading a lot about it and trying to follow documentation and examples. So far I've gotten user input and output to the console working, so I decided to try file IO. That went OK when I was just reading in one 'record' at a time (I realize that 'record' may be incorrect jargon). Although I've been programming for a while, I am an extreme noob with COBOL.

I have a C++ program that I wrote before that simply takes a .csv file, parses it, then sorts the data by whatever column the user wants. I figured it wouldn't be too hard to do the same in COBOL. Well, apparently I have misjudged in this regard.

I have a file, edited in Windows using Notepad++, called test.csv, which contains:

4001942600,140,4
4001942700,141,3
4001944000,142,2

This data is from the US Census, which has column headers titled GEOID, SUMLEV, STATE. I removed the header row since I couldn't figure out how to read it in at the time, and then read in the other data. Anywho...

In Visual Studio 2015 on Windows 7 Pro 64-bit, using Micro Focus and step debugging, I can see in-record containing the first row of data. The unstring works fine for that run. The next time the program 'loops', I can step debug and see that in-record contains the new data; however, when I expand the watch elements, the watch display looks like the following:

        REC-COUNTER 002 PIC 9(3) 
+       IN-RECORD   {Length = 42} : "40019427004001942700             000      "    GROUP
-       GEOID   {Length = 3}    PIC 9(10) 
        GEOID(1)    4001942700  PIC 9(10) 
        GEOID(2)    4001942700  PIC 9(10) 
        GEOID(3)    <Illegal data in numeric field> PIC 9(10) 

-       SUMLEV  {Length = 3}    PIC 9(3) 
        SUMLEV(1)   <Illegal data in numeric field> PIC 9(3) 
        SUMLEV(2)   000 PIC 9(3) 
        SUMLEV(3)   <Illegal data in numeric field> PIC 9(3) 

-       STATE   {Length = 3}    PIC X
        STATE(1)        PIC X
        STATE(2)        PIC X
        STATE(3)        PIC X

So I'm not sure why, just before the unstring operation the second time around, I can see the proper data, but after the unstring happens incorrect data is stored in the 'table'. What is also interesting is that if I continue, the third time around the correct data is stored in the 'table'.

         identification division.
         program-id.endat.
         environment division.
         input-output section.
         file-control.
             select in-file assign to "C:/Users/Shittin Kitten/Google Drive/Embry-Riddle/Spring 2017/CS332/group_project/cobol1/cobol1/test.csv"
                organization is line sequential.
         data division.     
         file section.
         fd in-file.  
         01 in-record.
             05 record-table.
                 10 geoid     occurs 3 times        pic 9(10).
                 10 sumlev   occurs 3 times       pic 9(3).
                 10 state       occurs 3 times       pic X(1).
         working-storage section.
         01 switches.
             05 eof-switch pic X value "N".
        *  declaring a local variable for counting
         01 rec-counter pic 9(3).
        *  Defining constants for new line and carriage return. \n \r DNE in COBOL!
         78 NL  value X"0A".
         78 CR  value X"0D".
         78 TAB value X"09".

        ******** Start of Program ******
         000-main.
             open input in-file.
             perform 
             perform 200-process-records
                 until eof-switch = "Y".
             close in-file;
             stop run.
        *********** End of Program ************

        ******** Start of Paragraph  2 *********
         200-process-records.
             read in-file into in-record
                 at end move "Y" to eof-switch
                 not at end compute rec-counter = rec-counter + 1;
             end-read.
             Unstring in-record delimited by "," into 
                 geoid in record-table(rec-counter), 
                 sumlev in record-table(rec-counter), 
                 state in record-table(rec-counter).

             display "GEOID  " & TAB &">> " & TAB & geoid of record-table(rec-counter).
             display "SUMLEV  >> " & TAB & sumlev of record-table(rec-counter).
             display "STATE  "  & TAB &">> " & TAB & state of record-table(rec-counter) & NL.
        ************* End of Paragraph 2  **************

I'm very confused about why I can actually see the data after the read operation, but it isn't stored in the table. I have also tried changing the declarations of the table to pic 9(some length), and the result changes, but I can't seem to pinpoint what I'm not getting about this.

score 1 · Answer 1 · edited May 23 '17 at 12:34

Well, I figured it out. While step debugging again and hovering the mouse over record-table, I noticed 26 white spaces present after the last data field. Earlier tonight I attempted to change this data on the fly, as it were, because Visual Studio normally allows this. I did not verify that the change took (normally I don't have to), and apparently it did not. I should have known better, since the icon displayed to the left of record-table shows a little closed padlock.

I normally program in C, C++, and C#, so when I see the little padlock it usually has something to do with scoping and visibility. Not knowing COBOL well enough, I overlooked this little detail.

So I decided to unstring in-record delimited by spaces into temp-string, just prior to this statement:

   Unstring temp-string delimited by "," into 
       geoid in record-table(rec-counter), 
       sumlev in record-table(rec-counter), 
       state in record-table(rec-counter).

The result of this was the properly formatted data, at least as I understand it, stored into the table and printed to the console screen.

I have read that the unstring 'function' can utilize multiple 'operators', so I may try to combine these two unstring operations into one.
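A minimal sketch of what the combined statement might look like, assuming the line sits in its own working-storage field rather than in the record area. The names csv-line, state-code and the program-id are hypothetical, and the sample value is one row from test.csv:

```cobol
       identification division.
       program-id. onesplit.
       data division.
       working-storage section.
      * Hypothetical copy of one input line, padded with spaces.
       01  csv-line                pic x(42)
                                   value "4001942700,141,3".
       01  geoid                   pic x(10).
       01  sumlev                  pic x(3).
       01  state-code              pic x(1).
       procedure division.
       000-main.
      * "," separates the fields; ALL SPACE ends the last field
      * where the trailing padding begins.
           unstring csv-line delimited by "," or all space
               into geoid sumlev state-code
           end-unstring
           display "GEOID  >> " geoid
           display "SUMLEV >> " sumlev
           display "STATE  >> " state-code
           stop run.
```

Here a single UNSTRING names both delimiters with OR, so the intermediate temp-string step is not needed.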

Cheers!

**** Update ****

I have read Mr. Woodger's reply below, and I would like to ask for a bit more assistance with this. I have also read this post, "COBOL read/store in table", which is similar but above my level at this time.

That is pretty much what I'm trying to do, but I don't understand some of the things Mr. Woodger is trying to explain. Below is the code, a bit more refined, with some questions I have as comments. I would very much like some assistance with this, or if I could have an offline conversation, that would be fine too.

         identification division.
        * I do not know what 'endat' is
         program-id.endat. 
         environment division.
         input-output section.
         file-control.
        * assign a file path to in-file
             select in-file assign to "C:/Users/Shittin Kitten/Google Drive/Embry-Riddle/Spring 2017/CS332/group_project/cobol1/cobol1/test.csv"
        *  Is line sequential what I need here?  I think it is
                organization is line sequential.
        *  Is the data division similar to typedef in C?
         data division.
        *  Does the file section belong to data division?
         file section.
        * Am I doing this correctly?  Should this be below?
         fd in-file.  
        * I believe I am defining a structure at this point
         01 in-record.
             05 record-table.
                 10 geoid     occurs 3 times        pic A(10).
                 10 sumlev   occurs 3 times       pic A(3).
                 10 state       occurs 3 times       pic A(1).
        * To me the working-storage section is similar to Ada's declarative section
        *  is this a correct analogy?
         working-storage section.
        * Is this where in-record should go?  Is in-record a representative name?
         01 eof-switch pic X value "N".
         01 rec-counter pic 9(1).
        *  I don't know if I need these 
         78 NL  value X"0A".
         78 TAB value X"09".
         01 sort-col pic 9(1).
        ********************************* Start of Program ****************************
        * Now the procedure division, this is a lot like Ada to me
         procedure division.
        * Open the file
             perform 100-initialize.
        *  Read data
             perform 200-process-records
        *  loop until eof
                 until eof-switch = "Y".
        *  ask user to sort by a column
             display "Which column would you like to bubble sort? " & TAB.
        *  get user input
             accept sort-col.
        * close file
             perform 300-terminate.
        * End program
             stop run.
        ********************************* End of Program ****************************

        ******************************** Start of Paragraph 1  ************************
         100-initialize.
             open input in-file.
        *   Performing a read, what is the difference in this read and the next one
        *   paragraph 200?  Why do I do this here instead of just opening the file?
             read in-file 
                 at end
                     move "Y" to eof-switch
                 not at end
        *       Should I do this addition here? Also why a semicolon?
                     add 1 to rec-counter;
             end-read.
        *    Should I not be unstringing here?
             Unstring in-record delimited by "," into geoid of record-table, 
                             sumlev of record-table, state of record-table.
        ******************************** End of Paragraph 1  ************************

        ********************************* Start of Paragraph  2 **********************
         200-process-records.

             read in-file into in-record
                 at end move "Y" to eof-switch
                 not at end add 1 to rec-counter;
             end-read.

        *   Should in-record be something else?  I think so but don't know how to
        *   declare and use it
             Unstring in-record delimited by ","  into 
                 geoid in record-table(rec-counter), 
                 sumlev in record-table(rec-counter), 
                 state in record-table(rec-counter).

        *  These lines seem to give the printed format that I want
             display "GEOID  " & TAB &">> " & TAB & geoid of record-table(rec-counter).
             display "SUMLEV  >> " & TAB & sumlev of record-table(rec-counter).
             display "STATE  "  & TAB &">> " & TAB & state of record-table(rec-counter) & NL.

        ********************************* End of Paragraph 2  ************************

        ********************************* Start of Paragraph 3  ************************
         300-terminate.
             display "number of records >>>> " rec-counter;
             close in-file;
        **************************** End of Paragraph 3  *****************************

How about going to the GnuCOBOL discussion area at SourceForge. Get yourself a SourceForge ID. Much better suited to "discursive" topics like this, and lots of COBOL welcome there. — Bill Woodger, Apr 19 '17 at 07:12

score 1 · Accepted Answer · answered Apr 18 '17 at 16:55

I think there are a few things you've not grasped yet, and which you need to.

In the DATA DIVISION, there are a number of SECTIONs, each of which has a specific purpose.

The FILE SECTION is where you define data structures which represent data on files (input, output or input-output). Each file has an FD, and subordinate to an FD will be one or more 01-level structures, which can be extremely simple, or complex.

Some of the exact behaviour is down to the particular implementation for a compiler, but you should treat things this way, for your own "minimal surprise" and for the sake of anyone who has to later amend your programs: for an input file, don't change the data after a READ, unless you are going to update the record (or if you are using a keyed READ, perhaps). You can regard the "input area" as a "window" on your data-file. On the next READ, the window is pointed to a different position. Alternatively, you can regard it as "the next record arrives, obliterating what was there previously". You have put the "result" of your UNSTRING into the record-area. The result will for sure disappear on the next read. You also have the possibility (if the window is true for your compiler, and depending on the mechanism it uses for IO) of squishing the "following" data as well.

Your result should be in the WORKING-STORAGE, where it will remain undisturbed by new records being read.

READ filename INTO data-description is an implicit MOVE of the data from the record-area to data-description. If, as you have specified, data-description is the record-area, the result is "undefined". If you only want the data in the record-area, just a plain READ filename is all that is needed.

You have a similar issue with your original UNSTRING. You have the source and target fields referencing the same storage. "Undefined" and not the result you want. This is why the unnecessary UNSTRING "worked".
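The points above can be put together in a minimal sketch, not a definitive implementation. It assumes a three-row, header-less test.csv in the current directory and an 80-character record area. The names census-table, census-row, row-geoid, row-sumlev, row-state and the program-id are hypothetical. The record area holds only the raw line, a plain READ is used, and UNSTRING writes into a table in WORKING-STORAGE:

```cobol
       identification division.
       program-id. csvtable.
       environment division.
       input-output section.
       file-control.
      * "test.csv" is a placeholder path; adjust for your system.
           select in-file assign to "test.csv"
               organization is line sequential.
       data division.
       file section.
       fd  in-file.
      * The record area only describes one raw line of the file.
       01  in-record               pic x(80).
       working-storage section.
       01  eof-switch              pic x value "N".
           88  end-of-file         value "Y".
       01  rec-counter             pic 9(3) value zero.
      * The results live here, untouched by later READs.
       01  census-table.
           05  census-row occurs 3 times.
               10  row-geoid       pic x(10).
               10  row-sumlev      pic x(3).
               10  row-state       pic x(1).
       procedure division.
       000-main.
           open input in-file
           perform 100-read-record
      * Stop at end of file or when the 3-row table is full.
           perform until end-of-file or rec-counter >= 3
               perform 200-store-record
               perform 100-read-record
           end-perform
           close in-file
           display "number of records: " rec-counter
           stop run.

       100-read-record.
      * Plain READ: the line is left in in-record, no INTO.
           read in-file
               at end set end-of-file to true
           end-read.

       200-store-record.
           add 1 to rec-counter
      * Source is the record area; targets are in WORKING-STORAGE.
           unstring in-record delimited by "," or all space
               into row-geoid (rec-counter)
                    row-sumlev (rec-counter)
                    row-state (rec-counter)
           end-unstring
           display "GEOID  >> " row-geoid (rec-counter)
           display "SUMLEV >> " row-sumlev (rec-counter)
           display "STATE  >> " row-state (rec-counter).
```

Reading once before the loop and again at the end of each pass means nothing is processed after end-of-file.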

You have a redundant inline PERFORM. You process "something" after end-of-file. You make things more convoluted by using unnecessary "punctuation" in the PROCEDURE DIVISION (which you've apparently omitted to paste). Try using ADD instead of COMPUTE there. Look at the use of FILE STATUS, and of 88-level condition-names.
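As a hedged illustration of FILE STATUS with 88-level condition-names, here is a small sketch that only counts the lines in a file. The names in-file-status, in-file-ok, in-file-eof and the program-id are hypothetical, and "test.csv" is a placeholder path. The status values "00" (successful) and "10" (end of file) are the standard ones:

```cobol
       identification division.
       program-id. fstatus.
       environment division.
       input-output section.
       file-control.
      * "test.csv" is a placeholder path.
           select in-file assign to "test.csv"
               organization is line sequential
               file status is in-file-status.
       data division.
       file section.
       fd  in-file.
       01  in-record               pic x(80).
       working-storage section.
      * Set by every I-O statement on in-file.
       01  in-file-status          pic xx.
           88  in-file-ok          value "00".
           88  in-file-eof         value "10".
       01  rec-counter             pic 9(3) value zero.
       procedure division.
       000-main.
           open input in-file
           if in-file-ok
               perform 100-count-records
               close in-file
           else
               display "open failed, status " in-file-status
           end-if
           stop run.

       100-count-records.
           read in-file
      * Loop while reads succeed; any other status ends the loop.
           perform until not in-file-ok
               add 1 to rec-counter
               read in-file
           end-perform
           if not in-file-eof
               display "read failed, status " in-file-status
           end-if
           display "number of records: " rec-counter.
```

With a FILE STATUS field declared, the READ needs no AT END phrase, and the condition-names replace comparisons against literal switch values.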

You don't need a "new line" for DISPLAY, because you get one for free unless you use NO ADVANCING.

You don't need to "concatenate" in the DISPLAY, because you get that for free as well.
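A short sketch of both points, with hypothetical data values taken from the first row of test.csv:

```cobol
       identification division.
       program-id. showrow.
       data division.
       working-storage section.
       01  geoid                   pic x(10) value "4001942600".
       01  sumlev                  pic x(3)  value "140".
       procedure division.
       000-main.
      * Operands are written one after another on a single line.
           display "GEOID  >> " geoid
           display "SUMLEV >> " sumlev
      * NO ADVANCING suppresses the automatic new line.
           display "records read: " with no advancing
           display "3"
           stop run.
```

Each DISPLAY lists its operands separated by spaces, with no & operator and no explicit new-line constant.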

DISPLAY and its cousin, ACCEPT, are the verbs which vary the most from compiler to compiler. (They are verbs, not functions: only intrinsic functions are functions in COBOL, except where your compiler supports user-defined functions.) If your compiler supports SCREEN SECTION in the DATA DIVISION, you can format and process user-input in "screens". If you were to use IBM's Enterprise COBOL, you'd have very basic DISPLAY/ACCEPT.

You "declare a local variable". Do you? In what sense? Local to the program.

You can pick up quite a lot of tips by looking at COBOL questions here from the last few years.

Mr. Woodger, Could you please read my updated post above. Are you available for an offline conversation if this post gets off topic? — Michael Riley, Apr 19 '17 at 03:20
