2

I need to extract information from a COBOL program. I'm using the ANTLR grammar for COBOL. I need to extract group variables as a whole. I'm not able to extract this with ANTLR as the parser extracts every variable subdivision/group item as an individual element. I need somehow to get the group items as a bunch. I'm new to COBOL, so I want to get an understanding of how the compiler understands which elements to include in a group, and where to stop.

EX:

  01 EMPREC.
  02 EEMPNAME.
  10 FIRSTNAME PIC X(10)
  10 LASTNAM PIC X(15)

  07 SNO PIC X(15)

Is the above definition valid? Will the compiler include all elements (levels >= 2 and <= 49) after the first item (01 EMPREC) in the group EMPREC until it encounters another 01 or 77? Is this safe to assume? Is the level information enough to derive which elements fall under a group? Any pointers are appreciated.

Sahana

4 Answers

4

I am the author of the COBOL ANTLR4 grammar you found in the ANTLR4 grammars project. The COBOL grammar generates only an Abstract Syntax Tree (AST).

In contrast, what you ask for is an Abstract Semantic Graph (ASG), which represents grouping of variables and in general relationships between AST elements.

Such an ASG is generated by the COBOL parser at my proleap-cobol-parser project. This project uses the mentioned COBOL grammar and resolves relationships between AST elements.

An example for parsing data description entries can be found in this unit test.

u.wol
  • Thanks for pointing that out. That was helpful. Currently I'm using the level number information to group data items into a group. Thanks again. – Sahana Dec 01 '16 at 07:15
3

You actually had two questions:

"Is the [...] definition valid?" No, it is not, because there is no previous level 07. If you change the level of EEMPNAME to 07 or of SNO to 02, it is valid. Group items may have a USAGE clause but no PICTURE.

This leads to the question "I want to get an understanding of how the compiler understands which elements to include in a group, and where to stop".

You need to store the level number together with the variable. If you want to know what is part of a group, you need to check that level and everything below it. To collect the complete level 02 group, take only the variables that follow it with a higher level number, until you reach the next level 02 or a level further up the hierarchy (in this case 01). Depending on your needs, you additionally need to check whether the next variable with the same level has a REDEFINES clause; in that case it belongs to the same group (storage-wise). Something similar applies to level 66 (RENAMES, which doesn't have its own storage).

Level 88 has no storage either; it is just for validation entries. Depending on the parsing you want to do, you can ignore them. Important: level 88 does not create a sub-item; you can have multiple ones, and a lower level number may follow afterwards.

The level numbers that always define a new item are 01, and with extensions 66, 77 and 78.

01 vargroup.
   02 var-1  pic 9.
      88  var-is-even  values 0, 2, 4 6 8   . 
      88  var-is-not-even  values 1 3 5 7 9. 
      88  var-is-big   value 6 thru 9.
   02 var-2  pic x.
   01 new-var pic x.
   77 other-var  pic 9.
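
The rules above can be sketched in a few lines of Python. This is only a rough illustration, assuming the parser has already produced a flat list of (level, name) pairs in source order; `group_members` and `STANDALONE_LEVELS` are hypothetical names, and REDEFINES is not handled here.

```python
# Hypothetical helper: entries are (level, name) tuples in source order.
# Levels that, per the answer above, always start a new item.
STANDALONE_LEVELS = {66, 77, 78}

def group_members(entries, start):
    """Return the names of all items subordinate to entries[start]."""
    group_level, _ = entries[start]
    members = []
    for level, name in entries[start + 1:]:
        if level == 88:
            # Condition names own no storage and do not open a sub-item.
            continue
        if level in STANDALONE_LEVELS or level <= group_level:
            # Same or lower level number (or a standalone level) ends the group.
            break
        members.append(name)
    return members

# The example declarations from above, as (level, name) pairs.
entries = [
    (1, "vargroup"),
    (2, "var-1"),
    (88, "var-is-even"),
    (88, "var-is-not-even"),
    (88, "var-is-big"),
    (2, "var-2"),
    (1, "new-var"),
    (77, "other-var"),
]
print(group_members(entries, 0))  # ['var-1', 'var-2']
```

Whether level 88 entries should be skipped or kept depends on what you want to extract; here they are simply ignored.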

I suggest reading some COBOL sources and coming up with a new question, if necessary. For example CBL_OC_DUMP.

Simon Sobisch
  • Ok. So you're saying the level numbers should be enough to determine if a data item belongs to a group ? i.e, all dataitems with level number between 2 and 49 (with valid numbering) , following level 01 will belong to a single group. And the next new variable can have a level number of 01, 77, 66 or 88 only? – Sahana Nov 25 '16 at 12:42
  • Seems to be there is another error : the level 02 groups 2 level 10 below, but is still defined as a PIC X. My memory is fading, but I think I remember it was not possible. – gazzz0x2z Nov 25 '16 at 13:27
  • Thanks for the detailed explanation. I needed somebody to tell me this "The level numbers that always defines a new item are 01, and with extensions 66 and 77." :) I'm currently using the level number information to group data items into a group. If the level information is enough, then the logic I have should do. – Sahana Dec 01 '16 at 07:11
1

I suspect you are going to need to put some additional code behind your ANTLR parser. If you tokenize each individual item, then keeping up with a stack of group items is somewhat easy. However, trying to grab the entire group item as a single production will be very hard.

Some of the challenges that ANTLR will not be up to are 1) group items can contain group items; 2) group items can redefine other items, or be redefined; 3) the little used, but very complicating level-66 renames clause.

If you treat each numbered data definition as a separate production, and maintain a stack, pushing for new items, popping once you have completed processing an item, and knowing that you have completed a group once you see the same level number again, your life will be easier.
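
A minimal sketch of that stack approach in Python, assuming each data definition has already been reduced to a (level, name) pair; `Item` and `build_tree` are made-up names for illustration. It does not validate level numbering and ignores REDEFINES and the details of level-66 RENAMES.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    level: int
    name: str
    children: list = field(default_factory=list)

def build_tree(entries):
    """Turn (level, name) pairs in source order into a list of top-level Items."""
    roots = []
    stack = []  # currently open items, outermost first
    for level, name in entries:
        item = Item(level, name)
        if level in (1, 66, 77, 78):
            # A new top-level item closes every open group.
            stack.clear()
        else:
            # Pop every open item with the same or a higher level number:
            # it is complete and cannot contain the new item.
            while stack and stack[-1].level >= level:
                stack.pop()
        (stack[-1].children if stack else roots).append(item)
        stack.append(item)
    return roots

# The question's record, with SNO moved to level 02 so the numbering is valid.
entries = [
    (1, "EMPREC"),
    (2, "EEMPNAME"),
    (10, "FIRSTNAME"),
    (10, "LASTNAM"),
    (2, "SNO"),
]
roots = build_tree(entries)
print([child.name for child in roots[0].children])  # ['EEMPNAME', 'SNO']
```

With this scheme a level 88 entry simply ends up as a child of the item it follows, which may or may not be what you want.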

Joe Zitzelberger
0

It is quite a while now since I've done COBOL, but there are quite a lot of issues if my memory serves me correctly.

1) 01 levels always start in column 8.

2) When assigning subsequent levels you are better off incrementing by 5:

  01 my-record.
     05 my-name pic x(30) value spaces.
     05 my-address1 pic x(40) value spaces.

3) I thought 77 levels are now obsolete, since they are not an efficient use of memory. Also, when 77 levels are used they should always be defined at the start of the working-storage section. Obviously record layouts are defined in the file section, unless you are using WRITE FROM and READ INTO?

4) If you are defining lots of items like new-var pic x, don't use a new 01 level for each!

  01 ws-flages.
     05 ws_flag1 pic x value space.
     05 ws_flag2 pic x value space.

etc.

For COBOL manuals try Stern & Stern.

Hope this helps!

Mike