We are trying to write a Perl tool that parses a fixed-length EBCDIC data file and infers the record layout by examining the hex value of each byte in each record.
We assume that each data file, written by a COBOL program whose source code we do not have, can contain multiple record layouts. The aim of the tool is to support data migration (EBCDIC to ASCII) by generating a layout that would then be fed to a converter.
The problem is that there are hundreds of possible interpretations for each byte. I thought that comparing the hex value of a byte with the byte at the same offset in the next record might give some clue as to what the field is. But even then there is no definite answer to arrive at. Decisions have to be made at every step, and each one can affect the end result.
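A minimal sketch of that same-offset comparison, extended from two records to all of them, might look like the following. It is written in Python purely for illustration (the tool itself is in Perl), the helper names `split_records`, `classify_byte` and `column_profile` are hypothetical, and the byte ranges assume a CP037-like EBCDIC code page.

```python
from collections import Counter

def split_records(data: bytes, reclen: int) -> list:
    """Cut raw file bytes into fixed-length records, dropping any short tail."""
    return [data[i:i + reclen] for i in range(0, len(data) - reclen + 1, reclen)]

def classify_byte(b: int) -> str:
    """Rough class of one byte, assuming a CP037-like EBCDIC code page."""
    if b == 0x40:
        return "space"
    if 0xF0 <= b <= 0xF9:
        return "digit"
    # Note: signed zoned-decimal fields end in 0xC0-0xC9 or 0xD0-0xD9,
    # which overlaps the upper-case letter ranges below.
    if 0xC1 <= b <= 0xC9 or 0xD1 <= b <= 0xD9 or 0xE2 <= b <= 0xE9:
        return "upper"
    if 0x81 <= b <= 0x89 or 0x91 <= b <= 0x99 or 0xA2 <= b <= 0xA9:
        return "lower"
    return "other"

def column_profile(records: list) -> list:
    """For each byte offset, count how often each byte class occurs."""
    reclen = len(records[0])
    return [Counter(classify_byte(r[i]) for r in records) for i in range(reclen)]
```

Offsets whose counts are almost entirely "space", "digit", "upper" or "lower" are plausible DISPLAY positions, while offsets dominated by "other" are candidates for COMP or COMP-3 data. With multiple record layouts in one file, the profile would presumably have to be built per group of similar records rather than over the whole file.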
Could someone please let me know of any known patterns that I can look for? For example, in a COMP-3 field every nibble except the last holds a digit from 0-9, and the last nibble holds the sign (C, D or F), so the bytes look like [0-9][0-9] followed by a final [0-9][CDF]. For data migration, COMP and COMP-3 fields need no character conversion, since their bytes are carried over unchanged, although they still have to be located so that the converter leaves them alone. But identifying which fields are DISPLAY fields is also turning out to be a huge task. Can someone suggest some ideas or point me in a direction that I can explore further?
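For the COMP-3 check, a minimal sketch follows, again in Python only for illustration. It assumes the usual packed-decimal layout, where every nibble is a digit 0-9 except the last, which holds the sign (C, D or F). The function name `looks_packed` is hypothetical.

```python
def looks_packed(field: bytes) -> bool:
    """Return True if the bytes are a well-formed packed-decimal (COMP-3) value.

    Assumes digit nibbles 0-9 followed by one trailing sign nibble
    (0xC positive, 0xD negative, 0xF unsigned).
    """
    if not field:
        return False
    nibbles = []
    for b in field:
        nibbles.append(b >> 4)      # high nibble
        nibbles.append(b & 0x0F)    # low nibble
    *digits, sign = nibbles
    return all(d <= 9 for d in digits) and sign in (0xC, 0xD, 0xF)
```

A check like this is only a hint, not proof: a one-byte field such as 0x4C passes it but is also the EBCDIC character "<", so short candidate fields would need to be confirmed across many records.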
Any help would be highly appreciated. I am really stuck in a mire here.
Thanks, Aditya.