
I have an EBCDIC-encoded mainframe file that I need to convert to ASCII. Which libraries or tools can I use to do that? I am most familiar with Python.

The file came with a cookbook (a record-layout description) that can be used to parse it; part of it is below.

What do the types 'C', 'P' and 'B' mean? I'm guessing C = character, B = byte, P = packed number?

1:----------------------------------------------------------------------------------------------------------------------------------:
 :LAYOUT NAME:         B224E           DATE:    02/20/14         PAGE   7 OF  14:
 :                     -------                  --------              ---    ---:
 :COBOL:  PAN-NAME: NONE                 COPYLIB-NAME: RECB224E                 :
 :                  --------------------               --------------------     :
 :BAL  :  PAN-NAME: NONE                 COPYLIB-NAME: NONE                     :
 :------------------------------------------------------------------------------:
 :TYPE OF RECORD:  EXTENDED SORT KEY AREA - SEGMENT "A"  (OPTIONAL)             :
 :------------------------------------------------------------------------------:
 :POSITION  : LENGTH : TYPE :   DESCRIPTION                                     :
 :----------:--------:------:---------------------------------------------------:
 :          :        :      :                                                   :
 :          :        :      :                                                   :
 :          :        :      :                                                   :
 :001 - 001 :    1   :   C  :  SEGMENT IDENTIFIER - "A"                         :
 :          :        :      :                                                   :
 :002 - 003 :    2   :   P  :  SEGMENT LENGTH                                   :
 :          :        :      :                                                   :
 :004 - ??? :   ???  :   C  :  EXTENDED SORT KEY AREA                           :
 :          :        :      :                                                   :
Asked by user2346491 (edited by jonrsharpe)
  • This is a silly idea, which is not going to work if you have packed-decimal and binary fields. Look at the recent questions tagged `ebcdic` for more details. Don't do it. Don't be fobbed off by the people giving you the file. They should give you the file in text-only form, and the file-transfer process should do the conversion. Anything else should fail an audit. Auditor: "So, you receive a data file and then change it before doing anything with it?" You: "Yep, and I picked up some random code off the internet to do it as well". Auditor removes Big Red Marker-Pen, draws an A4-sized `X` on page. – Bill Woodger Sep 07 '14 at 08:32
  • Yes it is going to work, and as a species we have been doing this for over 30 years on a number of mixed architectures, notably for IBM hosts and Intel clients. The file must be mapped at the field level and a conversion applied to each field. Sometimes this is referred to as a template. There are a number of ETL products out there that will do this at a consumer level, notably "DataStage". You can do this from scratch with Python, as the text and numeric fields should map from IBM037 or IBM500 to ASCII easily. The binaries will generally be fixed sizes (halfword and up). Use bitwise arithmetic for the P fields. – mckenzm Dec 30 '14 at 23:48
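The field-level "template" approach described in the comment above can be sketched in Python for the segment "A" layout from the question. This is a minimal sketch under stated assumptions, not a definitive parser: the text fields are assumed to use code page 037 (swap in `cp500` if the sender uses that variant), the 2-byte `P` field is assumed to be standard IBM packed decimal (one digit per nibble, sign in the last nibble), and the extended sort key area is assumed to run to the end of the record because its length is `???` in the cookbook. The names `unpack_packed`, `decode_text`, `LAYOUT_A` and `parse_segment_a` are hypothetical, and the sample record is made up.

```python
def unpack_packed(raw: bytes) -> int:
    """Decode an IBM packed-decimal field: two digits per byte, sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)      # high nibble
        digits.append(byte & 0x0F)    # low nibble
    digits.append(raw[-1] >> 4)       # last byte: final digit in the high nibble...
    sign = raw[-1] & 0x0F             # ...and the sign in the low nibble
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError(f"invalid packed-decimal field: {raw.hex()}")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0x0B, 0x0D) else value   # B and D mean negative


def decode_text(raw: bytes) -> str:
    # Assumption: code page 037; use "cp500" if the sender uses that EBCDIC variant.
    return raw.decode("cp037")


# Hypothetical template for segment "A": (name, 0-based offset, length, converter).
# The cookbook gives 1-based positions; a length of None means "to end of record"
# because the sort key area's length is "???" in the layout.
LAYOUT_A = [
    ("segment_id", 0, 1, decode_text),
    ("segment_length", 1, 2, unpack_packed),
    ("sort_key_area", 3, None, decode_text),
]


def parse_segment_a(record: bytes) -> dict:
    """Convert one raw segment-"A" record field by field using LAYOUT_A."""
    fields = {}
    for name, offset, length, convert in LAYOUT_A:
        end = len(record) if length is None else offset + length
        fields[name] = convert(record[offset:end])
    return fields


# Made-up sample: "A", packed 006 (bytes 00 6C), then "KEY" in EBCDIC.
record = "A".encode("cp037") + b"\x00\x6c" + "KEY".encode("cp037")
print(parse_segment_a(record))
# {'segment_id': 'A', 'segment_length': 6, 'sort_key_area': 'KEY'}
```

Keeping the layout as data rather than code means each additional segment type from the cookbook only needs another template list, while the converters stay the same.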


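Take a look at the `codecs` module. From the standard encodings table, it looks like EBCDIC is available as `cp500`. EBCDIC is really a family of code pages, though, so another variant such as `cp037` may match your file better. Something like the following should work: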

import codecs

with open("EBCDIC.txt", "rb") as ebcdic:
    # Read the raw bytes, then decode them from EBCDIC (code page 500) to a Python str
    text = codecs.decode(ebcdic.read(), "cp500")
    print(text)

As mpez0 noted in the comments, if you're using Python 3, you can condense the code to this:

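# encoding must be passed by keyword: the third positional argument of open() is buffering
with open("EBCDIC.txt", "rt", encoding="cp500") as ebcdic:
    print(ebcdic.read())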

Not having an EBCDIC file handy, I can't test this, but it should be enough to get you started.
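Since no EBCDIC file was at hand, one way to check the decoding is to generate a small sample file first. This sketch assumes a text-only file (no packed or binary fields); the file name `sample_ebcdic.dat` and the helper names `write_sample` and `read_ebcdic_text` are made up for illustration. It also shows why the exact code page matters: `cp037` and `cp500` agree on letters and digits but encode some punctuation, such as `!`, differently.

```python
import os
import tempfile


def write_sample(path: str, text: str, codepage: str = "cp500") -> None:
    # Encode Unicode text to EBCDIC bytes and write them in binary mode.
    with open(path, "wb") as out:
        out.write(text.encode(codepage))


def read_ebcdic_text(path: str, codepage: str = "cp500") -> str:
    # Read the raw bytes and decode them explicitly; this matches
    # open(path, "rt", encoding=codepage) except for newline translation.
    with open(path, "rb") as src:
        return src.read().decode(codepage)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_ebcdic.dat")
    write_sample(path, "HELLO WORLD")
    with open(path, "rb") as f:
        raw_bytes = f.read()
    decoded = read_ebcdic_text(path)

print(raw_bytes.hex())  # starts with c8, the EBCDIC code for "H"
print(decoded)          # HELLO WORLD

# Letters and digits match across EBCDIC code pages, but some punctuation does not:
print("!".encode("cp037").hex(), "!".encode("cp500").hex())
```

If a real file decodes to mostly readable text but with odd brackets or exclamation marks, trying `cp037` (or another EBCDIC codec) instead of `cp500` is a reasonable next step.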

Answered by MattDMo
  • With Python 3, one can do `open("EBCDIC.txt", "rt", encoding="cp500")`. Also, the docs don't have a dash or underscore in "cp500". – mpez0 Sep 06 '14 at 18:01
  • @mpez0 oops, my bad, I'll fix it. – MattDMo Sep 06 '14 at 18:11
  • The file is not text. The example contains a packed field, and there seem to be other records, as probable binary data is referred to. This will trash the file. – Bill Woodger Sep 07 '14 at 08:27
  • This is a very old question, but for correctness' sake: running the first code snippet, I get a `TypeError` because `ebcdic` is a file object, not a bytes object. Changing the argument of `decode` to `ebcdic.read()` fixes the issue. Maybe it will help someone to know. – Bennet Leff Jun 11 '18 at 18:24
  • As you guessed, `P` stands for [packed decimal](http://www.simotime.com/datapk01.htm). The [struct package](https://docs.python.org/3/library/struct.html) should be helpful for reading them byte by byte. The `C` fields are characters you can decode to Unicode using EBCDIC, though there are many local variants; cp500 might be a good starting point. More EBCDIC codecs are available from the [ebcdic package](https://pypi.org/project/ebcdic/) on PyPI. Your file looks like a variable-record-length VSAM file. You might find more information about that on IBM's web site, though they tend to move and remove pages a lot. – roskakori Jun 05 '19 at 23:54
  • Were you able to read it? I have a similar task to do. – Ramandeep Mehmi Jul 06 '22 at 22:11
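To illustrate the comments above: decoding a whole record as text turns packed (`P`) and binary fields into control characters, so those fields have to be converted one by one. The cookbook excerpt doesn't show a `B` field, so this sketch assumes that `B` means a big-endian two's-complement integer (the usual IBM mainframe binary format, e.g. COBOL `COMP`) of halfword, fullword or doubleword size, and uses the `struct` module mentioned in the comments. The helper name `unpack_binary` is hypothetical.

```python
import struct

# Big-endian ('>') signed struct formats for the usual mainframe binary widths.
_BINARY_FORMATS = {2: ">h", 4: ">i", 8: ">q"}


def unpack_binary(raw: bytes, signed: bool = True) -> int:
    """Decode a big-endian binary field (halfword, fullword or doubleword)."""
    fmt = _BINARY_FORMATS.get(len(raw))
    if fmt is None:
        raise ValueError(f"unsupported binary field length: {len(raw)}")
    if not signed:
        fmt = fmt.upper()  # '>H', '>I', '>Q' are the unsigned variants
    (value,) = struct.unpack(fmt, raw)
    return value


halfword = b"\x00\x2a"           # binary 42
fullword = b"\xff\xff\xff\xfe"   # binary -2 when read as signed

print(unpack_binary(halfword))                # 42
print(unpack_binary(fullword))                # -2
print(unpack_binary(fullword, signed=False))  # 4294967294

# Decoding the same bytes as EBCDIC text yields control characters, not "42".
print(repr(halfword.decode("cp500")))
```

The byte order is spelled out explicitly with `>` because the mainframe data is big-endian, while a bare `struct` format would use the native (little-endian on Intel) order.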