
I need to parse an EBCDIC input file format. Using Java, I am able to read it as shown below:

InputStreamReader rdr = new InputStreamReader(
        new FileInputStream("/Users/rr/Documents/workspace/EBCDIC_TO_ASCII/ebcdic.txt"),
        java.nio.charset.Charset.forName("ibm500"));

But in Hadoop MapReduce, I need to parse it via a RecordReader, which has not worked so far.
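For context, this is the direction I have been trying. Hadoop's built-in FixedLengthInputFormat hands each fixed-width record to the mapper as raw bytes, which can then be decoded with the same ibm500 charset. This is only a minimal sketch: the 80-byte record length and the class names are placeholders of mine, and it assumes the records really are fixed-width.

import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EbcdicDriver {

    // Assumed record length; mainframe EBCDIC files are commonly fixed-width.
    private static final int RECORD_LENGTH = 80;

    public static class EbcdicMapper
            extends Mapper<LongWritable, BytesWritable, NullWritable, Text> {

        private static final Charset EBCDIC = Charset.forName("ibm500");

        @Override
        protected void map(LongWritable key, BytesWritable value, Context context)
                throws IOException, InterruptedException {
            // Decode the raw EBCDIC bytes of one record into a Java String.
            String record = new String(value.getBytes(), 0, value.getLength(), EBCDIC);
            context.write(NullWritable.get(), new Text(record));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell FixedLengthInputFormat how many bytes make up one record.
        FixedLengthInputFormat.setRecordLength(conf, RECORD_LENGTH);

        Job job = Job.getInstance(conf, "ebcdic-to-ascii");
        job.setJarByClass(EbcdicDriver.class);
        job.setInputFormatClass(FixedLengthInputFormat.class);
        job.setMapperClass(EbcdicMapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If the records are delimited rather than fixed-width, a custom RecordReader would still be needed, but the decoding step stays the same.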

Can anyone provide a solution to this problem?

Barath

3 Answers


You could try parsing it through Spark instead, using Cobrix, an open-source COBOL data source for Spark.
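For instance, from Java the read might look roughly like this, assuming Cobrix's cobol data source is on the classpath and the record layout is described by a COBOL copybook (both paths below are placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CobrixRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("cobrix-ebcdic")
                .getOrCreate();

        // Cobrix registers the "cobol" data source; the copybook tells it
        // how to decode each EBCDIC record into columns.
        Dataset<Row> df = spark.read()
                .format("cobol")
                .option("copybook", "/path/to/copybook.cpy")
                .load("/path/to/ebcdic-data");

        df.show();
    }
}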

Felipe Martins Melo

The best thing you can do is convert the data to ASCII first and then load it into HDFS.
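A minimal sketch of that conversion step, reusing the ibm500 charset from the question (the input and output paths come from the command line):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EbcdicToAscii {
    public static void main(String[] args) throws IOException {
        // Decode EBCDIC (code page IBM500) and re-encode as US-ASCII;
        // afterwards load the converted file into HDFS as usual.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream(args[0]), Charset.forName("ibm500")));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(args[1]), StandardCharsets.US_ASCII))) {
            int ch;
            while ((ch = in.read()) != -1) {
                out.write(ch);
            }
        }
    }
}

This only makes sense for plain text data: binary or packed-decimal (COMP-3) fields would be corrupted by a character-level conversion, and EBCDIC NEL line endings may still need to be normalized to '\n'.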

Ali

Why is the file in EBCDIC? Does it need to be?

If it is just text data, why not convert it to ASCII when you send or pull the file from the mainframe / AS400?

If the file contains binary or COBOL numeric fields, then you have several options:

  1. Convert the file to normal text on the mainframe (the mainframe sort utility is good at this), then send the file and convert it to ASCII.
  2. If it is a COBOL file, there are some open-source projects you could look at: https://github.com/tmalaska/CopybookInputFormat or https://github.com/ianbuss/CopybookHadoop
  3. There are commercial packages for loading mainframe COBOL data into Hadoop.
Bruce Martin