
I need to parse an EBCDIC input file format. Using Java, I am able to read it as shown below:

InputStreamReader rdr = new InputStreamReader(
        new FileInputStream("/Users/rr/Documents/workspace/EBCDIC_TO_ASCII/ebcdic.txt"),
        java.nio.charset.Charset.forName("ibm500"));

But in Hadoop MapReduce, I need to parse it via a RecordReader, which has not worked so far.
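For context, this is the direction I have been trying. Hadoop's built-in FixedLengthInputFormat hands each fixed-width record to the mapper as raw bytes, which can then be decoded with the same ibm500 charset. This is only a minimal sketch: the 80-byte record length and the class names are placeholders of mine, and it assumes the records really are fixed-width.

import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EbcdicDriver {

    // Assumed record length; mainframe EBCDIC files are commonly fixed-width.
    private static final int RECORD_LENGTH = 80;

    public static class EbcdicMapper
            extends Mapper<LongWritable, BytesWritable, NullWritable, Text> {

        private static final Charset EBCDIC = Charset.forName("ibm500");

        @Override
        protected void map(LongWritable key, BytesWritable value, Context context)
                throws IOException, InterruptedException {
            // Decode the raw EBCDIC bytes of one record into a Java String.
            String record = new String(value.getBytes(), 0, value.getLength(), EBCDIC);
            context.write(NullWritable.get(), new Text(record));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell FixedLengthInputFormat how many bytes make up one record.
        FixedLengthInputFormat.setRecordLength(conf, RECORD_LENGTH);

        Job job = Job.getInstance(conf, "ebcdic-to-ascii");
        job.setJarByClass(EbcdicDriver.class);
        job.setInputFormatClass(FixedLengthInputFormat.class);
        job.setMapperClass(EbcdicMapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If the records are delimited rather than fixed-width, a custom RecordReader would still be needed, but the decoding step stays the same.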

Can anyone provide a solution to this problem?

Barath

3 Answers


You could try parsing it through Spark instead, using Cobrix, an open-source COBOL data source for Spark.
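For instance, from Java the read might look roughly like this, assuming Cobrix's cobol data source is on the classpath and the record layout is described by a COBOL copybook (both paths below are placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CobrixRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("cobrix-ebcdic")
                .getOrCreate();

        // Cobrix registers the "cobol" data source; the copybook tells it
        // how to decode each EBCDIC record into columns.
        Dataset<Row> df = spark.read()
                .format("cobol")
                .option("copybook", "/path/to/copybook.cpy")
                .load("/path/to/ebcdic-data");

        df.show();
    }
}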

Felipe Martins Melo

The best thing you can do is convert the data to ASCII first and then load it into HDFS.
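A minimal sketch of that conversion step, reusing the ibm500 charset from the question (the input and output paths come from the command line):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EbcdicToAscii {
    public static void main(String[] args) throws IOException {
        // Decode EBCDIC (code page IBM500) and re-encode as US-ASCII;
        // afterwards load the converted file into HDFS as usual.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream(args[0]), Charset.forName("ibm500")));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(args[1]), StandardCharsets.US_ASCII))) {
            int ch;
            while ((ch = in.read()) != -1) {
                out.write(ch);
            }
        }
    }
}

This only makes sense for plain text data: binary or packed-decimal (COMP-3) fields would be corrupted by a character-level conversion, and EBCDIC NEL line endings may still need to be normalized to '\n'.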

Ali

Why is the file in EBCDIC? Does it need to be?

If it is just text data, why not convert it to ASCII when you send or pull the file from the mainframe / AS400?

If the file contains binary or COBOL numeric fields, then you have several options:

  1. Convert the file to normal text on the mainframe (the mainframe sort utility is good at this), then send the file and convert it to ASCII.
  2. If it is a COBOL file, there are some open-source projects you could look at: https://github.com/tmalaska/CopybookInputFormat or https://github.com/ianbuss/CopybookHadoop
  3. There are commercial packages for loading mainframe COBOL data into Hadoop.
Bruce Martin