
I need to create a MapReduce program that reads an Excel file from HDFS, does some analysis on it, and then stores the output as an Excel file. I know that TextInputFormat is used to read a .txt file from HDFS, but which method or InputFormat should I use here?

Surender Raja
  • What is the use case? Is this a single input file? What is its size? Do you use it in the Excel application, or do you just use the format? Working with Excel is suitable for relatively small files; working with Hadoop is suitable for very large datasets. – Ophir Yoktan Feb 17 '14 at 06:52
  • I need to retrieve only one Excel file from HDFS. The size of the file is 1913 KB. I need to process this file in a pseudo-distributed single-node cluster. – Surender Raja Feb 17 '14 at 06:56
  • Can we read this Excel file directly from the Hadoop cluster? What is the InputFormat type that I need to use in the job configuration? – Surender Raja Feb 17 '14 at 09:18

1 Answer


Generally, Hadoop is overkill for this scenario, but there are some relevant solutions:

  1. Parse the file externally and convert it to a Hadoop-compatible format (e.g. CSV or a SequenceFile).

  2. Read the complete file as a single record; see this answer.

  3. Use two chained jobs: the first, as in 2, reads the file in bulk and emits each record as input for the next job.
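A minimal sketch of option 2, assuming the Hadoop 2.x `mapreduce` API is on the classpath: a custom FileInputFormat that refuses to split the file, plus a RecordReader that emits the entire file as one BytesWritable record. The class names `WholeFileInputFormat` and `WholeFileRecordReader` are illustrative, not part of Hadoop; the mapper would then parse the raw bytes with a spreadsheet library such as Apache POI (not shown).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // .xls/.xlsx files are binary containers; never split them
        // across mappers, or neither half can be parsed.
        return false;
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    static class WholeFileRecordReader
            extends RecordReader<NullWritable, BytesWritable> {

        private FileSplit split;
        private Configuration conf;
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) {
                return false;
            }
            // Read the whole file into a single value; fine for a ~2 MB
            // file like the one in the question, but this buffers the
            // entire file in memory.
            byte[] contents = new byte[(int) split.getLength()];
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = null;
            try {
                in = fs.open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                value.set(contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }

        @Override
        public NullWritable getCurrentKey() { return NullWritable.get(); }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() { return processed ? 1.0f : 0.0f; }

        @Override
        public void close() { /* stream already closed in nextKeyValue */ }
    }
}
```

In the driver you would then register it with `job.setInputFormatClass(WholeFileInputFormat.class);` and have the mapper decode the BytesWritable payload.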

Ophir Yoktan