Manually splitting and compressing input for Amazon EMR

Asked May 13 '13 at 23:02

Instead of using hadoop-lzo to index my LZO input file, I decided to simply split it into chunks that, once compressed with LZO, are each close to 128 MB (since that is the default block size on the Amazon Distribution[1]).

Is there anything wrong, from a cluster performance perspective, with providing input that is already split and compressed to a size close to the default HDFS block size?
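For concreteness, here is a rough sketch of the kind of manual splitting I mean. It is only an illustration under some assumptions: the input is newline-delimited records, `zlib` stands in for LZO (the standard library has no LZO codec, so the real ratio will differ), and the names `estimate_ratio` and `split_lines` are made up for this example.

```python
import zlib
from typing import Iterable, Iterator, List

# Target size of each compressed chunk: the 128 MB block size from the question.
TARGET_COMPRESSED_BYTES = 128 * 1024 * 1024


def estimate_ratio(sample: bytes) -> float:
    """Return compressed/uncompressed size for a sample.

    Assumption: zlib is used only as a stand-in for LZO here.
    """
    if not sample:
        return 1.0
    return len(zlib.compress(sample)) / len(sample)


def split_lines(lines: Iterable[bytes], max_uncompressed: int) -> Iterator[List[bytes]]:
    """Group whole lines into chunks of at most max_uncompressed bytes.

    Records are never cut in half; a single line longer than the limit
    is emitted as its own chunk.
    """
    chunk: List[bytes] = []
    size = 0
    for line in lines:
        # Start a new chunk when adding this line would exceed the limit.
        if chunk and size + len(line) > max_uncompressed:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield chunk
```

The idea is to measure the ratio on a representative sample, set `max_uncompressed = int(TARGET_COMPRESSED_BYTES / ratio)`, and then compress each chunk returned by `split_lines` into its own file. Since the ratio is only an estimate, the compressed chunks land near 128 MB rather than exactly on it.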

spacemonkey

0 Answers