Instead of using hadoop-lzo to index my LZO input file, I decided to simply split it into chunks that, once compressed with LZO, are each close to 128 MB (since that is the default block size on the Amazon Distribution[1]).
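
For reference, a rough sketch of the splitting step described above. It assumes line-oriented input and a compression ratio measured beforehand on a sample of the data. The names `uncompressed_budget` and `split_by_lines` are hypothetical, and the LZO compression of each part (for example with the `lzop` command-line tool) would be a separate step not shown here.

```python
import os

BLOCK_SIZE = 128 * 1024 * 1024  # target compressed size per chunk, in bytes


def uncompressed_budget(block_size, compression_ratio):
    """Bytes of raw input expected to compress down to roughly block_size.

    compression_ratio is raw_size / compressed_size, measured on a sample
    (an assumption: the real ratio varies from chunk to chunk).
    """
    return int(block_size * compression_ratio)


def split_by_lines(src_path, out_dir, max_bytes):
    """Split src_path into part files of at most max_bytes, on line boundaries.

    A single line longer than max_bytes gets a part of its own.
    Returns the list of part file paths, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        for line in src:
            # Start a new part when there is none yet, or when adding this
            # line would push a non-empty part over the budget.
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, "part-%05d" % len(parts))
                out = open(path, "wb")
                parts.append(path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return parts
```

With a measured ratio of, say, 2.5, `uncompressed_budget(BLOCK_SIZE, 2.5)` gives the raw size to pass as `max_bytes`. Since the ratio is only an estimate, some compressed parts may end up slightly above or below the block size.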

Is there anything wrong, from a cluster performance perspective, with providing input that is already split and compressed to a size close to the default HDFS block size?

spacemonkey

0 Answers