
Looking at the default parameters of the HDFS sink in Apache Flume, it seems that it will produce tons of very small files (1 kB rolls). From what I have learned about GFS/HDFS, block sizes are 64 MB and file sizes should rather be in the gigabytes to make sure everything runs efficiently.

So I'm curious whether Flume's default parameters are just misleading or whether I'm missing something else here.
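For context, the rolling behaviour in question is controlled by the `hdfs.rollInterval`, `hdfs.rollSize`, and `hdfs.rollCount` properties of the HDFS sink; setting a property to 0 disables that trigger. A sketch of overriding the defaults to get larger files (the agent/sink names `agent1`/`k1` and the path are hypothetical):

```properties
# Hypothetical agent and sink names; hdfs.roll* keys are HDFS sink properties.
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = /flume/events
agent1.sinks.k1.hdfs.rollInterval = 0      # default 30 s; 0 = never roll on time
agent1.sinks.k1.hdfs.rollSize = 134217728  # default 1024 bytes; here ~128 MB
agent1.sinks.k1.hdfs.rollCount = 0         # default 10 events; 0 = never roll on count
```

With the defaults (roll every 30 seconds, or every 1024 bytes, or every 10 events, whichever comes first), frequent small rolls are exactly what you'd expect.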

Cheers.

pagid
    I was able to fill my 350 GB HDFS with only ~30-50 MB of real data just because of the small files produced by Flume :) I believe the documentation is not misleading, but it just doesn't describe all the caveats you should be aware of. – ffriend Aug 20 '13 at 11:13

0 Answers