I just started with Camus.
I am planning to run Camus every hour. We get ~80,000,000 messages per hour,
with an average message size of 4 KB
(we have a single topic in Kafka).
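
For scale, here is a rough back-of-the-envelope sketch of what those numbers imply. It assumes "4KB" means 4000 bytes and ignores any compression or serialization overhead, so the figures are only approximate; the function names are made up for illustration.

```python
# Figures quoted above; reading "4KB" as 4000 bytes is an assumption.
MESSAGES_PER_HOUR = 80_000_000
AVG_MESSAGE_BYTES = 4 * 1000


def hourly_volume_bytes(messages: int, avg_bytes: int) -> int:
    """Raw (uncompressed) bytes produced in one hour."""
    return messages * avg_bytes


def required_mb_per_second(volume_bytes: int, window_seconds: int = 3600) -> float:
    """Sustained throughput needed to copy the volume within the window."""
    return volume_bytes / window_seconds / 1e6


volume = hourly_volume_bytes(MESSAGES_PER_HOUR, AVG_MESSAGE_BYTES)
print(f"raw volume per hour: {volume / 1e9:.0f} GB")
print(f"throughput needed to keep up: {required_mb_per_second(volume):.1f} MB/s")
```

So an hourly run has to sustain roughly this rate across all mappers combined just to avoid falling behind.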
I first tried with 10 mappers: it took ~2 hours to copy one hour's data and created 10 files of ~7 GB each.
Then I tried 300 mappers, which brought the time down to ~1 hour but created 11 files.
Later, I tried 150 mappers and it took ~30 minutes.
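
To make the three trials easier to compare, here is a small sketch that tabulates them. The `Trial` class is hypothetical, the durations are the approximate ones quoted above, and the per-mapper rate is only nominal because it assumes every configured mapper actually pulled data, which may not have been the case.

```python
from dataclasses import dataclass

MESSAGES_PER_HOUR = 80_000_000  # figure quoted in the question


@dataclass
class Trial:
    mappers: int
    minutes: float  # approximate wall-clock time to copy one hour of data

    @property
    def keeps_up(self) -> bool:
        # An hourly job must finish in under 60 minutes or runs start to back up.
        return self.minutes < 60

    @property
    def messages_per_mapper_per_second(self) -> float:
        # Nominal rate: assumes all configured mappers were doing work.
        return MESSAGES_PER_HOUR / (self.minutes * 60) / self.mappers


trials = [Trial(10, 120), Trial(300, 60), Trial(150, 30)]
for t in trials:
    print(
        f"{t.mappers:>3} mappers, {t.minutes:>5.0f} min, "
        f"keeps up: {t.keeps_up}, "
        f"~{t.messages_per_mapper_per_second:.0f} msg/s per mapper (nominal)"
    )
```

Only the 150-mapper run finishes comfortably inside the hourly window, and the time does not fall steadily as mappers are added, which is what prompts the question below.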
So, how do I choose the number of mappers in this case? Also, I want to create more files in Hadoop, since a single file is growing to ~7 GB. Which configuration settings should I check?