I have a general question about using Apache HBase with a RAMdisk. There is a big collection of data in a single table, about 25GB in total. With this data I am doing some basic aggregations, using a Java program.
As I have enough RAM available, I tried putting this data set into a RAMdisk using tmpfs:
mount -t tmpfs -o size=40G none /home/user/ramdisk
Then I stopped HBase and copied the contents of the data folder into the RAMdisk. Finally, I created a symbolic link from the old data directory to the new one and started HBase again.
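A minimal sketch of that relocation step follows, in case the exact procedure matters. The function name `relocate_data_dir` and all paths are hypothetical placeholders. Renaming the old directory to a `.ondisk` backup is an assumption for illustration and not necessarily what was done here.

```shell
#!/bin/sh
# Sketch only: copies a data directory to a new location and leaves a
# symlink at the old path. All names and paths are placeholders.
relocate_data_dir() {
    src="$1"   # original HBase data directory (on disk)
    dst="$2"   # target directory inside the tmpfs mount (must not exist yet)
    cp -a "$src" "$dst" || return 1       # copy, preserving ownership and modes
    mv "$src" "$src.ondisk" || return 1   # assumption: keep the on-disk copy as a backup
    ln -s "$dst" "$src"                   # old path now points at the RAMdisk copy
}

# Intended usage, with HBase stopped while the directory is moved:
#   stop-hbase.sh
#   relocate_data_dir /path/to/hbase-data /home/user/ramdisk/hbase-data
#   start-hbase.sh
```

Note that tmpfs contents are lost on unmount or reboot, so the on-disk copy is the only persistent one in this setup.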
It works, but the aggregations are now slightly slower than before.
I could imagine the RAMdisk not having much impact, since HBase compresses the data (Snappy compression is enabled) and so on, but I can't see why a faster medium would lead to slower data access. There is still enough free RAM that memory should not be the bottleneck.
Maybe someone has a general idea or insight about this?