I have a general question about using Apache HBase with a RAMdisk. I have a large data set in a single table, about 25GB in total, and I run some basic aggregations over it with a Java program.

As I have enough RAM available, I tried to put this data set into a RAMdisk using tmpfs:

mount -t tmpfs -o size=40G none /home/user/ramdisk

Then I stopped HBase and copied the contents of the data folder into the RAMdisk. Finally, I created a symbolic link from the old data directory to the new one and started HBase again.
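
For reference, the copy-and-symlink steps described above could be scripted roughly as follows. This is only a sketch under assumptions: the function name `relocate_to_ramdisk`, the `.bak` backup suffix, and the example paths are hypothetical and not what was actually run. It also assumes the tmpfs is already mounted and that HBase is stopped before the call and started again afterwards.

```shell
#!/bin/sh
# Sketch: copy a data directory into an already-mounted RAM disk and leave a
# symlink at the old location. All names and paths here are hypothetical.
relocate_to_ramdisk() {
    src=$1   # original data directory
    dst=$2   # target directory inside the RAM disk mount

    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # -a preserves ownership, permissions, and timestamps; "/." copies contents
    cp -a "$src/." "$dst/" || return 1
    # keep the on-disk copy as a backup, since tmpfs contents vanish on reboot
    mv "$src" "$src.bak" || return 1
    # the old path now points at the copy in the RAM disk
    ln -s "$dst" "$src"
}

# Example call (stop HBase first, start it again afterwards):
#   relocate_to_ramdisk /path/to/hbase-data /home/user/ramdisk/hbase-data
```

Keeping the original directory as a backup is a design choice of this sketch: a tmpfs loses its contents on unmount or reboot, so the symlink alone would leave no persistent copy.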

It works, but the aggregations now run slightly slower than before.

I could imagine a RAMdisk not having much impact if HBase compresses the data (Snappy compression is enabled) and so on, but I can't see why a faster medium would lead to slower data access. There is enough RAM left over that memory should not be the bottleneck.

Maybe someone has a general idea or insight about this?

fyaa
  • Maybe you misunderstand HBase. Since you can fit the data in RAM, a traditional database is a better option. – zsxwing Aug 20 '13 at 06:00
  • Although you say you have enough RAM, tmpfs can use swap. Who knows. Try `-t ramfs` and cross your fingers. – Alfonso Nishikawa Aug 21 '13 at 11:05
  • @zsxwing I don't talk about traditional database systems, I want to understand this phenomenon. Maybe you have an idea about that. – fyaa Aug 26 '13 at 15:35
  • @AlfonsoNishikawa I don't change the size of the data, I'm only reading. And 25GB should fit into the 40GB even with some additional files without swapping I guess. – fyaa Aug 26 '13 at 15:37
  • "Who knows". If you give it a shot, remember to comment :) Another possibility could be cache pollution. Your case is weird, that's for real ;) – Alfonso Nishikawa Aug 27 '13 at 11:08
  • @fyaa were you able to solve this problem? I am also having a similar issue, where a RAM disk doesn't boost performance – i0707 Apr 28 '16 at 13:35

1 Answer

I think it's going to be one of two things:

A: Did you really have more than 40G of free RAM before allocating the disk? I'm impressed if you actually had that much free, but seeing free RAM afterwards doesn't show that you didn't just push a big chunk into swap.
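
One way to check point A is to look at swap usage directly rather than at free RAM. A minimal sketch, assuming a Linux system with `/proc/meminfo`; the function name `mem_summary` and its output format are made up for illustration:

```shell
#!/bin/sh
# Sketch: summarise free memory, shared memory, and swap in use from a
# meminfo-style file. Defaults to /proc/meminfo (Linux only).
# The function name and output format are hypothetical.
mem_summary() {
    awk '
        $1 == "MemFree:"   { free = $2 }     # RAM not used for anything, in kB
        $1 == "Shmem:"     { shmem = $2 }    # includes tmpfs pages held in RAM
        $1 == "SwapTotal:" { stotal = $2 }
        $1 == "SwapFree:"  { sfree = $2 }
        END {
            printf "mem_free_kb=%d shmem_kb=%d swap_used_kb=%d\n",
                   free, shmem, stotal - sfree
        }
    ' "${1:-/proc/meminfo}"
}

# Example: run once before mounting the tmpfs and once after copying the data
#   mem_summary
```

If `swap_used_kb` grows noticeably after the data is copied into the tmpfs, part of the "RAM disk" (or other memory it displaced) is likely sitting on disk.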

B: Compression (even something fast like Snappy) is going to hurt performance, particularly for something like a database engine that has a lot of wacky optimization in it. You're right that a RAMdisk should be ludicrously faster, but having to jump all over the data for your queries, and then jump all over the compressed image to decompress chunks, has to carry a pretty big overhead.

pacifist