
I would like to stop Cassandra from dumping hprof files, as I do not need them.

I also have very limited disk space (50 GB out of 100 GB is already used for data), and these files swallow up all the remaining disk space before I can say "stop".

How should I go about it?

Is there a shell script that I could use to erase these files from time to time?

Salocin.TEN

3 Answers


It happens because Cassandra starts with the -XX:+HeapDumpOnOutOfMemoryError Java option, which is useful if you want to analyze the heap dump. Also, if you are getting lots of heap dumps, that indicates you should probably tune the memory available to Cassandra.

I haven't tried it, but to disable this option, comment out the following line in $CASSANDRA_HOME/conf/cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
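
For example, once commented out, the line would simply read as follows (Cassandra needs a restart for the change to take effect):

# JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"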

Optionally, you may comment out this block as well, though I don't think it is really required. This block seems to be available in 1.0+; I can't find it in 0.7.3.

# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR
if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
    JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"
fi
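
As a quick sanity check (a sketch, assuming the usual $CASSANDRA_HOME/conf layout), you can source the edited file on the node and confirm that no heap-dump options remain in $JVM_OPTS:

# Sketch: list any remaining heap-dump options; prints a message if none are set
. "$CASSANDRA_HOME/conf/cassandra-env.sh"
echo "$JVM_OPTS" | tr ' ' '\n' | grep -i heapdump || echo "no heap-dump options set"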

Let me know if this worked.


Update

...I guess it is JVM throwing it out when Cassandra crashes / shuts down. Any way to prevent that one from happening?

If you want to disable JVM heap dumps altogether, see here: how to disable creating java heap dump after VM crashes?
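
For completeness, a sketch of doing that in cassandra-env.sh by explicitly turning the option off (the '-' after -XX: disables a boolean JVM flag) instead of merely omitting it:

# Explicitly disable heap dumps on OutOfMemoryError
JVM_OPTS="$JVM_OPTS -XX:-HeapDumpOnOutOfMemoryError"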

Nishant
  • I was thinking of commenting it out too. But the hprof files I noticed did not follow the template "cassandra-NNN.hprof" but rather just "pid-XXX.hprof" so I guess it is JVM throwing it out when Cassandra crashes / shuts down. Any way to prevent that one from happening? – Salocin.TEN Feb 02 '12 at 07:39
  • Commenting out the block in $CASSANDRA_HOME/conf/cassandra-env.sh did not really work. But thanks to the link and the .hprof removal cron job, everything is working fine now. Thanks for the help once again. – Salocin.TEN Feb 02 '12 at 23:24

Even if you update cassandra-env.sh to point the heap dump path somewhere else, it will still not work. The reason is that the upstart script /etc/init.d/cassandra contains this line, which sets the default heap dump path:

start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" -- \
    -p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null || return 2

I'm not an upstart expert, but what I did was simply remove the parameter that creates the duplicate. Another odd observation: when checking the Cassandra process via ps aux, you'll see some parameters written twice. If you source cassandra-env.sh and print $JVM_OPTS, you'll see that those variables are set correctly.
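
In other words, the edited line would look something like this (a sketch, with the -H argument dropped), and you can then check which flags the running process actually received:

# /etc/init.d/cassandra with the heap-dump parameter removed
start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" -- \
    -p "$PIDFILE" -E "$error_log_f" >/dev/null || return 2

# Inspect the JVM flags of the running Cassandra process
ps aux | grep '[c]assandra' | tr ' ' '\n' | grep -i heapdump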

Superpaul

I'll admit I haven't used Cassandra, but from what I can tell, it shouldn't be dumping any hprof files unless you enable the option via a JVM flag or the program experiences an OutOfMemoryError. So try looking there.

In terms of a shell script, if the files are being dumped to a specific location, you can use this command to delete all *.hprof files:

find /my/location/ -name '*.hprof' -delete

This uses the -delete action of find, which deletes all files that match the search. Look at the man page for find for more search options if you need to narrow it down further.
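
For example (a sketch, using the -mtime test to keep the most recent dumps), you could preview the matches with -print before switching to -delete, and only remove files older than a day:

# Preview, then delete only .hprof files older than one day
find /my/location/ -name '*.hprof' -mtime +1 -print
find /my/location/ -name '*.hprof' -mtime +1 -delete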

You can use cron to run a script at a given time, which would satisfy your "time to time" requirement. Most Linux distros have cron installed and work off a crontab file. You can find out more by running man crontab.
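
For instance, a crontab entry (added via crontab -e) that runs the cleanup every night at 02:00 might look like this; the /var/lib/cassandra path is an assumption, so point it at wherever your dumps actually land:

# m h dom mon dow  command
0 2 * * * find /var/lib/cassandra -name '*.hprof' -delete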

Aatch
  • Thanks! I have figured that out too! Because actually the hprof files are coming out when I suspended the Cassandra instance, as they are named "pid-XXX.hprof" instead of "cassandra-XXX.hprof" Thanks for the shell script. I shall implement them. :) – Salocin.TEN Feb 02 '12 at 07:28