How to enable HDFS caching on Amazon EMR?

Question

What's the easiest way to enable HDFS caching on EMR?

More specifically, how can I set dfs.datanode.max.locked.memory and raise the "maximum size that may be locked into memory" limit (ulimit -l) on all nodes?

The following command seems to work fine for dfs.datanode.max.locked.memory, and I could probably write a custom bootstrap action to update /usr/lib/hadoop/hadoop-daemon.sh and call ulimit. Is there a better or faster way?

elastic-mapreduce --create \
    --alive \
    --plain-output \
    --visible-to-all \
    --ami-version  3.1.0 \
    -a $access_id \
    -p $private_key \
    --name "test" \
    --master-instance-type m3.xlarge \
    --instance-group master --instance-type m3.xlarge  --instance-count 1 \
    --instance-group core --instance-type m3.xlarge --instance-count 10 \
    --pig-interactive \
    --log-uri s3://foo/bar/logs/ \
    --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
    --args "--hdfs-key-value,dfs.datanode.max.locked.memory=2000000000"
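
For reference, the custom bootstrap idea mentioned above could look roughly like the sketch below. It is untested on EMR and rests on assumptions not confirmed anywhere in this question: that the DataNode runs as the user "hadoop", that limits are read from /etc/security/limits.conf when the daemon starts, and that the script runs with root privileges. The function name add_memlock_limit and the --apply switch are made up for illustration.

```shell
#!/bin/bash
# Sketch of a custom bootstrap action that raises the locked-memory limit (ulimit -l).
# Assumptions: daemon user "hadoop", limits read from /etc/security/limits.conf.
set -euo pipefail

# Same value as dfs.datanode.max.locked.memory in the command above, in bytes.
LOCKED_BYTES=2000000000
# limits.conf expresses memlock in kilobytes, so round up to the next KB.
LOCKED_KB=$(( (LOCKED_BYTES + 1023) / 1024 ))

# Hypothetical helper: append soft and hard memlock entries for one user.
add_memlock_limit() {
    local file="$1" user="$2" kb="$3"
    # Skip if the user already has a memlock entry, so reruns are harmless.
    if ! grep -q "^${user}[[:space:]].*memlock" "$file" 2>/dev/null; then
        printf '%s soft memlock %s\n%s hard memlock %s\n' \
            "$user" "$kb" "$user" "$kb" >> "$file"
    fi
}

# Only touch the system file when explicitly asked to (needs root).
if [ "${1:-}" = "--apply" ]; then
    add_memlock_limit /etc/security/limits.conf hadoop "$LOCKED_KB"
fi
```

Whether the DataNode actually picks up the new limit depends on how the daemon is launched, so running ulimit -l as the daemon user on a node afterwards would be the way to check.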
