What's the easiest way to enable HDFS caching on EMR?
More specifically, how do I set dfs.datanode.max.locked.memory and increase the "maximum size that may be locked into memory" (ulimit -l) on all nodes?
The following command seems to work fine for dfs.datanode.max.locked.memory, and I could probably write a custom bootstrap action that updates /usr/lib/hadoop/hadoop-daemon.sh to call ulimit. Is there a better or faster way?
elastic-mapreduce --create \
--alive \
--plain-output \
--visible-to-all \
--ami-version 3.1.0 \
-a $access_id \
-p $private_key \
--name "test" \
--master-instance-type m3.xlarge \
--instance-group master --instance-type m3.xlarge --instance-count 1 \
--instance-group core --instance-type m3.xlarge --instance-count 10 \
--pig-interactive \
--log-uri s3://foo/bar/logs/ \
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
--args "--hdfs-key-value,dfs.datanode.max.locked.memory=2000000000"
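
For reference, the custom bootstrap I have in mind would look roughly like the sketch below. It is untested on a real cluster and rests on several assumptions: that the DataNode runs as a user named "hadoop", that entries in /etc/security/limits.conf are actually picked up by the way the daemon is started on this AMI (they may not be, which is why patching hadoop-daemon.sh could still be needed), and that the helper names bytes_to_kb and memlock_entries are just my own. The one thing it does get right is the unit conversion: dfs.datanode.max.locked.memory is in bytes, while ulimit -l and limits.conf use KB.

```shell
#!/bin/bash
# Sketch of a custom bootstrap action (hypothetical, not an EMR-provided script).

# Convert a byte count to KB, rounding up, because "ulimit -l" and
# limits.conf express memlock in KB while the HDFS property is in bytes.
bytes_to_kb() {
  echo $(( ($1 + 1023) / 1024 ))
}

# Print soft and hard memlock entries in limits.conf format for one user.
memlock_entries() {
  local user="$1" limit_kb="$2"
  printf '%s soft memlock %s\n' "$user" "$limit_kb"
  printf '%s hard memlock %s\n' "$user" "$limit_kb"
}

# Only modify the node when invoked with --apply, e.g. from a bootstrap action.
if [ "${1:-}" = "--apply" ]; then
  # ASSUMPTION: the DataNode runs as user "hadoop"; adjust for your AMI.
  limit_kb=$(bytes_to_kb 2000000000)
  memlock_entries hadoop "$limit_kb" | sudo tee -a /etc/security/limits.conf > /dev/null
fi
```

The idea would be to upload this to S3 and add it as a second --bootstrap-action with --args "--apply", so that the limit is at least as large as the 2000000000 bytes passed to configure-hadoop.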