
I want to use Centralized Cache Management in Hadoop 2.3.

Here are my steps (10 nodes, 6 GB of memory per node).

1. My file (45 MB) to be cached:

[hadoop@Master ~]$ hadoop fs -ls /input/pics/bundle
Found 1 items
-rw-r--r--   1 hadoop supergroup   47185920 2014-03-09 19:10 /input/pics/bundle/bundle.chq

2. Create a cache pool:

[hadoop@Master ~]$ hdfs cacheadmin -addPool myPool -owner hadoop -group supergroup 
Successfully added cache pool myPool.
[hadoop@Master ~]$ hdfs cacheadmin -listPools -stats  
Found 1 result.
NAME    OWNER   GROUP       MODE            LIMIT  MAXTTL  BYTES_NEEDED  BYTES_CACHED  BYTES_OVERLIMIT  FILES_NEEDED  FILES_CACHED
myPool  hadoop  supergroup  rwxr-xr-x   unlimited   never             0             0                0             0             0

3. addDirective:

[hadoop@Master ~]$ hdfs cacheadmin -addDirective -path /input/pics/bundle/bundle.chq -pool myPool -force -replication 3 
Added cache directive 2

4. listDirectives:

[hadoop@Master ~]$ hdfs cacheadmin -listDirectives -stats -path /input/pics/bundle/bundle.chq -pool myPool
Found 1 entry
ID POOL     REPL EXPIRY  PATH                            BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
2 myPool      3 never   /input/pics/bundle/bundle.chq      141557760             0             1             0

BYTES_NEEDED is right, but BYTES_CACHED is zero. It seems the size has been calculated, but the caching action that actually loads the file into memory has never happened. So how do I get my file cached into memory? Thank you very much.
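For reference, this is how I am watching the directive stats while waiting, in case the cache simply needs time to fill (a rough sketch; the 30-second interval is arbitrary):

# poll until BYTES_CACHED becomes non-zero
while true; do
    hdfs cacheadmin -listDirectives -stats -path /input/pics/bundle/bundle.chq -pool myPool
    sleep 30
done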

Hellen

2 Answers


There were a bunch of caching bugs in Hadoop 2.3 that we have since fixed. I would recommend using at least Hadoop 2.4 for HDFS caching.

To get into more detail I would need to see the log messages.
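(If you can post them, a rough way to pull out the relevant lines is sketched below; it assumes default log file names and greps for the caching classes, CacheReplicationMonitor on the NameNode and FsDatasetCache on the DataNodes.)

# NameNode side: the cache rescanner that assigns cached blocks to DataNodes
grep CacheReplicationMonitor $HADOOP_HOME/logs/hadoop-*-namenode-*.log

# DataNode side: the component that mmaps and mlocks cached blocks
grep FsDatasetCache $HADOOP_HOME/logs/hadoop-*-datanode-*.log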

cmccabe
  • Thank you, upgrading now. I find that files are cached into memory after about 10 minutes on Hadoop 2.3, but it is strange that the performance of my program hasn't improved. I will do more tests later. – Hellen May 20 '14 at 07:23

Including the output of hdfs dfsadmin -report would also be useful, as well as ensuring that you have followed the setup instructions here (namely, increasing the ulimit and setting dfs.datanode.max.locked.memory):

http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html
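
A minimal example of the two settings mentioned above (the values are illustrative, not recommendations; adjust them to your nodes):

# 1. In hdfs-site.xml on every DataNode, give the DataNode a cache budget
#    (in bytes), then restart the DataNodes:
#
#    <property>
#      <name>dfs.datanode.max.locked.memory</name>
#      <value>1073741824</value>    <!-- e.g. 1 GB out of the 6 GB per node -->
#    </property>
#
# 2. The locked-memory ulimit of the user running the DataNode must be at
#    least that large. Check it with:
ulimit -l
#    and raise it if needed, e.g. in /etc/security/limits.conf:
#    hadoop  -  memlock  unlimited
#
# 3. Afterwards the report should show non-zero cache capacity on each node:
hdfs dfsadmin -report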