The Hadoop FS type you mentioned (org.apache.hadoop.hdfs.DistributedFileSystem) is just the interface, and it fits your needs. The concrete implementation is chosen elsewhere: Tachyon creates the s3n FileSystem implementation based on the scheme specified in the URI of the remote DFS, which is configured with TACHYON_UNDERFS_ADDRESS.
For Amazon S3, you will need to specify something like this:
export TACHYON_UNDERFS_ADDRESS=s3n://your_bucket
Note "s3n", not "s3" here.
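As a quick guard against the s3/s3n mix-up, a small check like the following could be run before starting Tachyon. This is only a sketch: the function name is_s3n_address is made up for illustration and is not part of Tachyon.

```shell
# Hypothetical helper (not part of Tachyon): succeeds only if the
# address uses the s3n:// scheme and names a bucket.
is_s3n_address() {
  case "$1" in
    s3n://?*) return 0 ;;   # s3n scheme followed by at least one character
    *)        return 1 ;;   # anything else, including s3://
  esac
}
```

For example, `is_s3n_address "$TACHYON_UNDERFS_ADDRESS" || echo "expected an s3n:// address"` prints a warning when the variable holds s3://your_bucket.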
Additional setup you will need to work with S3 (see also "Error in setting up Tachyon on S3 under filesystem" and http://tachyon-project.org/Setup-UFS.html):
In ${TACHYON}/bin/tachyon-env.sh, add the key ID and the secret key to TACHYON_JAVA_OPTS:
-Dfs.s3n.awsAccessKeyId=123
-Dfs.s3n.awsSecretAccessKey=456
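One way to add these two options, as a sketch, is to append them to whatever TACHYON_JAVA_OPTS already contains. The values 123 and 456 are the placeholders used above, and the surrounding layout of your tachyon-env.sh may differ from what is assumed here.

```shell
# Append the S3 credentials to any options that are already set.
# 123 and 456 are placeholders; substitute your own key ID and secret key.
export TACHYON_JAVA_OPTS="${TACHYON_JAVA_OPTS:-} \
  -Dfs.s3n.awsAccessKeyId=123 \
  -Dfs.s3n.awsSecretAccessKey=456"
```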
Publish the extra dependencies required by the s3n Hadoop FileSystem implementation; their versions depend on the version of Hadoop installed. These are commons-httpclient-* and jets3t-*.
To do that, set TACHYON_CLASSPATH as described in the links above. This can be done by adding an export of TACHYON_CLASSPATH in ${TACHYON}/libexec/tachyon-config.sh before the line that exports CLASSPATH:
export TACHYON_CLASSPATH=~/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:~/.m2/repository/net/java/dev/jets3t/jets3t/0.9.0/jets3t-0.9.0.jar
export CLASSPATH="$TACHYON_CONF_DIR/:$TACHYON_JAR:$TACHYON_CLASSPATH"
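A missing jar on the classpath tends to surface only later as a class-loading error, so it can help to verify the entries up front. The helper below is a hypothetical sketch (the name missing_classpath_entries is not from Tachyon); it prints every colon-separated entry that does not exist on disk.

```shell
# Hypothetical helper: print each classpath entry that is missing on disk.
# Takes one colon-separated classpath string as its argument.
missing_classpath_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    # Skip empty entries (for example from a stray leading or trailing colon).
    if [ -n "$entry" ] && [ ! -e "$entry" ]; then
      printf '%s\n' "$entry"
    fi
  done
}
```

Calling `missing_classpath_entries "$TACHYON_CLASSPATH"` after the export should print nothing if both jars are where the Maven repository paths above say they are.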
Start the Tachyon cluster:
./bin/tachyon format
./bin/tachyon-start.sh local
Check its availability via the web interface:
http://localhost:19999/
and in the logs:
${TACHYON}/logs
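The web interface check can also be scripted. The sketch below assumes curl is installed and that the master web UI listens on the default address shown above; tachyon_ui_up is a hypothetical name, not a Tachyon command.

```shell
# Hypothetical helper: succeeds if the given URL answers with an HTTP
# success status within 5 seconds. Defaults to the local Tachyon web UI.
tachyon_ui_up() {
  curl --silent --fail --output /dev/null --max-time 5 \
    "${1:-http://localhost:19999/}"
}
```

For example, `tachyon_ui_up && echo "master web UI is reachable"` can be run right after tachyon-start.sh; if it fails, the files under ${TACHYON}/logs are the next place to look.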
Your core-site.xml should contain the following properties to make sure you are integrated with Tachyon (see the Spark reference http://tachyon-project.org/Running-Spark-on-Tachyon.html for configuring this directly from Scala):
- fs.defaultFS - the Tachyon master host and port (the defaults are shown below)
- fs.default.name - the older, deprecated name of the same setting; use the same value
- fs.tachyon.impl - Tachyon's hadoop.FileSystem implementation hint
- fs.s3n.awsAccessKeyId - Amazon key ID
- fs.s3n.awsSecretAccessKey - Amazon secret key
<configuration>
<property>
<name>fs.defaultFS</name>
<value>tachyon://localhost:19998</value>
</property>
<property>
<name>fs.default.name</name>
<value>tachyon://localhost:19998</value>
<description>The name of the default file system. A URI
whose scheme and authority determine the
FileSystem implementation.
</description>
</property>
<property>
<name>fs.tachyon.impl</name>
<value>tachyon.hadoop.TFS</value>
</property>
...
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>123</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>456</value>
</property>
...
</configuration>
Refer to any path using the tachyon scheme and the master host and port:
tachyon://master_host:master_port/path
Example with the default Tachyon master host and port:
tachyon://localhost:19998/remote_dir/remote_file.csv
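To avoid hand-assembling these URIs in scripts, a tiny helper can build them from the three parts. This is a sketch; tachyon_uri is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: build tachyon://host:port/path from its parts.
# A leading slash on the path is optional.
tachyon_uri() {
  printf 'tachyon://%s:%s/%s\n' "$1" "$2" "${3#/}"
}
```

Assuming a Hadoop client that picks up the core-site.xml above and has the Tachyon client jar on its classpath, something like `hadoop fs -ls "$(tachyon_uri localhost 19998 /remote_dir)"` should then list the files backed by the S3 bucket.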