I have a simple example running on a Google Cloud Dataproc master node, where Tachyon, Spark, and Hadoop are all installed.
I get a replication error when writing to Tachyon from Spark. Is there any way to tell Tachyon (or its HDFS under-filesystem) that no replication is needed?
15/10/17 08:45:21 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/tachyon/workers/1445071000001/3/8 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3110)
The portion of the log shown above is only a warning, but it is immediately followed by a Spark error.
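For reference, the write is essentially the following (a minimal sketch; the Tachyon master address and output path are illustrative, not my actual values):

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch of the failing write. The master address, port
    // (19998 is Tachyon's default), and output path are illustrative.
    val sc = new SparkContext(new SparkConf().setAppName("TachyonWrite"))

    // Per the Tachyon-on-Spark docs, map the tachyon:// scheme to the
    // Tachyon Hadoop client (the tachyon-client jar must be on the
    // classpath).
    sc.hadoopConfiguration.set("fs.tachyon.impl", "tachyon.hadoop.TFS")

    val data = sc.parallelize(1 to 100)

    // Tachyon persists the written blocks to its HDFS under-filesystem;
    // that persistence step is where the DFSClient warning above appears.
    data.saveAsTextFile("tachyon://localhost:19998/result")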
I checked the Tachyon configuration docs and found a property that might be involved:
tachyon.underfs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem
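For completeness, the only replication knob I know of is on the Hadoop side rather than in Tachyon: dfs.replication in hdfs-site.xml. A sketch of that setting, assuming Tachyon's HDFS client picks up the cluster's Hadoop configuration (note its minimum is 1, so it cannot express "no replication"):

    <property>
      <!-- Ask the HDFS client to keep a single replica per block.
           1 is the minimum; "no replication" is not expressible here. -->
      <name>dfs.replication</name>
      <value>1</value>
    </property>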
Given that this is all running on a Dataproc master node, where Hadoop comes preinstalled and HDFS already works with Spark, I would expect this to be solvable from within Tachyon's own configuration.