I am trying to stream a Delta table from cluster A to cluster B, but I am not able to load from or write to a different cluster:
streamingDf = spark.readStream.format("delta").option("ignoreChanges", "true") \
    .load("hdfs://cluster_A/delta-table")

stream = streamingDf.writeStream.format("delta").option("checkpointLocation", "/tmp/checkpoint") \
    .start("hdfs://cluster_B/delta-sink")
Then, I get the following error:
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block
So, my question is whether it is possible to stream data directly between two clusters using the Delta format, or whether additional technologies are required to achieve this.
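To clarify what I am aiming for, here is a sketch of the setup I have in mind, with fully qualified namenode addresses and the checkpoint kept on the sink cluster. The hostnames, ports, and paths below are placeholders, not my real addresses:

# Placeholder namenode hosts/ports and paths (assumptions for illustration)
source_path = "hdfs://namenodeA:8020/delta-table"                  # source table on cluster A
sink_path = "hdfs://namenodeB:8020/delta-sink"                     # sink table on cluster B
checkpoint_path = "hdfs://namenodeB:8020/checkpoints/delta-sink"   # checkpoint alongside the sink

# Read the Delta table from cluster A as a stream
streamingDf = (spark.readStream
    .format("delta")
    .option("ignoreChanges", "true")
    .load(source_path))

# Write the stream to the Delta table on cluster B
stream = (streamingDf.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .start(sink_path))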
Thanks!