
My need is to read other formats (JSON, binary, XML) and infer the schema dynamically within a transform in Code Repositories, using the Spark datasource API.

Example:

val df = spark.read.json(<hadoop_path>)

For that, I need an accessor to the Foundry file system path, which is something like:

foundry://...@url:port/datasets/ri.foundry.main.dataset.../views/ri.foundry.main.transaction.../startTransactionRid/ri.foundry.main.transaction...

This is possible with the PySpark API (Python):

filesystem = input_transform.filesystem()  # FileSystem object of the input dataset
hadoop_path = filesystem.hadoop_path       # underlying Hadoop path as a string

However, I didn't find a way to do this properly in Java/Scala.

  • Have you tried https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L402? – fmsf Dec 06 '21 at 12:26
  • Yes, I managed to read the files as a Dataset of String and then used the json method to infer the schema, and it works. For XML (with the com.databricks spark-xml library), it doesn't work as expected (maybe I need to add some options). However, my need is more general: how do I get the hadoop path with the Palantir Foundry API in Java? – Mehdi Dec 06 '21 at 13:25
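
For reference, the workaround described in the comment above relies on the DataFrameReader.json(Dataset<String>) overload that the linked source points to. Here is a minimal, self-contained Java sketch of that step; the sample documents are made up, and in a real transform the strings would come from the Foundry files API rather than from a hard-coded list:

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class JsonFromStrings {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("json-from-strings")
                .getOrCreate();

        // Stand-in for file contents read through the Foundry files API:
        // one JSON document per element.
        Dataset<String> raw = spark.createDataset(
                Arrays.asList("{\"id\": 1, \"name\": \"a\"}", "{\"id\": 2, \"name\": \"b\"}"),
                Encoders.STRING());

        // Spark infers the schema from the strings themselves.
        Dataset<Row> df = spark.read().json(raw);
        df.printSchema();

        spark.stop();
    }
}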

1 Answer


The getter for the Hadoop path has recently been added to the Foundry Java API. After upgrading the Java transforms version (transformsJavaVersion >= 1.188.0), you can get it:

val hadoopPath = myInput.asFiles().getFileSystem().hadoopPath()
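
For context, a complete transform built around this accessor might look like the sketch below. Only the hadoopPath() chain comes from the answer itself; the @Compute/@Input/@Output annotations, package and import names, dataset paths, and the getDataFrameWriter(...) call are assumptions following the usual transforms-java layout, so verify them against your transforms-java version:

package myproject.datasets;

import com.palantir.transforms.lang.java.api.Compute;
import com.palantir.transforms.lang.java.api.FoundryInput;
import com.palantir.transforms.lang.java.api.FoundryOutput;
import com.palantir.transforms.lang.java.api.Input;
import com.palantir.transforms.lang.java.api.Output;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class InferJsonSchemaTransform {

    @Compute
    public void compute(
            @Input("/Project/folder/raw_json_files") FoundryInput myInput,
            @Output("/Project/folder/parsed") FoundryOutput myOutput) {
        // Requires transformsJavaVersion >= 1.188.0.
        String hadoopPath = myInput.asFiles().getFileSystem().hadoopPath();

        // Reuse the SparkSession the transform is already running in.
        SparkSession spark = SparkSession.builder().getOrCreate();

        // Point the standard Spark datasource API at the dataset's files;
        // the schema is inferred dynamically from the data.
        Dataset<Row> df = spark.read().json(hadoopPath);

        myOutput.getDataFrameWriter(df).write();
    }
}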