When I try to read a Parquet file from a path like /test with spark.read.parquet(), I get an error saying file://test does not exist. When I add core-site.xml as a resource in code with
sc.hadoopConfiguration.addResource(new Path(<path-to-core-site.xml>))
it does look in HDFS. However, I don't want to add the resource in code. My question is: how do I make sure Spark reads core-site.xml and uses HDFS as the default file system?
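For context, this is roughly what the workaround looks like as a whole; the core-site.xml path and the app name are just placeholders for my actual setup:

    import org.apache.hadoop.fs.Path
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("parquet-test") // placeholder app name
      .getOrCreate()
    val sc = spark.sparkContext

    // Workaround I'd like to avoid: loading core-site.xml explicitly in code
    // (the path below is a placeholder for wherever core-site.xml lives)
    sc.hadoopConfiguration.addResource(new Path("/path/to/core-site.xml"))

    // With the resource added, /test is resolved against HDFS instead of file://
    val df = spark.read.parquet("/test")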
I've set up an Ubuntu 18.04.2 LTS server in a virtual machine with Hadoop 3, Spark 2.4.2, and YARN as the resource manager. I've configured core-site.xml with fs.defaultFS set to hdfs://localhost:9000, and I've set HADOOP_CONF_DIR in my bash file.
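For reference, the relevant part of my core-site.xml looks like this, and HADOOP_CONF_DIR points at the directory containing it (the exact path is just an example; it depends on where Hadoop is installed):

    <!-- core-site.xml -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    # in my bash file (example path, depends on the Hadoop install location)
    export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop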