I want to read some data from Hadoop directly on the Spark workers.
In my Spark program I have a Hadoop configuration on the driver:
val configuration = session.sparkContext.hadoopConfiguration
But I can't use it on the workers, because it isn't Serializable:
session.sparkContext.parallelize(paths).mapPartitions(paths => {
  for (path <- paths) yield {
    // for example, read the parquet footer
    val footer = ParquetFileReader.readFooter(configuration, new Path(path), ParquetMetadataConverter.NO_FILTER)
    footer.getFileMetaData.getSchema.getName
  }
})
results in
object not serializable (class: org.apache.hadoop.conf.Configuration...
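
For reference, a workaround I'm considering (a minimal, untested sketch; it relies on Configuration implementing Writable and on Spark's SerializableWritable developer API) is to broadcast the wrapped configuration and unwrap it on the workers:

import org.apache.hadoop.fs.Path
import org.apache.parquet.format.converter.ParquetMetadataConverter
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.spark.SerializableWritable

// Configuration implements Writable, so it can be wrapped and broadcast
val confBroadcast = session.sparkContext.broadcast(
  new SerializableWritable(session.sparkContext.hadoopConfiguration))

session.sparkContext.parallelize(paths).mapPartitions { paths =>
  // the outer .value reads the broadcast, the inner .value
  // returns the plain Hadoop Configuration
  val conf = confBroadcast.value.value
  for (path <- paths) yield {
    val footer = ParquetFileReader.readFooter(conf, new Path(path), ParquetMetadataConverter.NO_FILTER)
    footer.getFileMetaData.getSchema.getName
  }
}

An alternative would be to build a fresh new Configuration() inside mapPartitions, but that would drop any settings applied on the driver. Is wrapping and broadcasting the configuration like this the right approach?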