
I want to read some data from Hadoop directly from a Spark worker:

In my Spark program I have a Hadoop configuration: `val configuration = session.sparkContext.hadoopConfiguration`. But I can't use it on a worker because it isn't Serializable:

    spark.sparkContext.parallelize(paths).mapPartitions(paths => {
      for (path <- paths) yield {
        // for example, read the parquet footer
        val footer = ParquetFileReader.readFooter(configuration, new Path(path), ParquetMetadataConverter.NO_FILTER)
        footer.getFileMetaData.getSchema.getName
      }
    })

results in

object not serializable (class: org.apache.hadoop.conf.Configuration...
Andrei Koch

1 Answer


I don't know of any way to use the `Configuration` object directly inside `mapPartitions`. As suggested in this solution, you have to manually rebuild your conf inside `mapPartitions`.
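One way to sketch that rebuild (the `session`, `paths`, and `readFooter` call are taken from the question; the broadcast variable and map names are my own): copy the `Configuration` entries, which are plain strings and therefore serializable, into a `Map` on the driver, broadcast it, and reconstruct a `Configuration` once per partition on the worker:

    import scala.collection.JavaConverters._
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.format.converter.ParquetMetadataConverter
    import org.apache.parquet.hadoop.ParquetFileReader

    // Driver side: Configuration itself is not Serializable,
    // but its key/value entries are plain strings.
    val confMap: Map[String, String] =
      session.sparkContext.hadoopConfiguration.iterator().asScala
        .map(entry => entry.getKey -> entry.getValue)
        .toMap

    val confBroadcast = session.sparkContext.broadcast(confMap)

    val schemaNames = session.sparkContext.parallelize(paths).mapPartitions { paths =>
      // Worker side: rebuild the Configuration from the broadcast entries,
      // once per partition rather than once per record.
      val configuration = new Configuration(false)
      confBroadcast.value.foreach { case (k, v) => configuration.set(k, v) }

      for (path <- paths) yield {
        val footer = ParquetFileReader.readFooter(
          configuration, new Path(path), ParquetMetadataConverter.NO_FILTER)
        footer.getFileMetaData.getSchema.getName
      }
    }

Spark also ships a thin serializable wrapper, `org.apache.spark.util.SerializableConfiguration`, which you can broadcast directly instead of a `Map`; note that it was `private[spark]` in older releases, so check whether it is accessible in your Spark version before relying on it.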

Franck Cussac