I have encrypted data in Avro format with the following schema:
{"type":"record","name":"ProtectionWrapper","namespace":"com.security","fields":
[{"name":"protectionInfo","type":["null",{"type":"record","name":"ProtectionInfo","fields":
[{"name":"unprotected","type":"boolean"}]}]}],
"writerSchema":"{"type":"record","name":"Demo","namespace":"com.demo","fields":
[{"name":"id","type":"string"}]}"}
Here "writerSchema" is the schema of data before encryption. The data has to be written with the writer schema so that the decrypt function uses it while decrypting. When I use the below code, writer schema is written along with data.
// javaSparkContext is the JavaSparkContext for this application
Job mrJob = org.apache.hadoop.mapreduce.Job.getInstance(javaSparkContext.hadoopConfiguration());
AvroJob.setDataModelClass(mrJob, SpecificData.class);
AvroJob.setOutputKeySchema(mrJob, protectionSchema); // ProtectionWrapper schema shown above
JavaPairRDD<AvroKey<GenericRecord>, NullWritable> encryptedData = encryptionMethod();
encryptedData.saveAsNewAPIHadoopFile("c:\\test", AvroKey.class, NullWritable.class,
        AvroKeyOutputFormat.class, mrJob.getConfiguration());
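With the file written this way, the wrapper schema (including the custom property) travels with the data, so the decrypt side can get the original schema back from it. A minimal sketch, assuming plain Avro APIs and that fileSchema is the ProtectionWrapper schema read back from the written file:

// The custom "writerSchema" property holds the pre-encryption schema as a JSON string.
String writerJson = fileSchema.getProp("writerSchema");
Schema writerSchema = new Schema.Parser().parse(writerJson); // com.demo.Demo
// writerSchema can then be handed to the decrypt function.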
But when I convert the schema to a StructType and write with Spark instead, the writerSchema property does not get written along with the data.
StructType type = (StructType) SchemaConverters.toSqlType(protectionSchema).dataType();
Dataset<Row> ds = sparkSession.createDataFrame(rdd, type); // rdd holds the encrypted rows
ds.write().format("avro").save("c:\\test");
Is it possible to achieve the same using a Spark write, without having to use the saveAsNewAPIHadoopFile() method?
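For reference, the kind of Spark-native write I am hoping for would look roughly like the sketch below (assuming Spark 2.4+ with the built-in Avro source, which documents an "avroSchema" option for supplying a user-defined output schema; I don't know whether a custom property like "writerSchema" survives this path):

// Hypothetical: hand the full ProtectionWrapper schema, including the custom
// writerSchema property, to the Avro writer instead of letting Spark derive
// a schema from the StructType.
ds.write()
  .format("avro")
  .option("avroSchema", protectionSchema.toString())
  .save("c:\\test");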