My task is basically:
1. Read data from Google Cloud BigQuery using Spark/Scala.
2. Perform some operation on the data (e.g., an update).
3. Write the data back to BigQuery.
So far, I am able to read data from BigQuery using newAPIHadoopRDD(), which returns an RDD[(LongWritable, JsonObject)]:
tableData.map(entry => (entry._1.toString(),entry._2.toString()))
.take(10)
.foreach(println)
Here is a sample of the data it prints:
(341,{"id":"4","name":"Shahu","score":"100"})
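For context, the read path looks roughly like the sketch below. It assumes the Google BigQuery connector for Hadoop (GsonBigQueryInputFormat); the project, bucket, and table names are placeholders, and the local master is only for illustration:

```scala
import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, GsonBigQueryInputFormat}
import com.google.gson.JsonObject
import org.apache.hadoop.io.LongWritable
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("bq-read")
  .master("local[*]")  // local master only for illustration; on Dataproc this is set by the cluster
  .getOrCreate()
val sc = spark.sparkContext

val conf = sc.hadoopConfiguration
// Placeholder project/bucket/table identifiers -- replace with your own.
conf.set(BigQueryConfiguration.PROJECT_ID_KEY, "my-project")
conf.set(BigQueryConfiguration.GCS_BUCKET_KEY, "my-temp-bucket")
BigQueryConfiguration.configureBigQueryInput(conf, "my-project:my_dataset.my_table")

// Lazily defines the RDD[(LongWritable, JsonObject)]; no data is pulled until an action runs.
val tableData = sc.newAPIHadoopRDD(
  conf,
  classOf[GsonBigQueryInputFormat],
  classOf[LongWritable],
  classOf[JsonObject])
```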
I am not able to figure out which functions I should use on this RDD to meet the requirement.
Do I need to convert this RDD to a DataFrame/Dataset, or to some JSON representation? If so, how?
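From what I can tell, one possible direction is to drop the LongWritable key and let Spark infer a schema from the JSON payloads. This is an untested sketch with stand-in data (the parallelized string plays the role of tableData.map(_._2.toString), and the score update is just an illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("bq-to-df")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Stand-in for tableData.map(_._2.toString) -- the JsonObject payloads as strings.
val jsonStrings = spark.sparkContext.parallelize(Seq(
  """{"id":"4","name":"Shahu","score":"100"}"""))

// Spark can infer a schema directly from JSON strings (Spark 2.2+ expects a Dataset[String]).
val df = spark.read.json(jsonStrings.toDS())

// An "update" then becomes a column transformation, e.g. add 10 to every score.
val updated = df.withColumn("score", (col("score").cast("int") + 10).cast("string"))
updated.show()
```

Once the data is a DataFrame, the update step stops being a per-record JSON manipulation and becomes an ordinary column expression.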