
My task is basically:

  1. Read data from Google Cloud BigQuery using Spark/Scala.

  2. Perform some operation (e.g., an update) on the data.

  3. Write back the data to BigQuery (see the sketch just after this list for what I have in mind for this step).
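
For step 3, this is roughly what I have in mind, though I am not sure it is the right approach (a minimal sketch, assuming the spark-bigquery connector is on the classpath and that the update produces a DataFrame named updatedDf; the table and bucket names are placeholders):

// Write the updated rows back to BigQuery, staging through a GCS bucket.
// "my_dataset.my_table" and "my-temp-bucket" are placeholders.
updatedDf.write
  .format("bigquery")
  .option("table", "my_dataset.my_table")
  .option("temporaryGcsBucket", "my-temp-bucket")
  .mode("overwrite")
  .save()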

So far, I am able to read data from BigQuery using newAPIHadoopRDD(), which returns an RDD[(LongWritable, JsonObject)]:

// Print the first 10 records as (key, JSON) string pairs.
tableData
  .map { case (key, json) => (key.toString, json.toString) }
  .take(10)
  .foreach(println)
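
For context, the read setup looks roughly like this (a sketch of my configuration, assuming the Hadoop BigQuery connector; the project, bucket, and table IDs are placeholders):

import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, GsonBigQueryInputFormat}
import com.google.gson.JsonObject
import org.apache.hadoop.io.LongWritable

val conf = sc.hadoopConfiguration
conf.set(BigQueryConfiguration.PROJECT_ID_KEY, "my-project")      // placeholder
conf.set(BigQueryConfiguration.GCS_BUCKET_KEY, "my-temp-bucket")  // placeholder
BigQueryConfiguration.configureBigQueryInput(conf, "my-project:my_dataset.my_table")

// Each record is (row position within the export, row content as a JsonObject).
val tableData = sc.newAPIHadoopRDD(
  conf,
  classOf[GsonBigQueryInputFormat],
  classOf[LongWritable],
  classOf[JsonObject])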

Below is a sample of the data:

(341,{"id":"4","name":"Shahu","score":"100"})

I am not able to figure out which functions I should use on this RDD to meet the requirement.

Do I need to convert this RDD to a DataFrame/Dataset/JSON format? And if so, how?
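
In case it clarifies what I am after, this is the direction I was considering for the conversion, though I am not sure it is correct (a sketch, assuming a SparkSession named spark and that each JsonObject serializes to one well-formed JSON string):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Drop the LongWritable key, keep each row's JSON payload as a plain string,
// and let Spark infer a schema from the JSON.
val jsonStrings = tableData.map { case (_, json) => json.toString }
val df = spark.read.json(spark.createDataset(jsonStrings))

// Placeholder "update": cast score to int and add 10.
val updatedDf = df.withColumn("score", $"score".cast("int") + 10)
updatedDf.show()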

