I wrote code that loads data directly from MySQL into a Dataset:

 Dataset<Row> sourceDataView = spark.read().format("jdbc")
         .option("url", url).option("driver", driver)
         .option("dbtable", "t_data_master").load()
         .where(jnCond).select(srcColSel);

I am able to get the query result as a ResultSet, but not as a Dataset. How can I get the query result as a Dataset?
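
One possible direction, sketched under assumptions rather than verified against this setup: Spark's JDBC source accepts a parenthesized subquery with an alias as the `dbtable` option, so the query itself can be handed to the database and the result arrives as a Dataset. The helper name `asDbtable`, the alias `src`, and the columns `id` and `name` are hypothetical and only illustrate the string that would be passed. Whether a given driver other than MySQL accepts this form is not confirmed here.

```java
public class JdbcSubquery {
    // Wraps a SELECT statement so it can be passed as the JDBC "dbtable" option.
    // Hypothetical helper for illustration; not part of Spark.
    public static String asDbtable(String query, String alias) {
        String trimmed = query.trim();
        // Drop a trailing semicolon, which is not valid inside a subquery.
        if (trimmed.endsWith(";")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
        }
        return "(" + trimmed + ") AS " + alias;
    }

    public static void main(String[] args) {
        // The column names id and name are assumptions, not taken from the question.
        String dbtable = asDbtable(
                "SELECT id, name FROM t_data_master WHERE id > 100;", "src");
        System.out.println(dbtable);

        // With Spark on the classpath, the string would be used like this:
        // Dataset<Row> ds = spark.read().format("jdbc")
        //         .option("url", url).option("driver", driver)
        //         .option("dbtable", dbtable).load();
    }
}
```

Passing the query this way lets the database do the filtering before rows reach Spark, instead of loading the whole table and filtering with `where` afterwards.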

SHG
  • Can you clarify whether you are querying MySQL or VoltDB? If you are using VoltDB, can you provide more detail on the input parameters and any error messages you are seeing? Also, what is the query, and how large do you expect the results to be? Typically you would use an export connector to push data from VoltDB to HDFS or Hive, and then load it from there into Spark. – BenjaminBallard Aug 03 '17 at 15:44
  • Is there a connector for VoltDB with Spark, like the one for MySQL? – Satyanvesh Muppaneni Aug 08 '17 at 03:55
  • The best way to create a Spark RDD from data in VoltDB would be to use the HTTP (WebHDFS) export connector to output CSV files to HDFS. Then you can take advantage of the atomic properties of WebHDFS to move or rename the target directory to harvest an immutable set of CSV files. WebHDFS will create a new target directory, and the data will continue to export from VoltDB for the next set. Using JDBC to create an RDD from one big query would be backwards for VoltDB, which is optimized for small transactions at high throughput. – BenjaminBallard Aug 08 '17 at 14:27

0 Answers