The DataStax Spark Cassandra Connector is great for interacting with Cassandra through Apache Spark. With Spark SQL 1.1, we can use the Thrift server so that Tableau can talk to Spark. Since Tableau can talk to Spark, and Spark can talk to Cassandra, there's surely some way to get Tableau talking to Cassandra through Spark (or rather Spark SQL). I can't figure out how to get this running. Ideally, I'd like to do this with a Spark Standalone cluster plus a Cassandra cluster (i.e. without an additional Hadoop setup). Is this possible? Any pointers are appreciated.
- Tableau just announced a driver for Spark SQL: http://www.tableausoftware.com/about/blog/2014/10/tableau-spark-sql-big-data-just-got-even-more-supercharged-33799. The article describes how to request a beta copy. – Alex Blakemore Oct 17 '14 at 02:57
- Any idea on getting Spark + Tableau to query Cassandra? – ashic Feb 17 '15 at 23:12
- Since Spark SQL can access Cassandra, it ought to be possible using the Tableau Spark SQL driver. Are you using the beta driver? If so, what specific problem do you have? (Or better yet, tell the beta program so they can fix it.) – Alex Blakemore Feb 18 '15 at 04:37
- The way Spark SQL and Cassandra work together is you do sc = new SparkContext(..); cc = new CassandraSQLContext(sc); cc.sql("SELECT * ...") (see the sketch after these comments). When I'm running the Thrift server, how would I tell the Thrift server to do this? – ashic Feb 18 '15 at 13:02
- I don't have the answer, but if you are using the Tableau beta driver, they give you an email contact for feedback. They are working with Databricks on that driver, so that is a better place to direct your question. – Alex Blakemore Feb 18 '15 at 17:57
1 Answer
The Thrift server has a HiveThriftServer2.startWithContext(sqlContext) method, so you could create your sqlContext referencing C* and the appropriate table / column family, and then pass that context to the Thrift server.
So something like this:
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._

// Reuse the shell's SparkContext and wrap it in a HiveContext.
val sparkContext = sc
import sparkContext._
val sqlContext = new HiveContext(sparkContext)
import sqlContext._ // brings the RDD-to-SchemaRDD implicit into scope

// Register a small in-memory table so there is something to query.
makeRDD((1, "hello") :: (2, "world") :: Nil).toSchemaRDD.cache().registerTempTable("t")

// Start the Thrift server against this context instead of the default one.
HiveThriftServer2.startWithContext(sqlContext)
So instead of starting the default Thrift server from Spark, you can just launch your custom one.
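To tie this back to the question, a minimal sketch of the Cassandra variant might look like the following. It assumes spark-cassandra-connector 1.x on the classpath and spark.cassandra.connection.host set in the shell's configuration; the keyspace (my_keyspace), table (users), and its schema are hypothetical placeholders.

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._
import com.datastax.spark.connector._

// Hypothetical case class mirroring the C* table's columns.
case class User(id: Int, name: String)

val sqlContext = new HiveContext(sc)
import sqlContext._ // RDD-to-SchemaRDD implicit

// Read the C* table as a typed RDD via the connector and register it
// under a name that Tableau can query.
sc.cassandraTable[User]("my_keyspace", "users")
  .toSchemaRDD
  .cache()
  .registerTempTable("users")

HiveThriftServer2.startWithContext(sqlContext)

Once the server is up, Tableau connects to it with the Spark SQL ODBC driver mentioned in the comments, and the registered table shows up like any other Hive table.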

– user4746156