The DataStax Spark Cassandra Connector is great for interacting with Cassandra through Apache Spark. With Spark SQL 1.1, we can use the Thrift server so that Tableau can talk to Spark. Since Tableau can talk to Spark, and Spark can talk to Cassandra, there's surely some way to get Tableau talking to Cassandra through Spark (or rather Spark SQL). I can't figure out how to get this running. Ideally, I'd like to do this with a Spark Standalone cluster plus a Cassandra cluster (i.e. without an additional Hadoop setup). Is this possible? Any pointers are appreciated.
- Tableau just announced a driver for Spark SQL: http://www.tableausoftware.com/about/blog/2014/10/tableau-spark-sql-big-data-just-got-even-more-supercharged-33799. The article describes how to request a beta copy. – Alex Blakemore Oct 17 '14 at 02:57
- Any idea on getting Spark + Tableau to query Cassandra? – ashic Feb 17 '15 at 23:12
- Since Spark SQL can access Cassandra, it ought to be possible using the Tableau Spark SQL driver. Are you using the beta driver? If so, what specific problem do you have? (Or better yet, tell the beta program so they can fix it.) – Alex Blakemore Feb 18 '15 at 04:37
- The way Spark SQL and Cassandra work together is you do sc = new SparkContext(..); cc = new CassandraSQLContext(sc); cc.sql("SELECT * ...") (see the sketch after these comments). When I'm running the Thrift server, how would I tell the Thrift server to do this? – ashic Feb 18 '15 at 13:02
- I don't have the answer, but if you are using the Tableau beta driver, they give you an email contact for feedback. They are working with Databricks on that driver, so that is a better place to direct your question. – Alex Blakemore Feb 18 '15 at 17:57
1 Answer
The Thrift server has a HiveThriftServer2.startWithContext(sqlContext) method, so you could create your sqlContext referencing C* and the appropriate table / column family, and then pass that context to the Thrift server.
So something like this:
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._

// Reuse the shell's SparkContext and wrap it in a HiveContext.
val sparkContext = sc
import sparkContext._
val sqlContext = new HiveContext(sparkContext)
import sqlContext._ // brings the RDD-to-SchemaRDD implicit into scope

// Register a small in-memory table so there is something to query.
makeRDD((1, "hello") :: (2, "world") :: Nil).toSchemaRDD.cache().registerTempTable("t")

// Start the Thrift server against this context instead of the default one.
HiveThriftServer2.startWithContext(sqlContext)
So instead of starting the default Thrift server from Spark, you can just launch your custom one.
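To tie this back to the question, a minimal sketch of the Cassandra variant might look like the following. It assumes spark-cassandra-connector 1.x on the classpath and spark.cassandra.connection.host set in the shell's configuration; the keyspace (my_keyspace), table (users), and its schema are hypothetical placeholders.

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._
import com.datastax.spark.connector._

// Hypothetical case class mirroring the C* table's columns.
case class User(id: Int, name: String)

val sqlContext = new HiveContext(sc)
import sqlContext._ // RDD-to-SchemaRDD implicit

// Read the C* table as a typed RDD via the connector and register it
// under a name that Tableau can query.
sc.cassandraTable[User]("my_keyspace", "users")
  .toSchemaRDD
  .cache()
  .registerTempTable("users")

HiveThriftServer2.startWithContext(sqlContext)

Once the server is up, Tableau connects to it with the Spark SQL ODBC driver mentioned in the comments, and the registered table shows up like any other Hive table.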

– user4746156