
I am a newbie on BigInsights. I am working with BigInsights on Cloud 4.1, Ambari 2.2.0, and Spark 1.6.1. It doesn't matter whether the connection is in Scala or Python, but I need to do data processing with Spark and then persist the results in Big SQL. Is this possible? Thanks in advance.


2 Answers


Check SYSHADOOP.EXECSPARK to see how to execute Spark jobs and return the output in table format; you can then insert the output into a table (sketched below) or join it with other tables.

https://www.ibm.com/support/knowledgecenter/en/SSPT3X_4.3.0/com.ibm.swg.im.infosphere.biginsights.db2biga.doc/doc/biga_execspark.html

SELECT *
  FROM TABLE(SYSHADOOP.EXECSPARK(
    class    => 'DataSource',
    format   => 'json',
    load     => 'hdfs://host.port.com:8020/user/bigsql/demo.json'
    )
  ) AS doc
  WHERE doc.country IS NOT NULL
  LIMIT 5
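
As an example of the "insert the output into a table" step, the EXECSPARK result set can feed an INSERT statement directly. This is only a sketch: the target table country_docs is hypothetical and assumed to have a schema matching the EXECSPARK output.

-- Hypothetical target table; its schema must match the EXECSPARK output
INSERT INTO country_docs
  SELECT doc.*
    FROM TABLE(SYSHADOOP.EXECSPARK(
      class    => 'DataSource',
      format   => 'json',
      load     => 'hdfs://host.port.com:8020/user/bigsql/demo.json'
      )
    ) AS doc
    WHERE doc.country IS NOT NULL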
  • Thanks for your help, but I am on BigInsights 4.1 on cloud :( SYSHADOOP.EXECSPARK is available in versions 4.2 and 4.3. Do you know another way to do something like this? Thanks! – JohanaAnez Jun 13 '17 at 15:15

Here are the steps to connect to Big SQL through PySpark using JDBC in BigInsights:

1. Place db2jcc4.jar (the IBM driver for connecting to Big SQL; you can download it from http://www-01.ibm.com/support/docview.wss?uid=swg21363866) in the Python library.

2. Add the jar file path to the spark-defaults.conf file (located in the conf folder of your Spark installation):

   spark.driver.extraClassPath /usr/lib/spark/python/lib/db2jcc4.jar
   spark.executor.extraClassPath /usr/lib/spark/python/lib/db2jcc4.jar

or start up the PySpark shell with the jar on its classpath:

   pyspark --jars /usr/lib/spark/python/lib/db2jcc4.jar

3. Use sqlContext.read.format to specify the JDBC URL and other driver information:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# options() takes keyword arguments; option() takes a single key/value pair
df = sqlContext.read.format("jdbc").options(
    url="jdbc:db2://hostname:port/bigsql",
    driver="com.ibm.db2.jcc.DB2Driver",
    dbtable="tablename",
    user="username",
    password="password"
).load()

df.show()
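
To then persist processed data back to Big SQL, which is the goal in the original question, the same JDBC connection details can be reused for writing. This is a minimal sketch: the table name target_table, the write mode, and the credentials are placeholders, not values from the original answer.

# Append the processed DataFrame to a (hypothetical) existing Big SQL table
df.write.jdbc(
    url="jdbc:db2://hostname:port/bigsql",
    table="target_table",
    mode="append",
    properties={
        "driver": "com.ibm.db2.jcc.DB2Driver",
        "user": "username",
        "password": "password"
    }
)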