I am a newbie on BigInsights. I am working with BigInsights on Cloud 4.1, Ambari 2.2.0, and Spark 1.6.1. It doesn't matter whether the connection is in Scala or Python, but I need to do data processing with Spark and then persist the results in BigSQL. Is this possible? Thanks in advance.
2 Answers
Check SYSHADOOP.EXECSPARK to see how to execute Spark jobs and return the output in table format, after which you can insert it into a table or join it with other tables:
SELECT *
FROM TABLE(SYSHADOOP.EXECSPARK(
    class => 'DataSource',
    format => 'json',
    load => 'hdfs://host.port.com:8020/user/bigsql/demo.json'
  )
) AS doc
WHERE doc.country IS NOT NULL
LIMIT 5
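
For example, to persist the Spark output rather than just query it, the table function call can be wrapped in a plain INSERT (a sketch, assuming a target table my_docs with a schema matching the JSON documents already exists):

INSERT INTO my_docs
SELECT *
FROM TABLE(SYSHADOOP.EXECSPARK(
    class => 'DataSource',
    format => 'json',
    load => 'hdfs://host.port.com:8020/user/bigsql/demo.json'
  )
) AS doc
WHERE doc.country IS NOT NULL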

Thanks for your help, but I am on BigInsights 4.1 on Cloud :( SYSHADOOP.EXECSPARK is only available in versions 4.2 and 4.3. Do you know another way to do something like this? Thanks! – JohanaAnez Jun 13 '17 at 15:15
Here are the steps to connect to BigSQL from PySpark over JDBC in BigInsights:
1. Place db2jcc4.jar (the IBM JDBC driver for BigSQL; you can download it from http://www-01.ibm.com/support/docview.wss?uid=swg21363866) in the Spark Python library directory (/usr/lib/spark/python/lib in the paths below).
2. Add the jar file path to the spark-defaults.conf file (located in the conf folder of your Spark installation):

spark.driver.extraClassPath /usr/lib/spark/python/lib/db2jcc4.jar
spark.executor.extraClassPath /usr/lib/spark/python/lib/db2jcc4.jar

or start the Spark shell with the jar path:

pyspark --jars /usr/lib/spark/python/lib/db2jcc4.jar
3. Use sqlContext.read.format to specify the JDBC URL and other driver information:
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
# .options() (plural) accepts keyword arguments; .option() takes a single key/value pair
df = sqlContext.read.format("jdbc").options(
    url="jdbc:db2://hostname:port/bigsql", driver="com.ibm.db2.jcc.DB2Driver",
    dbtable="tablename", user="username", password="password").load()
df.show()
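
Since the original question was about persisting the processed data back into BigSQL, here is a minimal write-back sketch using the Spark 1.6 DataFrameWriter JDBC API; target_table is a hypothetical existing BigSQL table, and the URL and credentials are the same placeholders as above:

# Assumption: target_table already exists in BigSQL with a schema matching df.
props = {"user": "username", "password": "password",
         "driver": "com.ibm.db2.jcc.DB2Driver"}
df.write.jdbc(url="jdbc:db2://hostname:port/bigsql",
              table="target_table", mode="append", properties=props)

mode="append" adds rows to the existing table; "overwrite" would drop and recreate it first.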
