If I look at the Impala Shell or Hue, I can write complicated enough IMPALA UPDATE statements for KUDU, e.g. an update with a sub-select and so on. Fine.
Looking at the old JDBC connection methods for, say, MySQL via SPARK / SCALA, there is not much scope for doing a complicated update via such a connection, which is understandable. With KUDU, however, I think the situation changes.
Looking at the KUDU documentation - Apache KUDU - Developing Applications with Apache KUDU - I have the following questions:
- It is unclear whether I can issue a complex UPDATE SQL statement from a SPARK / SCALA environment via an IMPALA JDBC driver (due to security issues with KUDU).
- In SPARK KUDU native mode, DML seems tied to a DataFrame approach with INSERT and UPSERT. What if I just want to write a free-format SQL DML statement such as an UPDATE? I see that we can use Spark SQL to INSERT (treated as UPSERT by default) into a KUDU table, e.g.
sqlContext.sql(s"INSERT INTO TABLE $kuduTableName SELECT * FROM source_table")
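For the DataFrame-bound DML, my understanding (from the kudu-spark integration docs) is that row-level changes go through a KuduContext, which exposes insertRows, upsertRows, updateRows and deleteRows. So an "UPDATE" can be expressed by computing the changed rows as a DataFrame and handing them to updateRows. A rough sketch, where the master address and table name are placeholders for my own cluster:

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kudu-dml-sketch").getOrCreate()

val kuduMasters   = "kudu-master:7051"              // placeholder master address
val kuduTableName = "impala::default.my_table"      // hypothetical table name

// Read the Kudu table as a DataFrame and expose it to Spark SQL
val df = spark.read
  .format("kudu")
  .option("kudu.master", kuduMasters)
  .option("kudu.table", kuduTableName)
  .load()
df.createOrReplaceTempView("my_table")

// Compute the rows to change with ordinary Spark SQL, then push them
// back as an update keyed on the Kudu primary key (here assumed to be id)
val kuduContext = new KuduContext(kuduMasters, spark.sparkContext)
val changed = spark.sql(
  "SELECT id, col1 * 2 AS col1 FROM my_table WHERE col1 > 10")
kuduContext.updateRows(changed, kuduTableName)
```

This is not a free-format UPDATE statement, of course; it is the sub-select half expressed in SQL and the UPDATE half expressed as an API call.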
- My understanding of SPARK SQL INSERT ... as per above is that the KUDU table must also be registered as a temporary table; I cannot approach it directly. So, taking all this in: how can we approach a KUDU table directly in SPARK? We cannot in SPARK / KUDU, and complicated UPDATE statements are not possible via SPARK SCALA / KUDU, nor via SPARK SCALA to KUDU over an Impala JDBC connection, as far as I can tell. I note that I can do some things via shell scripting with saved env vars in some cases.
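On the JDBC side, my assumption (untested here) is that a statement sent through the Impala JDBC driver is passed to Impala as-is, so any UPDATE that impala-shell or Hue accepts for a Kudu table should also work from plain java.sql inside a SPARK / SCALA driver program - security configuration permitting. A minimal sketch, where the URL, credentials and table names are all placeholders:

```scala
import java.sql.DriverManager

// Hypothetical connection string; host, port and AuthMech depend on the cluster
val jdbcUrl = "jdbc:impala://impala-host:21050/default;AuthMech=3;UID=user;PWD=pass"

val conn = DriverManager.getConnection(jdbcUrl)
try {
  val stmt = conn.createStatement()
  // A complex UPDATE with a join, exactly as it would be typed in impala-shell;
  // Impala executes it and applies the row changes to the Kudu table
  val updated = stmt.executeUpdate(
    """UPDATE my_kudu_table
      |SET col1 = s.new_val
      |FROM my_kudu_table JOIN staging s ON my_kudu_table.id = s.id
      |WHERE s.flag = true""".stripMargin)
  println(s"rows updated: $updated")
} finally {
  conn.close()
}
```

If this works, it sidesteps the DataFrame-only restriction of the native kudu-spark path, at the cost of routing the DML through Impala rather than talking to Kudu directly.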