If I look at the Impala Shell or Hue, I can write complicated enough IMPALA UPDATE statements for KUDU, e.g. an update with a sub-select and so on. Fine.
Looking at the old JDBC connection methods for, say, MySQL via SPARK / SCALA, there is not much scope for doing a complicated update via such a connection, which is understandable. With KUDU, however, I think the situation changes.
Looking at the KUDU documentation - Apache KUDU - Developing Applications with Apache KUDU - I have the following questions:
- It is unclear whether I can issue a complex UPDATE SQL statement from a SPARK / SCALA environment via an IMPALA JDBC driver (due to security issues with KUDU).
- In SPARK KUDU native mode, DML seems tied to a DataFrame approach with INSERT and UPSERT. What if I just want to write a free-format SQL DML statement such as an UPDATE? I see that we can use Spark SQL to INSERT (treated as UPSERT by default) into a KUDU table, e.g.
sqlContext.sql(s"INSERT INTO TABLE $kuduTableName SELECT * FROM source_table")
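For the DataFrame-bound DML, my understanding (from the kudu-spark integration docs) is that row-level changes go through a KuduContext, which exposes insertRows, upsertRows, updateRows and deleteRows. So an "UPDATE" can be expressed by computing the changed rows as a DataFrame and handing them to updateRows. A rough sketch, where the master address and table name are placeholders for my own cluster:

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kudu-dml-sketch").getOrCreate()

val kuduMasters   = "kudu-master:7051"              // placeholder master address
val kuduTableName = "impala::default.my_table"      // hypothetical table name

// Read the Kudu table as a DataFrame and expose it to Spark SQL
val df = spark.read
  .format("kudu")
  .option("kudu.master", kuduMasters)
  .option("kudu.table", kuduTableName)
  .load()
df.createOrReplaceTempView("my_table")

// Compute the rows to change with ordinary Spark SQL, then push them
// back as an update keyed on the Kudu primary key (here assumed to be id)
val kuduContext = new KuduContext(kuduMasters, spark.sparkContext)
val changed = spark.sql(
  "SELECT id, col1 * 2 AS col1 FROM my_table WHERE col1 > 10")
kuduContext.updateRows(changed, kuduTableName)
```

This is not a free-format UPDATE statement, of course; it is the sub-select half expressed in SQL and the UPDATE half expressed as an API call.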
- My understanding of SPARK SQL INSERT ... as per above is that the KUDU table must also be registered as a temporary table; I cannot approach it directly. So, taking all this in: how can we approach a KUDU table directly in SPARK? We cannot in SPARK / KUDU, and complicated UPDATE statements are not possible via SPARK SCALA / KUDU, nor via SPARK SCALA to KUDU over an Impala JDBC connection, as far as I can tell. I note that I can do some things via shell scripting with saved env vars in some cases.
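On the JDBC side, my assumption (untested here) is that a statement sent through the Impala JDBC driver is passed to Impala as-is, so any UPDATE that impala-shell or Hue accepts for a Kudu table should also work from plain java.sql inside a SPARK / SCALA driver program - security configuration permitting. A minimal sketch, where the URL, credentials and table names are all placeholders:

```scala
import java.sql.DriverManager

// Hypothetical connection string; host, port and AuthMech depend on the cluster
val jdbcUrl = "jdbc:impala://impala-host:21050/default;AuthMech=3;UID=user;PWD=pass"

val conn = DriverManager.getConnection(jdbcUrl)
try {
  val stmt = conn.createStatement()
  // A complex UPDATE with a join, exactly as it would be typed in impala-shell;
  // Impala executes it and applies the row changes to the Kudu table
  val updated = stmt.executeUpdate(
    """UPDATE my_kudu_table
      |SET col1 = s.new_val
      |FROM my_kudu_table JOIN staging s ON my_kudu_table.id = s.id
      |WHERE s.flag = true""".stripMargin)
  println(s"rows updated: $updated")
} finally {
  conn.close()
}
```

If this works, it sidesteps the DataFrame-only restriction of the native kudu-spark path, at the cost of routing the DML through Impala rather than talking to Kudu directly.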