I'm working on an NRT (near-real-time) solution that requires me to frequently update the metadata on an Impala table.
Currently this invalidation is done as a separate step after my Spark code has run. I would like to speed things up by issuing the refresh/invalidate directly from my Spark code.
What would be the most efficient approach?
- Oozie is just too slow (30 sec overhead? no thanks)
- An SSH action to an (edge) node seems like a valid solution but feels "hackish"
- I don't see a way to do this from the HiveContext in Spark either.
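
To make the goal concrete, here is a minimal sketch of what issuing the statement from the Spark driver could look like once the write has finished. It assumes a Python DB-API style connection to an Impala daemon is available (for example from the third-party impyla package, which is an assumption and not something confirmed for this setup). The helper names `build_metadata_statement` and `refresh_impala_table` are hypothetical; only `REFRESH` and `INVALIDATE METADATA` are actual Impala statements.

```python
import re

# Conservative identifier check (an assumption of this sketch, stricter than
# Impala itself) so that names cannot inject extra SQL into the statement.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_metadata_statement(database, table, full_invalidate=False):
    """Build the Impala statement that reloads metadata for one table.

    REFRESH is the lighter option when new data files were added to an
    existing table; INVALIDATE METADATA discards the cached metadata entirely.
    """
    for name in (database, table):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    verb = "INVALIDATE METADATA" if full_invalidate else "REFRESH"
    return f"{verb} {database}.{table}"


def refresh_impala_table(connection, database, table, full_invalidate=False):
    """Run the statement on any DB-API connection and return the SQL sent."""
    statement = build_metadata_statement(database, table, full_invalidate)
    cursor = connection.cursor()
    try:
        cursor.execute(statement)
    finally:
        # Always release the cursor, even if the statement fails.
        cursor.close()
    return statement
```

The connection would be opened on the driver only, after the Spark write has completed, so that the statement runs once per batch and not once per executor task.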