In my case, I need to load Impala data into Spark (PySpark) because I want to use FPGrowth from Spark MLlib.
The data is stored in Kudu and was created through Impala. Connecting directly to Kudu from Spark was rejected by the relevant department, and I also failed to connect using the Impala JDBC driver provided by Cloudera.
So my last resort is to:
- Load the data with ibis (https://github.com/ibis-project/ibis)
- Convert the ImpalaTable to a Spark DataFrame
But I couldn't find a way to do the conversion. Am I thinking about this the wrong way?
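One possible bridge for the two steps above is to go through pandas: an ibis expression's `execute()` runs the query in Impala and returns a pandas DataFrame, and `SparkSession.createDataFrame` accepts a pandas DataFrame. The sketch below is a minimal version under stated assumptions: an ibis release that still ships the Impala backend, Impala listening on port 21050, and a long-format table with one row per (transaction id, item). The function names and column names are hypothetical. The FPGrowth step uses the DataFrame-based `pyspark.ml.fpm.FPGrowth`, which expects an array column of unique items per transaction, hence the `collect_set`.

```python
"""Sketch: Impala (via ibis) -> pandas -> Spark DataFrame -> FPGrowth.

Assumptions (hypothetical, not confirmed by the question): an ibis version
with the Impala backend, Impala on port 21050, and a long-format table with
one row per (transaction id, item).
"""


def load_impala_table(host, database, table_name, port=21050):
    """Return a lazy ibis table expression; no data is fetched yet."""
    import ibis  # imported lazily so the other helpers work without ibis

    con = ibis.impala.connect(host=host, port=port, database=database)
    return con.table(table_name)


def ibis_table_to_spark(table_expr, spark, limit=None):
    """Materialize an ibis expression as pandas, then hand it to Spark.

    execute() runs the query in Impala and returns a pandas DataFrame on this
    machine, so the whole result must fit in local (driver) memory.
    limit=None asks ibis for all rows instead of its default row cap.
    """
    pdf = table_expr.execute(limit=limit)
    return spark.createDataFrame(pdf)


def mine_frequent_itemsets(spark_df, txn_col, item_col,
                           min_support=0.01, min_confidence=0.5):
    """Group rows into baskets and fit FPGrowth (DataFrame-based MLlib API)."""
    from pyspark.ml.fpm import FPGrowth
    from pyspark.sql import functions as F

    # FPGrowth requires unique items per basket; collect_set de-duplicates.
    baskets = spark_df.groupBy(txn_col).agg(F.collect_set(item_col).alias("items"))
    model = FPGrowth(itemsCol="items", minSupport=min_support,
                     minConfidence=min_confidence).fit(baskets)
    return model.freqItemsets, model.associationRules
```

Usage would look like `sdf = ibis_table_to_spark(load_impala_table("impala-host", "my_db", "my_table"), spark)` followed by `mine_frequent_itemsets(sdf, "txn_id", "item")`, with all names being placeholders. The trade-off of this route is that every row passes through a single Python process, so it only suits data that fits on one machine. For larger pandas-to-Spark transfers, enabling Arrow (`spark.sql.execution.arrow.pyspark.enabled` in Spark 3.x) may speed up `createDataFrame`.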