How to create Kudu table from pyspark dataframe

Question

I am trying a simple approach: writing a DataFrame from PySpark into a Kudu table that does not exist yet.

df.write.format('org.apache.kudu.spark.kudu') \
        .option('kudu.master', kudu_master) \
        .option('kudu.table', kudu_table) \
        .mode("Append") \
        .save()

but I get the exception

py4j.protocol.Py4JJavaError: An error occurred while calling o92.save.
: org.apache.kudu.client.NonRecoverableException: the table does not exist: table_name: "kudu_table"

I expected the table to be created automatically, as happens with other database types. Am I missing something, or do Kudu tables need to be pre-created?

After some searching, I tried calling the underlying JVM functions directly. I could create a KuduContext, but to create the table I have to wrap all the needed objects (schema, schema columns, etc.), and I could not find much information online on how to do that:

kc = sc._jvm.org.apache.kudu.spark.kudu.KuduContext(kudu_master, sc._jsc.sc()) # working
print(kc.tableExists("test_table")) #working
kc.createTable("test_table", sc._jvm.org.apache.kudu.Schema(data.schema), sc._jvm.org.apache.kudu.client.CreateTableOptions().addHashPartitions(list("myKey"), 3)) #not working
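
One detail in the last call can be checked without a cluster. In plain Python, list("myKey") iterates over the string, so the value passed to addHashPartitions is a list of single characters rather than one column name. The sketch below only illustrates that pitfall; it assumes "myKey" is meant as a single key column, and it does not address the schema argument, so it is not a complete fix for the createTable call.

```python
# Plain Python, no Spark or Kudu needed.
key_column = "myKey"  # assumed to be the name of a single key column

# list() iterates over the string, yielding one element per character.
wrong_keys = list(key_column)

# A list literal keeps the column name intact as one element.
right_keys = [key_column]

print(wrong_keys)
print(right_keys)
```

If the hash partition is meant to use the myKey column, the list literal form is probably what the call needs.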
