I am writing this code to get the count of rows in a specified table as an integer value:
from pyspark import SparkContext
from pyspark.sql import HiveContext
sc = SparkContext("local", "spar")
hive_context = HiveContext(sc)
hive_context.sql("use zs_trainings_trainings_db")
df = hive_context.sql("select count(*) from ldg_sales")
Either:
hive_context.table("ldg_sales").count()
or
hive_context.sql("select count(*) from ldg_sales").first()[0]
Convert the DataFrame to an RDD so you can run a map over it and extract just the row values, like this:
df = hive_context.sql("select count(*) as cnt from ldg_sales")
count = df.rdd.map(lambda row: row.cnt).collect()[0]