4

I'm trying to retrieve the list of columns from a Hive table and store the result in a spark dataframe.

var my_column_list = hiveContext.sql(s""" SHOW COLUMNS IN $my_hive_table""")

But I'm unable to alphabetically sort the dataframe or even the result of the show columns query. I tried using sort and orderBy().

How could I sort the result alphabetically?

Update: Added a sample of my code

import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql("USE my_test_db")

var lv_column_list = hiveContext.sql(s""" SHOW COLUMNS IN MYTABLE""")
//WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems

lv_column_list.show //Works fine
lv_column_list.orderBy("result").show //Error arises
Amber
  • 914
  • 6
  • 20
  • 51

3 Answers3

3

The SHOW COLUMNS query produces a Dataframe with a column named result. If you order by this column, you get what you want :

val df = hiveContext.sql(s""" SHOW COLUMNS IN $my_hive_table """)
df.orderBy("result").show
cheseaux
  • 5,187
  • 31
  • 50
  • I got an exception "org.apache.spark.sql.AnalysisException: resolved attribute(s) result#31 missing from result#21 in operator !Sort [result#31 ASC], true;" – Amber Nov 08 '16 at 15:24
  • Added a code snippet in the question. When I execute the show columns part, I get a warning (shown in the code). Could this be a possible reason for the exception? – Amber Nov 09 '16 at 02:48
0

Not sure how you are using the sort or orderBy method,

Try the following,

df.sort(asc("column_name"))    
df.orderBy(asc("column_name"))
Kris
  • 1,618
  • 1
  • 13
  • 13
0

Instead of 'SHOW COLUMNS', I used 'DESC' and retrieved the column list with "col_name".

var lv_column_list = hiveContext.sql(s""" DESC MYTABLE""")
lv_column_list.select("col_name").orderBy("col_name")
Amber
  • 914
  • 6
  • 20
  • 51