While reading from Impala with a JDBC URL like

    jdbc:hive2://impalajdbc.data:25004/;auth=noSasl

and querying it through Spark SQL, e.g.

    val rr = sparkSession.sql("SELECT item_id FROM someTable LIMIT 10")
it fails with:

    Cannot convert column 1 to long: java.lang.NumberFormatException: For input string: "item_id"
        at org.apache.hive.jdbc.HiveBaseResultSet.getLong(HiveBaseResultSet.java:374)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$9(JdbcUtils.scala:435)
I believe the culprit is that Impala returns the column header as a row together with the result. However, it is difficult to get rid of that row with map or filter on the DataFrame/RDD API, because by the time those operators run, the JDBC source has already tried (and failed) to parse the value.
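One possible workaround along these lines (a sketch only, not tested against a real Impala endpoint; the URL, table name, and column mirror the question, while the `customSchema` approach and everything else are assumptions) is to force Spark to read the column as a string via the JDBC source's `customSchema` option, drop the stray header row with `filter`, and only then cast to the intended type:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ImpalaHeaderWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("impala-read").getOrCreate()

    // Read item_id as STRING so the echoed header row cannot break parsing.
    val raw = spark.read
      .format("jdbc")
      .option("url", "jdbc:hive2://impalajdbc.data:25004/;auth=noSasl")
      .option("driver", "org.apache.hive.jdbc.HiveDriver")
      .option("dbtable", "(SELECT item_id FROM someTable LIMIT 10) t")
      .option("customSchema", "item_id STRING") // defer parsing to us
      .load()

    // Drop the header row, then cast to the real type manually.
    val rr = raw
      .filter(col("item_id") =!= "item_id")
      .withColumn("item_id", col("item_id").cast("long"))

    rr.show()
  }
}
```

This keeps the fix entirely on the Spark side: no server configuration change is needed, at the cost of one extra filter and cast per column.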
There are other options as well: as a last resort, I could try changing the Hive configuration to disable returning headers.