
While reading from Impala with a JDBC URL like

jdbc:hive2://impalajdbc.data:25004/;auth=noSasl

and running Spark SQL such as

val rr = sparkSession.sql("SELECT item_id from someTable LIMIT 10")

it fails with:

Cannot convert column 1 to long: java.lang.NumberFormatException: For input string: "item_id"
[info]   at org.apache.hive.jdbc.HiveBaseResultSet.getLong(HiveBaseResultSet.java:374)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$9(JdbcUtils.scala:435)

I know the culprit is that Impala returns the column header together with the result. However, it's quite difficult to get rid of it with map or filter on the DataFrame/RDD API, because those operators require the result to be parsed first.

There are other options as well: I could change the Hive/Impala config to disable returning headers, but that would be a last resort.
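One way to avoid the parsing problem described above is to stop Spark from converting the column to long during the JDBC read itself. A minimal sketch, assuming the table and column names from the question and Spark's `customSchema` JDBC option: read `item_id` as a string, drop the echoed header row, then cast.

```scala
// Sketch only: requires a live Impala endpoint and the Hive JDBC driver
// on the classpath. URL, table, and column names are from the question.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("impala-header-workaround").getOrCreate()

val raw = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://impalajdbc.data:25004/;auth=noSasl")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "someTable")
  // Read the column as a string so the echoed header row never hits
  // the long conversion in JdbcUtils.
  .option("customSchema", "item_id STRING")
  .load()

val cleaned = raw
  .filter(col("item_id") =!= "item_id") // drop the header row
  .withColumn("item_id", col("item_id").cast("long"))
```

Because the column arrives as a string, the problematic `getLong` call is never made, and the header can be filtered with the ordinary DataFrame API.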

doofin

1 Answer


Try using a WHERE clause in your SELECT statement to exclude the item_id header value.

Sample Query:

val rr = sparkSession.sql("SELECT item_id from someTable where item_id != 'item_id' LIMIT 10")
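The same header-excluding predicate can also be pushed down at read time by wrapping it in a subquery passed as `dbtable`, so the header row never reaches Spark's row decoder at all. A sketch under the same assumptions (question's URL, table, and column names; Hive JDBC driver available):

```scala
// Sketch only: requires a live Impala endpoint and the Hive JDBC driver.
val rr = sparkSession.read
  .format("jdbc")
  .option("url", "jdbc:hive2://impalajdbc.data:25004/;auth=noSasl")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  // The predicate runs inside Impala, filtering the header before
  // the JDBC result set is materialized.
  .option("dbtable",
    "(SELECT item_id FROM someTable WHERE item_id != 'item_id') t")
  .load()
```

This matters because the NumberFormatException is thrown while the JDBC rows are being decoded; a filter applied after loading may be too late unless Spark pushes it down.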
notNull