
While reading from Impala with a JDBC URL like

jdbc:hive2://impalajdbc.data:25004/;auth=noSasl

and running Spark SQL such as

val rr = sparkSession.sql("SELECT item_id from someTable LIMIT 10")

it fails with:

Cannot convert column 1 to long: java.lang.NumberFormatException: For input string: "item_id"
[info]   at org.apache.hive.jdbc.HiveBaseResultSet.getLong(HiveBaseResultSet.java:374)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$9(JdbcUtils.scala:435)

I know the culprit is that Impala returns the column header together with the result. However, it's quite difficult to get rid of it with map or filter on the DataFrame/RDD API, because those operators require the result to be parsed first.

There are other options as well: I could change the Hive/Impala config to disable returning headers, but that would be a last resort.
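One way to avoid the parsing problem described above is to stop Spark from converting the column to long during the JDBC read itself. A minimal sketch, assuming the table and column names from the question and Spark's `customSchema` JDBC option: read `item_id` as a string, drop the echoed header row, then cast.

```scala
// Sketch only: requires a live Impala endpoint and the Hive JDBC driver
// on the classpath. URL, table, and column names are from the question.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("impala-header-workaround").getOrCreate()

val raw = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://impalajdbc.data:25004/;auth=noSasl")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "someTable")
  // Read the column as a string so the echoed header row never hits
  // the long conversion in JdbcUtils.
  .option("customSchema", "item_id STRING")
  .load()

val cleaned = raw
  .filter(col("item_id") =!= "item_id") // drop the header row
  .withColumn("item_id", col("item_id").cast("long"))
```

Because the column arrives as a string, the problematic `getLong` call is never made, and the header can be filtered with the ordinary DataFrame API.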

doofin

1 Answer


Try using a WHERE clause in your SELECT statement to exclude the item_id header value.

Sample Query:

val rr = sparkSession.sql("SELECT item_id from someTable where item_id != 'item_id' LIMIT 10")
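The same header-excluding predicate can also be pushed down at read time by wrapping it in a subquery passed as `dbtable`, so the header row never reaches Spark's row decoder at all. A sketch under the same assumptions (question's URL, table, and column names; Hive JDBC driver available):

```scala
// Sketch only: requires a live Impala endpoint and the Hive JDBC driver.
val rr = sparkSession.read
  .format("jdbc")
  .option("url", "jdbc:hive2://impalajdbc.data:25004/;auth=noSasl")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  // The predicate runs inside Impala, filtering the header before
  // the JDBC result set is materialized.
  .option("dbtable",
    "(SELECT item_id FROM someTable WHERE item_id != 'item_id') t")
  .load()
```

This matters because the NumberFormatException is thrown while the JDBC rows are being decoded; a filter applied after loading may be too late unless Spark pushes it down.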
notNull