4

I am trying to connect to a HiveTable using spark JDBC, with the following code:

val df = spark.read.format("jdbc").
  option("driver", "org.apache.hive.jdbc.HiveDriver").
  option("user","hive").
  option("password", "").
  option("url", jdbcUrl).
  option("dbTable", tableName).load()

df.show()

but the return I get is only an empty dataframe with modified columns name, like this:

--------------|---------------|
tableName.uuid|tableName.name |
--------------|---------------|

I've tried to read the dataframe in a lot of ways, but it always results the same. I'm using JDBC Hive Driver, and this HiveTable is located in an EMR cluster. The code also runs in the same cluster. Any help will be really appreciated. Thank you all.

PaulG
  • 13,871
  • 9
  • 56
  • 78
Rafzv
  • 41
  • 4
  • did you try using sqlContext for accessing hive tables? – yoga Mar 16 '17 at 21:51
  • 1
    Do you mean: sqlContext.read.jdbc(url, tableName, properties)? I've tried and got the same result. Do you think it could be somehow related to the HiveDriver? – Rafzv Mar 17 '17 at 02:19
  • 1
    Is your Hive table(`tableName`) in `default` database have data inside it? – mrsrinivas Mar 17 '17 at 05:49
  • I've found a workaround. Instead of acessing the hive table, i've accessed directly the hive metastore using the metastore URL. This can be done by setting the options in SparkContext and then reading the table using the SparkSession.read.table method. The option can be set through the hive.metastore.uris option. – Rafzv May 05 '17 at 23:31

1 Answers1

0

Please set fetchsize in option it should work.

Dataset<Row> referenceData
            = sparkSession.read()
            .option("fetchsize", "100")
            .format("jdbc")
            .option("url", jdbc.getJdbcURL())
            .option("user", "")
            .option("password", "")
            .option("dbtable", hiveTableName).load();
Chetan Shirke
  • 896
  • 4
  • 13
  • 35