Spark JDBC returning dataframe only with column names

Question

I am trying to connect to a HiveTable using spark JDBC, with the following code:

val df = spark.read.format("jdbc").
  option("driver", "org.apache.hive.jdbc.HiveDriver").
  option("user","hive").
  option("password", "").
  option("url", jdbcUrl).
  option("dbTable", tableName).load()

df.show()

but the return I get is only an empty dataframe with modified columns name, like this:

--------------|---------------|
tableName.uuid|tableName.name |
--------------|---------------|

I've tried to read the dataframe in a lot of ways, but it always results the same. I'm using JDBC Hive Driver, and this HiveTable is located in an EMR cluster. The code also runs in the same cluster. Any help will be really appreciated. Thank you all.

Do you mean: sqlContext.read.jdbc(url, tableName, properties)? I've tried and got the same result. Do you think it could be somehow related to the HiveDriver? — Rafzv, Mar 17 '17 at 02:19
Is your Hive table(`tableName`) in `default` database have data inside it? — mrsrinivas, Mar 17 '17 at 05:49
I've found a workaround. Instead of acessing the hive table, i've accessed directly the hive metastore using the metastore URL. This can be done by setting the options in SparkContext and then reading the table using the SparkSession.read.table method. The option can be set through the hive.metastore.uris option. — Rafzv, May 05 '17 at 23:31

score 0 · Answer 1 · answered Mar 23 '18 at 12:01

Please set fetchsize in option it should work.

Dataset<Row> referenceData
            = sparkSession.read()
            .option("fetchsize", "100")
            .format("jdbc")
            .option("url", jdbc.getJdbcURL())
            .option("user", "")
            .option("password", "")
            .option("dbtable", hiveTableName).load();

Spark JDBC returning dataframe only with column names

1 Answers1

Linked