
I'm running Apache Spark 1.6.1 on a small YARN cluster. I'm attempting to pull data from a Hive table, using a query like:

df = hiveCtx.sql("""
SELECT *
  FROM hive_database.gigantic_table
 WHERE loaddate = '20170502'
""")

However, the resulting DataFrame contains the entire table, no matter what value I give for loaddate. The only odd thing I can think of is that the Hive table is partitioned by that loaddate column.

Hive itself runs this query correctly. I've tried casting to ints, using .filter(), and various levels of quotation marks, but no luck in Spark.

m_wynn

1 Answer


Turns out, filtering on a partition column is case-sensitive: if the column name's case in the predicate doesn't match the case recorded in the Hive metastore, the partition filter is not applied and Spark reads every partition.

https://issues.apache.org/jira/browse/SPARK-19292
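A minimal sketch of the workaround, assuming the metastore registered the partition column in a different case (here hypothetically as LOADDATE) than the lowercase loaddate used in the original query:

```python
# Sketch only: hiveCtx is assumed to be an existing HiveContext on
# Spark 1.6, and LOADDATE is a hypothetical casing of the partition
# column as stored in the Hive metastore. Per SPARK-19292, the
# predicate must match that casing exactly for partition pruning
# to kick in; otherwise the whole table is scanned.
query = """
SELECT *
  FROM hive_database.gigantic_table
 WHERE LOADDATE = '20170502'
"""

# df = hiveCtx.sql(query)  # uncomment on a cluster with Hive support
```

Check the exact column casing with `DESCRIBE FORMATTED hive_database.gigantic_table` (or `SHOW PARTITIONS`) before adjusting the predicate.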

m_wynn