I'm running Apache Spark 1.6.1 on a small YARN cluster. I'm attempting to pull data from a Hive table, using a query like:
df = hiveCtx.sql("""
SELECT *
FROM hive_database.gigantic_table
WHERE loaddate = '20170502'
""")
However, the resulting DataFrame is the entire table, no matter what value I give for loaddate. The only odd thing I can think of is that the Hive table is partitioned by that loaddate column.
Hive alone runs this query correctly and returns only the matching partition. I've tried casting loaddate to an int, using .filter() instead of a WHERE clause, and various quoting styles, but no luck in Spark.
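For what it's worth, the .filter() variant I tried looked roughly like this (same HiveContext, table, and column as above; this is a sketch of my attempt, not a working solution):

```python
# Sketch of the DataFrame-API attempt, assuming the same hiveCtx
# (HiveContext) and partitioned table as in the SQL query above.
from pyspark.sql.functions import col

df = (hiveCtx.table("hive_database.gigantic_table")
             .filter(col("loaddate") == "20170502"))

# Checks I used to see whether the filter was being applied:
df.explain(True)   # extended plan - look for a pushed-down partition filter
print(df.count())  # still returns the count of the entire table
```

Like the SQL version, this returns every row rather than just the one partition.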