
In Spark SQL, when querying Databricks Delta tables, is there any way to make string comparison case insensitive globally? I.e., when applying a WHERE clause to string columns, I would like to avoid wrapping them in "lcase" or "lower" function calls.

By default, these two WHERE clauses return different results depending on what the actual data is (see the sketch after the list):

  • WHERE somefield = 'blah blah'
  • WHERE somefield = 'Blah Blah'
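For illustration, a minimal sketch of the behavior (the table name t and its contents are hypothetical; somefield is the column from the examples above):

    -- Hypothetical Delta table holding a single mixed-case value
    CREATE TABLE t (somefield STRING) USING DELTA;
    INSERT INTO t VALUES ('Blah Blah');

    SELECT * FROM t WHERE somefield = 'blah blah';  -- returns no rows
    SELECT * FROM t WHERE somefield = 'Blah Blah';  -- returns the row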

1 Answer


This is best explained by the blog post Diving Into Delta Lake: Schema Enforcement & Evolution, specifically this section:

While Spark can be used in case sensitive or insensitive (default) mode, Delta Lake is case-preserving but insensitive when storing the schema. Parquet is case sensitive when storing and returning column information. To avoid potential mistakes, data corruption or loss issues (which we’ve personally experienced at Databricks), we decided to add this restriction.
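For context, the case sensitive/insensitive "mode" the quote refers to is the spark.sql.caseSensitive configuration, which governs how identifiers such as column names are resolved; it does not change how string values compare. A quick illustration:

    -- Affects column-name resolution only, not data comparison:
    SET spark.sql.caseSensitive = true;
    -- somefield and SomeField now resolve as distinct column names, but
    -- WHERE somefield = 'Blah Blah' still compares values case sensitively.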

The important call-out is that, to avoid potential data corruption or loss issues, string comparisons cannot be made case insensitive; the explicit "lower" call the question mentions remains the workaround. HTH!
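A minimal sketch of that workaround, reusing somefield from the question (t is the hypothetical table from the sketch above):

    -- Normalize both sides so the match is effectively case insensitive
    SELECT * FROM t
    WHERE lower(somefield) = lower('Blah Blah');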
