
It looks like Spark SQL is case-sensitive for "like" queries. Is that right?

spark.sql("select distinct status, length(status)  from table")

Returns

Active|6

spark.sql("select distinct status  from table where status like '%active%'")

Returns no value

spark.sql("select distinct status  from table where status like '%Active%'")

Returns

 Active
djohon
  • Possible duplicate of https://stackoverflow.com/questions/34894372/spark-sql-case-insensitive-filter-for-column-conditions – stack0114106 Nov 28 '18 at 10:26

1 Answer


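Yes, string comparison in Spark SQL, including like, is case-sensitive. Many RDBMSs are also case-sensitive by default for string comparison, although some (MySQL and SQL Server with their default collations, for example) are not. For a case-insensitive match, either use rlike with the (?i) regex flag (plain rlike is still case-sensitive) or convert the column to upper or lower case before comparing.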

scala> val df = Seq(("Active"),("Stable"),("Inactive")).toDF("status")
df: org.apache.spark.sql.DataFrame = [status: string]

scala> df.createOrReplaceTempView("tbl")

scala> df.show
+--------+
|  status|
+--------+
|  Active|
|  Stable|
|Inactive|
+--------+


scala> spark.sql(""" select status from tbl where status like '%Active%' """).show
+------+
|status|
+------+
|Active|
+------+


scala> spark.sql(""" select status from tbl where lower(status) like '%active%' """).show
+--------+
|  status|
+--------+
|  Active|
|Inactive|
+--------+
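
The transcript above covers the lower() route. The rlike route mentioned earlier might look like the sketch below. It assumes a standalone script, so the local SparkSession setup and the app name are illustrative and not part of the original answer (in spark-shell the `spark` value already exists). The names `viaRlike` and `viaLower` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lower

// Assumed local session for a standalone run; skip this in spark-shell.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("case-insensitive-like")
  .getOrCreate()
import spark.implicits._

val df = Seq("Active", "Stable", "Inactive").toDF("status")
df.createOrReplaceTempView("tbl")

// rlike takes a Java regex and matches anywhere in the string,
// so no % wildcards are needed. (?i) switches on case-insensitive matching.
val viaRlike = spark.sql("select status from tbl where status rlike '(?i)active'")

// DataFrame API version of the lower() approach shown in the transcript.
val viaLower = df.filter(lower($"status").contains("active"))

viaRlike.show()
viaLower.show()
```

Both queries should return the Active and Inactive rows. The lower() form keeps the familiar like/contains semantics, while rlike is handier when the pattern is already a regular expression.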


stack0114106