
I encountered a problem in Spark 2.2 while using PySpark SQL: I tried to split a column on a period (.) and it did not behave as expected, even after providing escape characters:

>>> spark.sql("select split('a.aaa','.')").show()
+---------------+
|split(a.aaa, .)|
+---------------+
|   [, , , , , ]|
+---------------+

>>> spark.sql("select split('a.aaa','\\.')").show()
+---------------+
|split(a.aaa, .)|
+---------------+
|   [, , , , , ]|
+---------------+

>>> spark.sql("select split('a.aaa','[.]')").show()
+-----------------+
|split(a.aaa, [.])|
+-----------------+
|         [a, aaa]|
+-----------------+

The split works only when the period is given as the character class [.], while it should also work with the escape sequence '\.'. Am I doing something wrong here?

some_user
    I think in the end you should just use `[.]` and not try to escape it with slashes. Most of the time you have to escape it multiple times and it is a mess. – swdev Apr 14 '20 at 05:28

2 Answers


Looks like you need to escape the backslash itself, because the pattern is unescaped twice: once by the Python string literal and once by the Spark SQL parser:

spark.sql("""select split('a.aa', '\\\\.')""").show()

If you were to run it directly in Spark SQL, it would just be:

select split('a.aa', '\\.')
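
To make the two layers of unescaping concrete, here is a minimal sketch (assuming a running SparkSession bound to `spark`); the raw-string and DataFrame-API variants each remove one layer:

# Default Python string: Python reduces the four backslashes to two,
# and the Spark SQL parser reduces those two to the regex \.
spark.sql("select split('a.aa', '\\\\.')").show()

# Python raw string: Python passes \\ through untouched, so only the
# Spark SQL layer of unescaping remains.
spark.sql(r"select split('a.aa', '\\.')").show()

# The DataFrame API takes the regex directly (no SQL string parsing),
# so a single escaped dot is enough.
from pyspark.sql.functions import lit, split
spark.range(1).select(split(lit('a.aa'), r'\.')).show()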
Silvio

As @swdev commented above, you can just do this:

spark.sql("select SPLIT(value,'[.]') from demo").show