I have been working on a large dataset with Spark. Last week the following code ran perfectly, but now it throws an error: NameError: name 'split' is not defined. Can somebody explain why this is not working and what I should do? Should I define the function myself, or is it a dependency that I need to import? The documentation doesn't say I have to import anything in order to use split. The code is below.

test_df = spark_df.withColumn(
    "Keywords",
    split(col("Keywords"), "\\|")
)
pault
Chique_Code

1 Answer

You can use pyspark.sql.functions.split(), but you first need to import this function:

from pyspark.sql.functions import split

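It's better to explicitly import only the functions you need. Avoid from pyspark.sql.functions import *, because the wildcard import replaces Python built-ins such as sum, min, and max with the Spark column functions of the same name.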
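
To make the advice against the wildcard import concrete, here is a small sketch that lists which built-in names a `from <module> import *` would replace. The helper name `names_shadowed_by_wildcard` is hypothetical (not part of PySpark or the original answer), and the demonstration uses the standard-library modules `os` and `math` as stand-ins so that it runs without a Spark installation.

```python
import builtins


def names_shadowed_by_wildcard(module_name):
    """Return the built-in names that `from <module_name> import *` would replace.

    Hypothetical helper for illustration only.
    """
    namespace = {}
    # Run the wildcard import in a throwaway namespace so that the
    # current module's own names are left untouched.
    exec(f"from {module_name} import *", namespace)
    return sorted(
        name
        for name in namespace
        if not name.startswith("_") and hasattr(builtins, name)
    )


# Standard-library stand-ins: os exports `open`, math exports `pow`.
print(names_shadowed_by_wildcard("os"))
print(names_shadowed_by_wildcard("math"))

# With PySpark installed, the same check can be pointed at the module
# discussed above:
# print(names_shadowed_by_wildcard("pyspark.sql.functions"))
```

Assuming PySpark is installed, running the last (commented-out) line should report names such as sum, min, and max, which is why importing only split, or importing the module under an alias, is the safer choice.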

pault
werner