I am taking a MOOC. It has a DataFrame called shakespeareDF that contains the following text:
+-------------------------------------------------+
|word                                             |
+-------------------------------------------------+
|1609                                             |
|                                                 |
|the sonnets                                      |
|                                                 |
|by william shakespeare                           |
|                                                 |
|                                                 |
|                                                 |
|1                                                |
|from fairest creatures we desire increase        |
|that thereby beautys rose might never die        |
|but as the riper should by time decease          |
|his tender heir might bear his memory            |
|but thou contracted to thine own bright eyes     |
|feedst thy lights flame with selfsubstantial fuel|
+-------------------------------------------------+
On it, they run the following code:

from pyspark.sql.functions import split, explode

shakeWordsDF = (shakespeareDF.select(explode(split(shakespeareDF[0], "\s+"))))
I would like to understand the following (I have also included a small experiment I tried below):
- what is the difference between explode and split, and why do we have to use both? I tried to look at the online documentation but couldn't understand it.
- why do we have to use shakespeareDF[0] and not just shakespeareDF?
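To make my confusion concrete, here is a small standalone experiment I put together myself. It builds a toy DataFrame rather than using the course's shakespeareDF (the toyDF, splitDF and explodedDF names are mine; only the word column name comes from the course data):

from pyspark.sql import SparkSession
from pyspark.sql.functions import split, explode

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame with a single "word" column holding whole lines of text
toyDF = spark.createDataFrame(
    [("from fairest creatures we desire increase",),
     ("that thereby beautys rose might never die",)],
    ["word"])

# split alone: from what I can tell, this keeps one row per line,
# but turns the column into an array of words
splitDF = toyDF.select(split(toyDF["word"], r"\s+").alias("split_col"))
splitDF.show(truncate=False)

# split + explode: this seems to give one row per word instead
explodedDF = toyDF.select(explode(split(toyDF["word"], r"\s+")).alias("word"))
explodedDF.show(truncate=False)

# toyDF[0] appears to behave the same as toyDF["word"] here, so I assume
# [0] just picks the first column, but I am not sure why the course prefers it
explodedDF2 = toyDF.select(explode(split(toyDF[0], r"\s+")).alias("word"))
explodedDF2.show(truncate=False)

Even after running this, I still don't fully follow what explode adds on top of split, or why the column is referenced by position, so any explanation would be appreciated.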