I'm trying to process some events with Spark Structured Streaming.
The incoming events look like this:
Event 1:
| url |
|---|
| http://first/path/to/read/from... |
Event 2:
| url |
|---|
| http://second/path/to/read/from... |
And so on.
My goal is to read each of these URLs and generate a new DataFrame from it. So far I've done this with code like the following, which uses collect():
def createDF(url):
    file_url = "abfss://" + container + "@" + az_storage_account + ".dfs.core.windows.net/" + az_storage_folder + "/" + url
    # Read data
    binary = spark.read.format("binaryFile").load(file_url)
    # Do other operations
    ...
    # Save the data
    # write it into blob again
    return something
def loadData(batchDf, batchId):
    """
    batchDf:
    +--------------------+---------+-----------+--------------+--------------------+---------+------------+--------------------+----------------+--------------------+
    |                body|partition|     offset|sequenceNumber|        enqueuedTime|publisher|partitionKey|          properties|systemProperties|                 url|
    +--------------------+---------+-----------+--------------+--------------------+---------+------------+--------------------+----------------+--------------------+
    |[{"topic":"/subsc...|        0|30084343744|         55489|2021-03-03 14:21:...|     null|        null|[aeg-event-type -...|              []|      http://path...|
    +--------------------+---------+-----------+--------------+--------------------+---------+------------+--------------------+----------------+--------------------+
    """
    """ Before ....
    df = batchDf.select("url")
    url = df.collect()
    [createDF(item) for item in url]
    """
    # Now without collect()
    # Select the url field of the df
    url_select_df = batchDf.select("url")
    # Read url value
    result = url_select_df.rdd.map(lambda x: createDF(x.url))
query = df \
    .writeStream \
    .foreachBatch(loadData) \
    .outputMode("update") \
    .queryName("test") \
    .start() \
    .awaitTermination()
However, when I try to extract the URL without collect(), I get the following error message:
It appears that you are attempting to reference SparkContext from a broadcast.
What could be happening?
Thank you very much for your help
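One possible explanation, offered as a sketch rather than a definitive diagnosis: `createDF` uses the `spark` session, and `rdd.map` ships the lambda (and everything it references) to the executors. The SparkContext behind `spark` can only be used on the driver, which is the situation that error message describes. The function passed to `foreachBatch` itself runs on the driver, so one way around this is to bring only the `url` values back to the driver and hand all the paths to a single `load` call, so the file read is still distributed. The names `build_file_url` and `load_data`, and the extra parameters, are hypothetical and only for illustration; the sketch also assumes each micro-batch holds few enough rows that collecting the `url` column is cheap.

```python
def build_file_url(url, container, account, folder):
    """Build the abfss:// path for one event URL (same concatenation as createDF)."""
    return ("abfss://" + container + "@" + account
            + ".dfs.core.windows.net/" + folder + "/" + url)


def load_data(batch_df, batch_id, spark, container, account, folder):
    """Hypothetical foreachBatch body that keeps all use of `spark` on the driver."""
    # Only the url strings travel back to the driver, not the file contents.
    urls = [row.url for row in batch_df.select("url").collect()]
    if not urls:
        # Empty micro-batch: nothing to read.
        return None
    paths = [build_file_url(u, container, account, folder) for u in urls]
    # DataFrameReader.load accepts a list of paths, so one call covers every file.
    return spark.read.format("binaryFile").load(paths)
```

Since `foreachBatch` calls its function with two arguments, the extra ones could be bound with a lambda, for example `.foreachBatch(lambda df, batch_id: load_data(df, batch_id, spark, container, az_storage_account, az_storage_folder))`. The "other operations" and the write back to blob storage would then be applied to the returned DataFrame on the driver side.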