Hello everyone!
I'm trying to send my structured streaming DataFrame to one of my Kafka topics, detection.
This is the schema of the structured streaming DataFrame:
root
|-- timestamp: timestamp (nullable = true)
|-- Sigma: string (nullable = true)
|-- time: string (nullable = true)
|-- duration: string (nullable = true)
|-- SourceComputer: string (nullable = true)
|-- SourcePort: string (nullable = true)
|-- DestinationComputer: string (nullable = true)
|-- DestinationPort: string (nullable = false)
|-- protocol: string (nullable = true)
|-- packetCount: string (nullable = true)
|-- byteCount: string (nullable = true)
But when I try to send the DataFrame with this method:
dfwriter = df \
    .selectExpr("CAST(value AS STRING)") \
    .writeStream \
    .format("kafka") \
    .option("checkpointLocation", "/Documents/checkpoint/logs") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("failOnDataLoss", "false") \
    .option("topic", "detection") \
    .start()
I get this error:
pyspark.sql.utils.AnalysisException: cannot resolve 'value' given input columns: [DestinationComputer, DestinationPort, Sigma, SourceComputer, SourcePort, byteCount, duration, packetCount, processName, protocol, time, timestamp]; line 1 pos 5;
If I send a DataFrame with just a value column, it works: I receive the data on my Kafka topic consumer.
Any idea how to send my DataFrame with all of its columns?
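From what I understand, the Kafka sink only accepts key/value columns (string or binary), so I suspect I need to serialize all my columns into a single value column before writing. Here is a minimal sketch of what I have in mind, assuming to_json and struct from pyspark.sql.functions are the right tools (the checkpoint path, broker, and topic name are just my local setup):

from pyspark.sql.functions import struct, to_json

# Pack every column of the row into one JSON string aliased as 'value',
# since the Kafka sink resolves only key/value (plus topic) columns.
dfwriter = df \
    .select(to_json(struct("*")).alias("value")) \
    .writeStream \
    .format("kafka") \
    .option("checkpointLocation", "/Documents/checkpoint/logs") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("failOnDataLoss", "false") \
    .option("topic", "detection") \
    .start()

If that's right, the consumer should receive one JSON object per row and can parse the fields back out on its side. Is this the correct approach?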
Thank you!