
Exploring PySpark Structured Streaming and Databricks. I want to write a Spark Structured Streaming job that reads all the data from a Kafka topic and publishes it to a Delta table.

Let's assume I'm using the latest version, and Kafka has the following details:

Kafka topic name: ABC
Kafka broker: localhost:9092
Sample data: name=qwerty&company_name=stackoverflow&profession=learner

I want to store the Kafka topic data in the Delta table with the following fields:

timestamp                 | company_name  | data
2022-11-14 07:50:00+0000  | StackOverflow | name=qwerty&company_name=stackoverflow&profession=learner

Is there a way that I can see the Delta table data in the console?

boring-coder

1 Answer


You can read and display your data using Spark. Something like:

# Read the Delta table as a batch DataFrame.
people_df = spark.read.format("delta").load(table_path)

display(people_df)   # display() is available in Databricks notebooks
# or
people_df.show(5)    # works in any Spark environment
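
If you want to keep watching the table as new rows arrive instead of doing a one-off batch read, you can also tail the Delta table as a stream and print it to the console. A minimal sketch, again assuming table_path points at your Delta table:

stream_df = spark.readStream.format("delta").load(table_path)

# Print each new micro-batch to the console without truncating columns.
(stream_df.writeStream
 .format("console")
 .option("truncate", "false")
 .start()
 .awaitTermination())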

Either way, you can submit this like any other Spark job. Refer to the documentation for more details.
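
For the ingestion side of the question (reading the topic and publishing to the Delta table), here is a minimal sketch. It assumes the broker and topic from the question; the output and checkpoint paths (/tmp/delta/abc, /tmp/checkpoints/abc) are placeholders, and the job needs the Kafka connector (spark-sql-kafka) and Delta Lake packages on the classpath. Note the question's sample row shows company_name as "StackOverflow" while the payload carries "stackoverflow"; that case mapping isn't specified, so the raw value is kept here.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Read the whole topic from the beginning; Kafka delivers `value` as binary.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "ABC")
       .option("startingOffsets", "earliest")
       .load())

# Keep the raw payload as `data` and pull company_name out of the
# key=value&key=value string with str_to_map. The Kafka source also
# provides a per-record `timestamp` column, used for the table's timestamp.
parsed = (raw
          .withColumn("data", F.col("value").cast("string"))
          .withColumn("kv", F.expr("str_to_map(data, '&', '=')"))
          .select("timestamp",
                  F.col("kv").getItem("company_name").alias("company_name"),
                  "data"))

# Stream into the Delta table; a checkpoint location is mandatory for the sink.
(parsed.writeStream
 .format("delta")
 .outputMode("append")
 .option("checkpointLocation", "/tmp/checkpoints/abc")
 .start("/tmp/delta/abc")
 .awaitTermination())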

Rishabh Sharma