
Exploring PySpark Structured Streaming and Databricks. I want to write a Spark Structured Streaming job that reads all the data from a Kafka topic and publishes it to a Delta table.

Let's assume I'm using the latest version, and Kafka has the following details:

Kafka topic name: ABC
Kafka broker: localhost:9092
Sample data: name=qwerty&company_name=stackoverflow&profession=learner

I want to store the Kafka topic data in the Delta table with the following fields:

timestamp                 | company_name  | data
2022-11-14 07:50:00+0000  | StackOverflow | name=qwerty&company_name=stackoverflow&profession=learner

Is there a way that I can see the Delta table data in the console?

boring-coder

1 Answer


You can read and display your data using Spark. Something like:

# Read the Delta table as a batch DataFrame.
people_df = spark.read.format("delta").load(table_path)

display(people_df)   # display() is available in Databricks notebooks
# or
people_df.show(5)    # works in any Spark environment
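
If you want to keep watching the table as new rows arrive instead of doing a one-off batch read, you can also tail the Delta table as a stream and print it to the console. A minimal sketch, again assuming table_path points at your Delta table:

stream_df = spark.readStream.format("delta").load(table_path)

# Print each new micro-batch to the console without truncating columns.
(stream_df.writeStream
 .format("console")
 .option("truncate", "false")
 .start()
 .awaitTermination())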

Either way, you can submit this like any other Spark job. Refer to the documentation for more details.
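
For the ingestion side of the question (reading the topic and publishing to the Delta table), here is a minimal sketch. It assumes the broker and topic from the question; the output and checkpoint paths (/tmp/delta/abc, /tmp/checkpoints/abc) are placeholders, and the job needs the Kafka connector (spark-sql-kafka) and Delta Lake packages on the classpath. Note the question's sample row shows company_name as "StackOverflow" while the payload carries "stackoverflow"; that case mapping isn't specified, so the raw value is kept here.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Read the whole topic from the beginning; Kafka delivers `value` as binary.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "ABC")
       .option("startingOffsets", "earliest")
       .load())

# Keep the raw payload as `data` and pull company_name out of the
# key=value&key=value string with str_to_map. The Kafka source also
# provides a per-record `timestamp` column, used for the table's timestamp.
parsed = (raw
          .withColumn("data", F.col("value").cast("string"))
          .withColumn("kv", F.expr("str_to_map(data, '&', '=')"))
          .select("timestamp",
                  F.col("kv").getItem("company_name").alias("company_name"),
                  "data"))

# Stream into the Delta table; a checkpoint location is mandatory for the sink.
(parsed.writeStream
 .format("delta")
 .outputMode("append")
 .option("checkpointLocation", "/tmp/checkpoints/abc")
 .start("/tmp/delta/abc")
 .awaitTermination())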

Rishabh Sharma