
I am using the code below to write a Spark Structured Streaming dataframe into a MySQL DB. Below are the Kafka topic JSON data format and the MySQL table schema; the column names and types match exactly. But I am unable to see any records written to the MySQL table — the table stays empty with zero records. Please suggest.

Kafka Topic Data Format
ssingh@RENLTP2N073:/mnt/d/confluent-6.0.0/bin$ ./kafka-console-consumer --topic sarvtopic --from-beginning --bootstrap-server localhost:9092
{"id":1,"firstname":"James ","middlename":"","lastname":"Smith","dob_year":2018,"dob_month":1,"gender":"M","salary":3000}
{"id":2,"firstname":"Michael ","middlename":"Rose","lastname":"","dob_year":2010,"dob_month":3,"gender":"M","salary":4000}

import pyspark

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("SSKafka") \
    .getOrCreate()

# Read the Kafka topic as a streaming source, starting from the earliest offset
dsraw = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "sarvtopic") \
    .option("startingOffsets", "earliest") \
    .load()

# Kafka delivers the payload as binary; cast it to a JSON string
ds = dsraw.selectExpr("CAST(value AS STRING)")
dsraw.printSchema()

from pyspark.sql.types import StructField, StructType, StringType,LongType
from pyspark.sql.functions import *

custom_schema = StructType([
    StructField("id", LongType(), True),
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("dob_year", StringType(), True),
    StructField("dob_month", LongType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", LongType(), True),
])
      
Person_details_df2 = ds \
    .select(from_json(col("value"), custom_schema).alias("Person_details"))

Person_details_df3 = Person_details_df2.select("Person_details.*")


from pyspark.sql import DataFrameWriter

def foreach_batch_function(df, epoch_id):
    Person_details_df3.write.jdbc(url='jdbc:mysql://172.16.23.27:30038/securedb', driver='com.mysql.jdbc.Driver', dbtable="sparkkafka",  user='root',password='root$1234')
    pass

query = Person_details_df3.writeStream \
    .trigger(processingTime='20 seconds') \
    .outputMode("append") \
    .foreachBatch(foreach_batch_function) \
    .start()

query
Out[14]: <pyspark.sql.streaming.StreamingQuery at 0x1fb25503b08>
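Evaluating query on its own only displays the handle; the stream keeps running in the background. A quick way to confirm that rows are actually being parsed, before looking at the JDBC sink, is a console sink; a minimal debugging sketch:

# Debugging sketch: print parsed rows to stdout to confirm the Kafka
# source and the from_json parsing are producing data
debug_query = Person_details_df3.writeStream \
    .outputMode("append") \
    .format("console") \
    .option("truncate", "false") \
    .start()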

MySQL Table Schema:

create table sparkkafka(
   id int,
   firstname VARCHAR(40) NOT NULL,
   middlename VARCHAR(40) NOT NULL,
   lastname VARCHAR(40) NOT NULL,
   dob_year int(40) NOT NULL,
   dob_month int(40) NOT NULL,
   gender VARCHAR(40) NOT NULL,
   salary int(40) NOT NULL,   
   PRIMARY KEY (id)
);
  • You need to use the foreach / foreachBatch API to write streaming data into MySQL. foreachBatch is available from Spark 2.4.0 onwards; below that you need to use a foreach writer (see the sketch after this comment thread). – Karthik Nov 24 '20 at 09:00
  • @Sri_Karthik Thanks sir for the reply. It would be great if you could share an example of a foreach / foreachBatch implementation (ideally in PySpark/Python). – Sarvendra Singh Nov 24 '20 at 09:09
  • Hi, see the example code below. – Karthik Nov 24 '20 at 10:33
  • @Sri_Karthik Sir, Person_details_df3 is my streaming dataframe and my Spark version is above 2.4.1. When running the code I get the query handle shown above (Out[14]), so the records should be written to MySQL automatically, but I am not able to see any records in the table, even though the column names and types in the topic (JSON) and the SQL table match. Please correct me if I have to do anything else. – Sarvendra Singh Nov 24 '20 at 16:55
  • @Sri_Karthik Sir, I have edited my original question with all the relevant details. Please help, as I am new to Spark and Kafka. – Sarvendra Singh Nov 24 '20 at 17:12
  • While running the Spark application, are you seeing any error? Please share it. – Karthik Nov 25 '20 at 09:50
  • Yes sir @Sri_Karthik, there were errors saying that the driver, user and password arguments should not be passed to df.write.jdbc. I was finally able to write the dataframe to MySQL with these changes: db_target_properties = {"user":"xxxx", "password":"yyyyy"}; def foreach_batch_function(df, epoch_id): df.write.jdbc(url="jdbcurl", table="sparkkafka_sarv", properties=db_target_properties); query = Person_details_df3.writeStream.outputMode("append").foreachBatch(foreach_batch_function).start(); query.awaitTermination() – Sarvendra Singh Nov 25 '20 at 19:33
  • @Sri_Karthik Many, many thanks for the support. :) Is there any link or example of writing a PySpark streaming dataframe into a relational database? I did not find one on Google or even in the official Spark documentation. Please share a link if you have one. – Sarvendra Singh Nov 25 '20 at 19:34
  • Glad it helped you. No, I don't have any links; I actually did this in Scala to insert into MySQL. For your understanding I converted the code to Python and wrote it for you. I will update the answer as per the corrections you provided, so it helps others who try this in Python in the future. Please accept the answer if it solved your problem. – Karthik Nov 27 '20 at 06:36
  • @Sri_Karthik Accepted, sir. It's my pleasure. – Sarvendra Singh Dec 01 '20 at 05:52
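For reference, a row-at-a-time foreach writer along the lines the first comment describes could look like the sketch below in PySpark. This is a minimal sketch, assuming the mysql-connector-python package is installed on every executor; the class name and connection details are illustrative. Note that in the Python API, foreach, like foreachBatch, only appeared in Spark 2.4.0; the older ForeachWriter class the comment refers to is a Scala/Java facility.

import mysql.connector  # assumption: mysql-connector-python on the executors

class MySQLWriter:
    """Row-at-a-time sink: one connection per partition and epoch."""

    def open(self, partition_id, epoch_id):
        self.conn = mysql.connector.connect(
            host="172.16.23.27", port=30038, database="securedb",
            user="xxxx", password="yyyyy")
        self.cursor = self.conn.cursor()
        return True  # proceed with this partition

    def process(self, row):
        # Insert one row per call; slower than foreachBatch's bulk JDBC write
        self.cursor.execute(
            "INSERT INTO sparkkafka (id, firstname, middlename, lastname, "
            "dob_year, dob_month, gender, salary) "
            "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)",
            (row.id, row.firstname, row.middlename, row.lastname,
             row.dob_year, row.dob_month, row.gender, row.salary))

    def close(self, error):
        self.conn.commit()
        self.cursor.close()
        self.conn.close()

query = Person_details_df3.writeStream \
    .outputMode("append") \
    .foreach(MySQLWriter()) \
    .start()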

1 Answer


I presume Person_details_df3 is your streaming dataframe and your Spark version is 2.4.0 or above.

To use the foreachBatch API, write it as below:

db_target_properties = {"user": "xxxx", "password": "yyyyy"}

def foreach_batch_function(df, epoch_id):
    # Write each micro-batch; user and password must go in `properties`,
    # not as keyword arguments of write.jdbc
    df.write.jdbc(url='jdbc:mysql://172.16.23.27:30038/securedb', table="sparkkafka", properties=db_target_properties)

query = Person_details_df3.writeStream \
    .outputMode("append") \
    .foreachBatch(foreach_batch_function) \
    .start()

query.awaitTermination()
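Note that query.awaitTermination() blocks the driver so a standalone script does not exit while the stream is running; it also re-raises any exception thrown inside foreach_batch_function, which otherwise only fails the query silently in a notebook session.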