
I have a SQL Server table that has a different schema than my dataframe. I would like to select some columns from my dataframe and "insert into" the table the values I selected.

Basically something similar to the code below but in pyspark:

INSERT INTO Cust_Diff_Schema_tbl
(acct_num, name)
SELECT account_no, name
FROM customers
WHERE customer_id > 5000;

I can read the data over JDBC with spark.read, like below:

df_s3 = spark.read.format("jdbc")\
        .option("driver", db_driver_name)\
        .option("url", db_url + ":1433;databaseName=" + stage_db)\
        .option("query", "select * from customers")\
        .option("user", db_username)\
        .option("password", db_password)\
        .load()

df_s3.printSchema()
df_s3.show(20)

To write/append the selected values to the table, I believe I can still use "df_s3.write", but I need an example of how to express the insert using the ".option" function, or another approach if that does not work.

Thanks in advance.

AJR

1 Answer

// create the dataframe

val df = // fetch from db, read a file, or other options

df.write.format("jdbc")
      .option("url", "jdbcURL")
      .option("driver", "Driver name")
      .option("dbtable", "tablename")
      .option("numPartitions", 20)
      .option("batchsize", 10000)
      .option("truncate", "true") // only takes effect with overwrite mode
      .mode("append")
      .save()
vaquar khan
  • Where is my insert-into statement with the values passed in from my dataframe? – AJR Oct 21 '20 at 17:04
  • First create a df that keeps only the columns you want to insert. When you save with mode append, the rows are added to the end of the table; mode overwrite with truncate will drop and recreate the table for you. In short, no INSERT INTO SQL is needed. – vaquar khan Oct 21 '20 at 19:05
  • Do you have an example you can share? – AJR Oct 21 '20 at 19:47
  • You do understand my table schema is different than my dataframe, which is why I think I have to use insert into, to pick the columns. – AJR Oct 21 '20 at 21:08
  • Create a new dataframe from the existing df and keep only the columns you want to insert; I will share an example with you. – vaquar khan Oct 21 '20 at 22:56