
The pandas.DataFrame.to_sql() method lets you write a DataFrame out to a database, and this works fine with a standard RDBMS. But how can it be used with Spark SQL from PySpark? The method requires a connection parameter; what should that be in this case?

thanks, Matt

matthieu lieber

1 Answer


Spark SQL has nothing to do with to_sql(), which connects to a conventional SQL engine through a database connection. If what you want is to get your pandas DataFrame into Spark, convert it to a Spark DataFrame instead. Assuming sc is your SparkContext:

import pandas as pd
df = pd.DataFrame({'Name':['Tom','Major','Pete'], 'Age':[23,45,30]})

from pyspark.sql import SQLContext
sqlc = SQLContext(sc)

spark_df = sqlc.createDataFrame(df)
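For comparison, to_sql() itself stays on the pandas side: its con argument is a SQLAlchemy engine or a DB-API connection to an actual database, not anything from Spark. A minimal sketch with an in-memory SQLite database (table name people is just an example):

```python
import sqlite3
import pandas as pd

# `con` must be a real database connection (or SQLAlchemy engine),
# which is why it cannot be a SparkContext.
con = sqlite3.connect(":memory:")

df = pd.DataFrame({'Name': ['Tom', 'Major', 'Pete'], 'Age': [23, 45, 30]})
df.to_sql("people", con, index=False)

# Read the rows back to confirm the write went through.
out = pd.read_sql("SELECT Name, Age FROM people", con)
print(len(out))  # 3
```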
fanfabbb