
The pandas.DataFrame.to_sql() method lets you write a DataFrame out to a database, and this works fine with a standard RDBMS. But how can it be used with Spark SQL from PySpark? The method requires a connection parameter; what should that be in this case?

thanks, Matt

matthieu lieber

1 Answer


Spark SQL has nothing to do with to_sql(), which connects to a conventional SQL engine through a database connection. If what you want is to get your pandas DataFrame into Spark, convert it to a Spark DataFrame instead. Assuming sc is your SparkContext:

import pandas as pd
df = pd.DataFrame({'Name':['Tom','Major','Pete'], 'Age':[23,45,30]})

from pyspark.sql import SQLContext
sqlc = SQLContext(sc)

spark_df = sqlc.createDataFrame(df)
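For comparison, to_sql() itself stays on the pandas side: its con argument is a SQLAlchemy engine or a DB-API connection to an actual database, not anything from Spark. A minimal sketch with an in-memory SQLite database (table name people is just an example):

```python
import sqlite3
import pandas as pd

# `con` must be a real database connection (or SQLAlchemy engine),
# which is why it cannot be a SparkContext.
con = sqlite3.connect(":memory:")

df = pd.DataFrame({'Name': ['Tom', 'Major', 'Pete'], 'Age': [23, 45, 30]})
df.to_sql("people", con, index=False)

# Read the rows back to confirm the write went through.
out = pd.read_sql("SELECT Name, Age FROM people", con)
print(len(out))  # 3
```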
fanfabbb