How do I create a bare minimum PySpark DataFrame in a Palantir Foundry Code Workbook?
To do this in a Code Repository I'd use:
my_df = ctx.spark_session.createDataFrame([('1',)], ["a"])
Code Workbook injects a global spark variable as the Spark session, rather than exposing a transform context through ctx. You can use it in a Python transform ('New Transform' > 'Python Code'):
def my_dataframe():
    return spark.createDataFrame([('1',)], ["a"])
Or with a defined schema:
from pyspark.sql import types as T
from datetime import datetime
SCHEMA = T.StructType([
    T.StructField('entity_name', T.StringType()),
    T.StructField('thing_value', T.IntegerType()),
    T.StructField('created_at', T.TimestampType()),
])

def my_dataframe():
    return spark.createDataFrame([("Name", 3, datetime.now())], SCHEMA)