How do I create a bare minimum PySpark DataFrame in a Palantir Foundry Code Workbook?

To do this in a Code Repository I'd use:

my_df = ctx.spark_session.createDataFrame([('1',)], ["a"])
domdomegg
Skat
  • To expand on your example on creating an empty dataset in Code Repositories: https://stackoverflow.com/questions/73406822/how-can-i-create-an-empty-dataset-from-on-a-pyspark-schema-in-palantir-foundry – domdomegg Aug 25 '22 at 18:58

1 Answer

Code Workbook injects a global spark variable as the Spark session, rather than providing a transform context via ctx. You can use it in a Python transform ('New Transform' > 'Python Code'):

def my_dataframe():
    return spark.createDataFrame([('1',)], ["a"])

Or with a defined schema:

from pyspark.sql import types as T
from datetime import datetime

SCHEMA = T.StructType([
    T.StructField('entity_name', T.StringType()),
    T.StructField('thing_value', T.IntegerType()),
    T.StructField('created_at', T.TimestampType()),
])

def my_dataframe():
    return spark.createDataFrame([("Name", 3, datetime.now())], SCHEMA)
domdomegg