
I have a PySpark dataframe to which I want to add a new literal (lit) column, like this:

my_dataframe.select(col("col1"), lit("this is data").alias("col2"))

By default, when I write this to BigQuery, the lit column's type is string (good), but its mode is REQUIRED (bad). How can I write a lit column so that BigQuery treats it as NULLABLE? My workaround is below; I'm looking for a cleaner approach.

my_dataframe.select(col("col1"), when(lit(1) == 1, lit("this is data")).alias("col2"))
John
possible duplicate: https://stackoverflow.com/questions/46072411/can-i-change-the-nullability-of-a-column-in-my-spark-dataframe – YOLO Jan 13 '20 at 20:44

1 Answer


You could create a new dataframe with a different schema:

from pyspark.sql.types import StructType, StructField, StringType

my_dataframe = my_dataframe.select(col("col1"), when(lit(1) == 1, lit("this is data")).alias("col2"))

new_schema = StructType([
    StructField('col1', StringType(), False),
    StructField('col2', StringType(), True)
])

df2 = spark.createDataFrame(my_dataframe.rdd, new_schema)

Each StructField follows the syntax: StructField('<COLUMN-NAME>', <TYPE>, <NULLABLE? True or False>)

rmesteves