3

I have a Spark Dataframe (in Palantir Foundry) with the column "c_temperature". This column contains a JSON string in each row with the following schema:

{"TempCelsiusEndAvg":"24.33","TempCelsiusEndMax":"null","TempCelsiusEndMin":"null","TempCelsiusStartAvg":"22.54","TempCelsiusStartMax":"null","TempCelsiusStartMin":"null","TempEndPlausibility":"T_PLAUSIBLE","TempStartPlausibility":"T_PLAUSIBLE"}

I tried to extract the values (they are sometimes "null" and sometimes wiht values like e.g. "24.33") of the avg temperatures in the new columns "TempCelsiusEndAvg" and "TempCelsiusStartAvg" with the following code:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def flat_json(sessions_finished):
    df = sessions_finished
    df = df.withColumn("new_temperature", F.col('c_temperature').cast(StringType())
    df = df.withColumn("TempCelsiusEndAvg", F.get_json_object("c_Temperature", '$.TempCelsiusEndAvg'))
    df = df.withColumn("TempCelsiusStartAvg", F.get_json_object("c_Temperature", '$.TempCelsiusStartAvg'))
    return df

I wanted to get the new columns filled with doubles like:

... +-----------------+-------------------+ ...
... |TempCelsiusEndAvg|TempCelsiusStartAvg| ...
... +-----------------+-------------------+ ...
... |            24.33|              22.54| ...
... +-----------------+-------------------+ ...
... |            29.28|              25.16| ...
... +-----------------+-------------------+ ...
... |             null|               null| ...
... +-----------------+-------------------+ ...

The new dataframe contains the columns but they are only filled with null values. Can anyone help me solving this problem?

... +-----------------+-------------------+ ...
... |TempCelsiusEndAvg|TempCelsiusStartAvg| ...
... +-----------------+-------------------+ ...
... |             null|               null| ...
... +-----------------+-------------------+ ...
... |             null|               null| ...
... +-----------------+-------------------+ ...
... |             null|               null| ...
... +-----------------+-------------------+ ...

There is also a comment in this thread: [https://stackoverflow.com/questions/46084158/how-can-you-parse-a-string-that-is-json-from-an-existing-temp-table-using-pyspar] that describes my problem, but I have no idea how to use this information.

steffsen
  • 45
  • 4
  • Are you sure your spelling is correct, the first column should be `TempCelsisusEndAvg` not `TempCelsiusEndAvg` and the second column does not exist in your sample data – ExplodingGayFish Nov 16 '22 at 10:09
  • Thanks for the hint, I edited the question. This was just a fault in this question, not in my code. – steffsen Nov 16 '22 at 12:21

1 Answers1

1

You are don't need to do anything, since the column is already a struct. You can create those columns by accessing them with a .

 df = df.withColumn("TempCelsiusEndAvg", F.col("c_temperature.TempCelsiusEndAvg"))
 df = df.withColumn("TempCelsiusStartAvg", F.col("c_temperature.TempCelsiusStartAvg"))
Robert Kossendey
  • 6,733
  • 2
  • 12
  • 42
  • My spelling in the question was not correct. I edited the question, so that was not the reason for my problem. – steffsen Nov 16 '22 at 12:23
  • That didn't work for me. The columns "TempCelsiusEndAvg" and "TempCelsiusStartAvg" stay null but now I have the column "new_temperature" containing strings like "[24.33,,,22.54,,,,]". Now I could get the substrings but there must be a smarter way. – steffsen Nov 17 '22 at 07:37
  • Could you please show your initial data frame and provide testing data again? Because it seems like the column c_temperature is not of type String in JSON format. – Robert Kossendey Nov 17 '22 at 08:33
  • I'm not allowed to share any data because it is confidential. Palantir Foundry says the initial column is a struct. The schema is always like you see at the top. – steffsen Nov 17 '22 at 12:45
  • 1
    If it is a struct then you can already access it. I updated my answer. – Robert Kossendey Nov 17 '22 at 13:27
  • That worked for me. Seems like my colleague guided me the absolute wrong way and let me think it's a JSON format. Thanks for your help and patience! – steffsen Nov 17 '22 at 14:23
  • Don't worry, love to help :) – Robert Kossendey Nov 17 '22 at 16:13