Pyspark read json column has invalid characters

Asked Oct 13 '22 at 20:06

Active Oct 14 '22 at 03:10

Viewed 89 times

I'm reading a JSON file in PySpark and getting the error below.

Column name "change(me)" contains invalid characters, please use alias to rename it

I have tried withColumnRenamed, but that does not seem to help:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.read.option("multiline", "true").json("json_file")
df = df.withColumnRenamed("change(me)", "change_me")

Here is my sample JSON:

{
    "1": {
        "task": [
            "wakeup",
            "getready"
        ]
    },
    "2": {
        "task": [
            "brush",
            "shower"
        ]
    },
    "3": {
        "task": [
            "brush",
            "shower"
        ]
    },
    "activites": ["standup", "play", "sitdown"],
    "statuscheck": {
        "time": 60,
        "color": 1002,
        "change(me)": 9898
    },
    "action": ["1", "2", "3", "4"]
}

When I check the columns of my DataFrame, I do not see change(me), but Spark still complains about the invalid character.
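A possible explanation, offered as a sketch and not a confirmed diagnosis: in the sample JSON, change(me) is a key nested inside statuscheck, so it would appear as a field of a struct and not as a top-level column. df.columns lists only top-level columns, and withColumnRenamed only renames top-level columns, which would match what is described above. One workaround is to clean the keys before Spark reads the data. The plain-Python snippet below uses only the standard library; the helper names (clean_key, find_invalid_keys, sanitize_keys) and the set of characters treated as invalid are assumptions made for illustration.

```python
import json
import re

# Assumption: the characters rejected in field names are
# space , ; { } ( ) newline tab and =
INVALID = re.compile(r"[ ,;{}()\n\t=]+")


def clean_key(key):
    """Replace runs of invalid characters with '_'; leave valid keys untouched."""
    if not INVALID.search(key):
        return key
    return INVALID.sub("_", key).strip("_")


def find_invalid_keys(node, path=""):
    """Return dotted paths of every key containing an invalid character."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            full = f"{path}.{key}" if path else key
            if INVALID.search(key):
                hits.append(full)
            hits.extend(find_invalid_keys(value, full))
    elif isinstance(node, list):
        for item in node:
            hits.extend(find_invalid_keys(item, path))
    return hits


def sanitize_keys(node):
    """Recursively rebuild the structure with cleaned dict keys."""
    if isinstance(node, dict):
        return {clean_key(k): sanitize_keys(v) for k, v in node.items()}
    if isinstance(node, list):
        return [sanitize_keys(item) for item in node]
    return node


# Trimmed copy of the sample JSON from the question.
sample = json.loads(
    '{"statuscheck": {"time": 60, "color": 1002, "change(me)": 9898},'
    ' "action": ["1", "2", "3", "4"]}'
)

print(find_invalid_keys(sample))
print(json.dumps(sanitize_keys(sample)))
```

The cleaned structure could then be written back to a file with json.dump and read with spark.read as before. This sketch does not handle two keys that collapse to the same cleaned name.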

edited Oct 14 '22 at 03:10


RData
