
Code used to write the file: df.coalesce(1).write.format("json").mode("append").save("/user/hive/warehouse/cpevoiceassistanteventhistory")

Part of the JSON data from the source:

"event_header": {
        "accept_language": null,
        "app_id": "App_ID",
        "app_name": null,
        "client_ip_address": "IP",
        "event_id": "ID",
        "event_timestamp": null,
        "offering_id": "Offering",
        "server_ip_address": "IP",
        "server_timestamp": 1492565987565,
        "topic_name": "Topic",
        "version": "1.0"
    }

Output:

"event_header": {
        "app_id": "App_ID",
        "client_ip_address": "IP",
        "event_id": "ID",
        "offering_id": "Offering",
        "server_ip_address": "IP",
        "server_timestamp": 1492565987565,
        "topic_name": "Topic",
        "version": "1.0"
    }

In the above example, the keys accept_language, app_name, and event_timestamp have been dropped from the output.

I have found a solution for this in Scala, but I want to do the same in PySpark and have not found a working solution.
