
Code used to write the file: df.coalesce(1).write.format("json").mode("append").save("/user/hive/warehouse/cpevoiceassistanteventhistory")

Part of the JSON data from the source:

"event_header": {
        "accept_language": null,
        "app_id": "App_ID",
        "app_name": null,
        "client_ip_address": "IP",
        "event_id": "ID",
        "event_timestamp": null,
        "offering_id": "Offering",
        "server_ip_address": "IP",
        "server_timestamp": 1492565987565,
        "topic_name": "Topic",
        "version": "1.0"
    }

Output:

"event_header": {
        "app_id": "App_ID",
        "client_ip_address": "IP",
        "event_id": "ID",
        "offering_id": "Offering",
        "server_ip_address": "IP",
        "server_timestamp": 1492565987565,
        "topic_name": "Topic",
        "version": "1.0"
    }

In the above example, the keys accept_language, app_name, and event_timestamp have been dropped from the output.

I have found a solution for this in Scala, but I want to do the same in PySpark and have not found a working solution.
