This seems like a simple thing to do, but I am nevertheless stuck on this assignment.
I have a PySpark dataframe (created by reading a JSON file). It has almost 1000 columns. Each column name is the unique identifier of a JSON file, and the value in that column is the actual content of that JSON file.
The dataframe now looks like this:
|json_file_1|json_file_2|json_file_3|json_file_4|
|:----------|:----------|:----------|:----------|
|json_content|json_content|json_content|json_content|
I want to convert it into something like this, where each json_file name becomes a value in a newly created column 'id':
|id|json_content|
|:-|:-----------|
|json_file_1|json_content|
|json_file_2|json_content|
|json_file_3|json_content|
|json_file_4|json_content|
Any suggestions on how to do this most efficiently? I have looked at some cases where "melt" is suggested, but the examples I found were so different from my situation that I could not easily apply them.
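For reference, the "melt" suggestions I came across seem to boil down to Spark SQL's `stack` generator, used through `df.selectExpr`. Below is a minimal sketch of how I understand it, not something I have verified on my real data. The helper names `build_stack_expr` and `melt_all_columns` are my own (hypothetical), and I am assuming that the column names contain no quotes or backticks and that all columns have the same type, which `stack` appears to require.

```python
def build_stack_expr(columns, id_name="id", value_name="json_content"):
    """Build a Spark SQL stack() expression that turns each column into a row.

    Assumes column names contain no single quotes or backticks.
    """
    if not columns:
        raise ValueError("need at least one column to unpivot")
    # For every column emit a pair: the name as a string literal,
    # followed by a backtick-quoted reference to the column itself.
    pairs = ", ".join("'{0}', `{0}`".format(c) for c in columns)
    return "stack({n}, {pairs}) as ({id_name}, {value_name})".format(
        n=len(columns), pairs=pairs, id_name=id_name, value_name=value_name
    )


def melt_all_columns(df):
    """Unpivot every column of a PySpark DataFrame into (id, json_content) rows."""
    return df.selectExpr(build_stack_expr(df.columns))
```

With my dataframe this would be called as `long_df = melt_all_columns(df)`, which should give one row per original column. I have also read that newer Spark versions (3.4 and later, if I understand correctly) offer `DataFrame.melt` / `DataFrame.unpivot` directly, but I have not tried those.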
Note: I could simply copy and paste the table into Excel and transpose it, but I don't want to go the easy route.