1

I have data in csv - 2 columns, 1st column contains member id and second contains characteristics in Key-Value pairs (nested one under another).

enter image description here

I have seen online codes which convert a simple Key-value pairs but not able to transform data like what i have shown above

I want to transform this data into a excel table as below enter image description here

Gupta
  • 314
  • 4
  • 17

1 Answers1

3

I did it with this XlsxWriter package, so first you have to install it by running pip install XlsxWriter command.

import csv  # to read csv file
import xlsxwriter  # to write xlxs file
import ast

# you can change this names according to your local ones
csv_file = 'data.csv'
xlsx_file = 'data.xlsx'

# read the csv file and get all the JSON values into data list
data = []
with open(csv_file, 'r') as csvFile:
    # read line by line in csv file
    reader = csv.reader(csvFile)

    # convert every line into list and select the JSON values
    for row in list(reader)[1:]:
        # csv are comma separated, so combine all the necessary
        # part of the json with comma
        json_to_str = ','.join(row[1:])

        # convert it to python dictionary
        str_to_dict = ast.literal_eval(json_to_str)

        # append those completed JSON into the data list
        data.append(str_to_dict)

# define the excel file
workbook = xlsxwriter.Workbook(xlsx_file)

# create a sheet for our work
worksheet = workbook.add_worksheet()

# cell format for merge fields with bold and align center
# letters and design border
merge_format = workbook.add_format({
    'bold': 1,
    'border': 1,
    'align': 'center',
    'valign': 'vcenter'})

# other cell format to design the border
cell_format = workbook.add_format({
    'border': 1,
})

# create the header section dynamically
first_col = 0
last_col = 0
for index, value in enumerate(data[0].items()):
    if isinstance(value[1], dict):
        # this if mean the JSON key has something else
        # other than the single value like dict or list
        last_col += len(value[1].keys())
        worksheet.merge_range(first_row=0,
                              first_col=first_col,
                              last_row=0,
                              last_col=last_col,
                              data=value[0],
                              cell_format=merge_format)
        for k, v in value[1].items():
            # this is for go in deep the value if exist
            worksheet.write(1, first_col, k, merge_format)
            first_col += 1
        first_col = last_col + 1
    else:
        # 'age' has only one value, so this else section
        # is for create normal headers like 'age'
        worksheet.write(1, first_col, value[0], merge_format)
        first_col += 1

# now we know how many columns exist in the
# excel, and set the width to 20
worksheet.set_column(first_col=0, last_col=last_col, width=20)

# filling values to excel file
for index, value in enumerate(data):
    last_col = 0
    for k, v in value.items():
        if isinstance(v, dict):
            # this is for handle values with dictionary
            for k1, v1 in v.items():
                if isinstance(v1, list):
                    # this will capture last 'type' list (['Grass', 'Hardball'])
                    # in the 'conditions'
                    worksheet.write(index + 2, last_col, ', '.join(v1), cell_format)
                else:
                    # just filling other values other than list
                    worksheet.write(index + 2, last_col, v1, cell_format)
                last_col += 1
        else:
            # this is handle single value other than dict or list
            worksheet.write(index + 2, last_col, v, cell_format)
            last_col += 1

# finally close to create the excel file
workbook.close()

I commented out most of the line to get better understand and reduce the complexity because you are very new to Python. If you didn't get any point let me know, I'll explain as much as I can. Additionally I used enumerate() python Built-in Function. Check this small example which I directly get it from original documentation. This enumerate() is useful when numbering items in the list.

Return an enumerate object. iterable must be a sequence, an iterator, or some other object which supports iteration. The __next__() method of the iterator returned by enumerate() returns a tuple containing a count (from start which defaults to 0) and the values obtained from iterating over iterable.

>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]

Here is my csv file,

enter image description here

and here is the final output of the excel file. I just merged the duplicate header values (matchruns and conditions).

enter image description here

Kushan Gunasekera
  • 7,268
  • 6
  • 44
  • 58
  • 1
    Thanks so much Kushan. I am trying to understand the code Will confirm by tomorrow – Gupta Aug 04 '19 at 19:01
  • You're welcome @RKapoor, I commented in the code to get you more understand about it. If you didn't get any point just print the output and check it or let me know where you stuck. I'm happy to help you! Ah, you can confirm by making this question as the accepted one, thank you too! – Kushan Gunasekera Aug 04 '19 at 19:09
  • 1
    To try and understand the loop, I ran the below command for row in list(reader)[1:]: print(row) I get the below eror Traceback (most recent call last): File "C:\Users\user.surname\AppData\Local\Programs\Python\Python37\lib\site-packages\IPython\core\interactiveshell.py", line 3291, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "", line 1, in for row in list(reader)[1:]: ValueError: I/O operation on closed file. – Gupta Aug 05 '19 at 17:50
  • Hello @RKapoor, I just run the code again. I got `['12002', "{'age':18,'matchruns':{'test':23,'day':42},'conditions':{'total':3,'types':['Grass','Hardball']}}"]` as the output. Did you replace the `data.csv` with your `csv` file name? Can I get you `csv` file or just open your `csv` with some text editor and update this question. I think something wrong with `csv` file. – Kushan Gunasekera Aug 06 '19 at 01:45
  • 1
    I checked the csv. It appears fine. However have provided the link of csv file for your reference. btw, the error is "ValueError: I/O operation on closed file". Does this gives any clue. I believe these are beginner questions. Thanks for helping out https://drive.google.com/file/d/1wXAOMQgda8n0QvfmwJiuwSejP8l9At20/view?usp=sharing – Gupta Aug 06 '19 at 08:35
  • @RKapoor I also got an error while running with your `24.csv` file. It's for `SyntaxError: unexpected EOF while parsing` , because you didn't close your `JSON` in the `csv` file. Please download mine (https://drive.google.com/open?id=1HkMk4kecG1Y3x2o05aoOJ6-5AD1R9Akx) and run with by replacing `csv_file ` variable to `'data.csv'`. Let me know if you are still get the same error again. – Kushan Gunasekera Aug 06 '19 at 13:33
  • @RKapoor did you check this [ValueError : I/O operation on closed file](https://stackoverflow.com/q/18952716/6194097) question? Please check it, I think you put `print(row)` in the wrong place. Just download `csv` file which I provide in the above comment, then copy this answer and run. Is `data.xlsx` file created? If yes, everything fine with my code base. Then move into next phase where you get errors! – Kushan Gunasekera Aug 06 '19 at 13:48
  • 1
    It works perfectly in the file you have sent. Still need to figure out the errors. Why print(row) is wrongly placed? Anyways, thanks so much for the answer and support – Gupta Aug 06 '19 at 15:06
  • @RKapoor okay, just add your print statement after the first for loop with the correct indentation with my CSV file. Just try it and let me know! – Kushan Gunasekera Aug 06 '19 at 15:12
  • 1
    Print statement works fine. Will report on next steps – Gupta Aug 07 '19 at 01:05
  • @RKapoor okay, waiting for your reply. Let me know if you're get any other error. Happy to help you! – Kushan Gunasekera Aug 07 '19 at 03:43
  • Thanks Kushan. This works fine. It has taken long time but not completely figure out. Understanding bit by bit. I have sent an invite on Link ed In. – Gupta Aug 11 '19 at 16:40
  • 1
    @KushanGunasekera would you please help me in this case? https://stackoverflow.com/questions/65299781/convert-json-to-excel-by-python – スーパーマン Dec 15 '20 at 06:40
  • @グエントォsure I'll check. – Kushan Gunasekera Dec 15 '20 at 07:53