0

"description": ID|100|\nName|Sam|\nCity|New York City|\nState|New York|\nContact|1234567890|\nEmail|1234@yahoo.com|

This is how my code in json looks like. I wanted to convert this json file to excel sheet to split the nested column to separate columns and have used pandas for it, but couldn't achieve it. The output I want in my excel sheet is:

ID Name City State Contact Email 100 Sam New York City New York 1234567890 1234@yahoo.com

I want to remove those pipes and the solution should be in pandas. Please help me out with this. The code I am trying:

I want output as:

The output on excel sheet: [2]: https://i.stack.imgur.com/QjSUU.png

The list of dict column looks like:

"assignees": [{ "id": 1234, "username": "xyz", "name": "XYZ", "state": "active", "avatar_url": "aaaaaaaaaaaaaaa", "web_url": "bbbbbbbbbbb" }, { "id": 5678, "username": "abcd", "name": "ABCD", "state": "active", "avatar_url": "hhhhhhhhhhh", "web_url": "mmmmmmmmm" } ],

1 Answers1

0

This could be one way:

import pandas as pd

df = pd.read_json('Sample.json')

df2 = pd.DataFrame()
for i in df.index:
    desc = df['description'][i]
    attributes = desc.split("\n")
    d = {}
    for attrib in attributes:
        if not(attrib.startswith('Name') or attrib.startswith('-----')):
            kv = attrib.split("|")
            d[kv[0]] = kv[1]

    df2 = df2.append(d, ignore_index=True)

print(df2)
df2.to_csv("output.csv")

Output xls: Output XLS using the Sample.json file

Pankaj Saini
  • 1,164
  • 1
  • 5
  • 4
  • Hi. I have actually inserted a pic of how it should look in excel for your reference. I am having a full json file which contains this description column, within which I have number of data in "ID|100|\nName|Sam|\nCity|New York City|\nState|New York|\nContact|1234567890|\nEmail|1234@yahoo.com|" format. So I want it to change the format automatically, so that the pandas code is able to split them into different columns. Link for the image: https://i.stack.imgur.com/fVpUL.png – Smriti Srivastava Jan 28 '22 at 07:55
  • Yep, so the solution I have posted does exactly that. Did you execute it on your full json? Please share image of the output from this code when executed against the full json. – Pankaj Saini Jan 28 '22 at 08:02
  • The code that I am trying out is: import pandas as pd df = pd.read_json('C:/Users/936007/Documents/Sample.json') desc = df.get('description') attributes = desc.split("\n") d = {} for attrib in attributes: (k, v, sp) = attrib.split("|") d[k] = v df = df.append(d, ignore_index=True) – Smriti Srivastava Jan 28 '22 at 08:14
  • AttributeError Traceback (most recent call last) ~\AppData\Local\Temp/ipykernel_13768/1113788833.py in 1 desc = df.get('description') ----> 2 attributes = desc.split("\n") 3 d = {} 4 5 for attrib in attributes: ~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name) 5485 ): 5486 return self[name] -> 5487 return object.__getattribute__(self, name) 5488 5489 def __setattr__(self, name: str, value) -> None: AttributeError: 'Series' object has no attribute 'split' – Smriti Srivastava Jan 28 '22 at 08:15
  • Yes, because your code already makes the json into a dataframe. My code assumes its a json dictionary object. Is it possible for you to upload this file C:/Users/936007/Documents/Sample.json somewhere? If not all the records, just a few like 4-5 records. – Pankaj Saini Jan 28 '22 at 08:26
  • I have inserted an image in my question area. Could you check that out ? – Smriti Srivastava Jan 28 '22 at 08:29
  • The thing is, I have to pull the json code from an API call, and later convert it to excel sheet to pull it into Power BI. So this is where I have to deal with a huge amount of data, but this changed json format is ruining everything. – Smriti Srivastava Jan 28 '22 at 08:33
  • Where should I upload this file, I am not getting any option to attach files – Smriti Srivastava Jan 28 '22 at 08:51
  • I mean like a dropbox or google drive. On Stackoverflow there is no such option. Once the file is uploaded elsewhere, you can put the link here in comments or in your post. – Pankaj Saini Jan 28 '22 at 08:55
  • Here's the link: https://drive.google.com/file/d/1x3SvgXEYIKu7Er55sA6XsX6PJCW_HQqH/view?usp=sharing – Smriti Srivastava Jan 28 '22 at 10:27
  • I have edited the solution based on Sample.json from google drive. – Pankaj Saini Jan 28 '22 at 16:55
  • Hi Pankaj. This code is running fine for the Sample json. But when I am trying to do it for the larger code, I am getting errors. I wish I could just screen share and show it to you :( I am getting this error: ValueError: not enough values to unpack (expected 3, got 2) – Smriti Srivastava Jan 30 '22 at 07:28
  • Its due to this line `(k, v, sp) = attrib.split("|")`. Probably some of the values in your json are in a different format. You could upload the bigger json file to google if the information is not sensitive. – Pankaj Saini Jan 30 '22 at 08:20
  • That's the issue. The info is sensitive, so I am not uploading that file. – Smriti Srivastava Jan 30 '22 at 08:50
  • Here's the link of what my actual json code looks like: https://drive.google.com/file/d/16lhstbrrqouI--TVg1HOLSaLWvRY1taG/view?usp=sharing So I want to remove Name and values along with hyphen, | and \n. I have inserted the image in the question box of what output I want. – Smriti Srivastava Jan 31 '22 at 10:41
  • @SmritiSrivastava updated the code in answer as per the latest json file. – Pankaj Saini Jan 31 '22 at 16:27
  • Hi Pankaj. Thank you for your answer. But I am still getting this error: ValueError: not enough values to unpack (expected 3, got 2) – Smriti Srivastava Feb 02 '22 at 05:20
  • This just means that your "actual" json file has some records where the format is a little bit different. I have once again edited the code. Hope this works for you! *fingers crossed* – Pankaj Saini Feb 02 '22 at 05:25
  • Even in the excel sheet, I am adding an image in the question box of how it is looking – Smriti Srivastava Feb 02 '22 at 05:31
  • Thank you so much. This code worked for me. Thumbs up. I appreciate your time and help which you provided to me! – Smriti Srivastava Feb 02 '22 at 05:59
  • Great! I just added the XLS output as image - the one that I see on my side. I am using python3 – Pankaj Saini Feb 02 '22 at 06:00
  • I had one more question. I wanted to extract the json file from Gitlab API call url. This url also need Authorization token to run this API call. So how do should I go about it? – Smriti Srivastava Feb 02 '22 at 06:10
  • Do you want to call this Gitlab API within your python code? – Pankaj Saini Feb 02 '22 at 06:14
  • Yeah. I have accepted and upvoted for you. Yes I want it within the python code. I want to extract the json code directly from the url. The link won't run individually. It is through postman where I call the Gitlab API and pull the json file after adding my Access token. – Smriti Srivastava Feb 02 '22 at 07:34
  • Actually solution is documented here https://stackoverflow.com/questions/60243129/how-can-i-extract-contents-from-a-file-stored-in-gitlab-repos. You will need Gitlab URL, project name, token and filename. Let me know if it doesn't work for you or you need additional help. – Pankaj Saini Feb 02 '22 at 07:39
  • Ok I will check that. Thanks. I had one more question, like for the description, if it comes across such format of data, then it would successfully be able to split it, but if there is a data in the sequence of word format(sentence), then how can I put that in else statement? – Smriti Srivastava Feb 02 '22 at 10:21
  • Sorry I am struggling to understand your question. Perhaps you can provide an example of this description or sentence format. That will help me understand and write code for parsing it appropriately – Pankaj Saini Feb 02 '22 at 11:23
  • Sure. I shall attach the drive link: https://drive.google.com/file/d/16lhstbrrqouI--TVg1HOLSaLWvRY1taG/view?usp=sharing – Smriti Srivastava Feb 03 '22 at 03:52
  • Got it. You need another if statement after kv is obtained. Something like this: `if len(kv) > 0: then d[kv[0]] = kv[1] else: your logic here` – Pankaj Saini Feb 03 '22 at 04:21
  • 1
    Thank you so much Pankaj for your help! This solution seems working. – Smriti Srivastava Feb 03 '22 at 04:54
  • Hi Pankaj. Could you help me out with one more thing. I have a column which is list of dict. I want to split it into diff columns. But the thing is, I am reading the json from a url. And then trying to get assignees column from there and later explode it, but the column is not splitting. What should I do? Plz help me out. I have inserted the json code in ques box. – Smriti Srivastava Feb 07 '22 at 16:08
  • For this you should use apply not explode. Something like this `df['assignees'].apply(pd.Series)`. Pls write to me psaini at gmail if you need further help. – Pankaj Saini Feb 08 '22 at 01:11
  • Sure. I am mailing you, but this apply(pd.Series) didn't work for me. – Smriti Srivastava Feb 08 '22 at 05:28