How to remove delimeted pipe from my json column and split them to different columns and their respective values

Question

This is how my code in json looks like. I wanted to convert this json file to excel sheet to split the nested column to separate columns and have used pandas for it, but couldn't achieve it. The output I want in my excel sheet is:

ID Name City State Contact Email 100 Sam New York City New York 1234567890 1234@yahoo.com

I want to remove those pipes and the solution should be in pandas. Please help me out with this. The code I am trying:

I want output as:

The output on excel sheet: [2]: https://i.stack.imgur.com/QjSUU.png

The list of dict column looks like:

"assignees": [{ "id": 1234, "username": "xyz", "name": "XYZ", "state": "active", "avatar_url": "aaaaaaaaaaaaaaa", "web_url": "bbbbbbbbbbb" }, { "id": 5678, "username": "abcd", "name": "ABCD", "state": "active", "avatar_url": "hhhhhhhhhhh", "web_url": "mmmmmmmmm" } ],

Pankaj Saini · Accepted Answer · 2022-02-02T05:59:45.257

0

This could be one way:

import pandas as pd

df = pd.read_json('Sample.json')

df2 = pd.DataFrame()
for i in df.index:
    desc = df['description'][i]
    attributes = desc.split("\n")
    d = {}
    for attrib in attributes:
        if not(attrib.startswith('Name') or attrib.startswith('-----')):
            kv = attrib.split("|")
            d[kv[0]] = kv[1]

    df2 = df2.append(d, ignore_index=True)

print(df2)
df2.to_csv("output.csv")

Output xls:

edited Feb 02 '22 at 05:59

answered Jan 28 '22 at 06:47

Pankaj Saini

1,164
1
5
4

Hi. I have actually inserted a pic of how it should look in excel for your reference. I am having a full json file which contains this description column, within which I have number of data in "ID|100|\nName|Sam|\nCity|New York City|\nState|New York|\nContact|1234567890|\nEmail|1234@yahoo.com|" format. So I want it to change the format automatically, so that the pandas code is able to split them into different columns. Link for the image: https://i.stack.imgur.com/fVpUL.png – Smriti Srivastava Jan 28 '22 at 07:55
Yep, so the solution I have posted does exactly that. Did you execute it on your full json? Please share image of the output from this code when executed against the full json. – Pankaj Saini Jan 28 '22 at 08:02
The code that I am trying out is: import pandas as pd df = pd.read_json('C:/Users/936007/Documents/Sample.json') desc = df.get('description') attributes = desc.split("\n") d = {} for attrib in attributes: (k, v, sp) = attrib.split("|") d[k] = v df = df.append(d, ignore_index=True) – Smriti Srivastava Jan 28 '22 at 08:14
AttributeError Traceback (most recent call last) ~\AppData\Local\Temp/ipykernel_13768/1113788833.py in 1 desc = df.get('description') ----> 2 attributes = desc.split("\n") 3 d = {} 4 5 for attrib in attributes: ~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name) 5485 ): 5486 return self[name] -> 5487 return object.__getattribute__(self, name) 5488 5489 def __setattr__(self, name: str, value) -> None: AttributeError: 'Series' object has no attribute 'split' – Smriti Srivastava Jan 28 '22 at 08:15
Yes, because your code already makes the json into a dataframe. My code assumes its a json dictionary object. Is it possible for you to upload this file C:/Users/936007/Documents/Sample.json somewhere? If not all the records, just a few like 4-5 records. – Pankaj Saini Jan 28 '22 at 08:26
I have inserted an image in my question area. Could you check that out ? – Smriti Srivastava Jan 28 '22 at 08:29
The thing is, I have to pull the json code from an API call, and later convert it to excel sheet to pull it into Power BI. So this is where I have to deal with a huge amount of data, but this changed json format is ruining everything. – Smriti Srivastava Jan 28 '22 at 08:33
Where should I upload this file, I am not getting any option to attach files – Smriti Srivastava Jan 28 '22 at 08:51
I mean like a dropbox or google drive. On Stackoverflow there is no such option. Once the file is uploaded elsewhere, you can put the link here in comments or in your post. – Pankaj Saini Jan 28 '22 at 08:55
Here's the link: https://drive.google.com/file/d/1x3SvgXEYIKu7Er55sA6XsX6PJCW_HQqH/view?usp=sharing – Smriti Srivastava Jan 28 '22 at 10:27
I have edited the solution based on Sample.json from google drive. – Pankaj Saini Jan 28 '22 at 16:55
Hi Pankaj. This code is running fine for the Sample json. But when I am trying to do it for the larger code, I am getting errors. I wish I could just screen share and show it to you :( I am getting this error: ValueError: not enough values to unpack (expected 3, got 2) – Smriti Srivastava Jan 30 '22 at 07:28
Its due to this line `(k, v, sp) = attrib.split("|")`. Probably some of the values in your json are in a different format. You could upload the bigger json file to google if the information is not sensitive. – Pankaj Saini Jan 30 '22 at 08:20
That's the issue. The info is sensitive, so I am not uploading that file. – Smriti Srivastava Jan 30 '22 at 08:50
Here's the link of what my actual json code looks like: https://drive.google.com/file/d/16lhstbrrqouI--TVg1HOLSaLWvRY1taG/view?usp=sharing So I want to remove Name and values along with hyphen, | and \n. I have inserted the image in the question box of what output I want. – Smriti Srivastava Jan 31 '22 at 10:41
@SmritiSrivastava updated the code in answer as per the latest json file. – Pankaj Saini Jan 31 '22 at 16:27
Hi Pankaj. Thank you for your answer. But I am still getting this error: ValueError: not enough values to unpack (expected 3, got 2) – Smriti Srivastava Feb 02 '22 at 05:20
This just means that your "actual" json file has some records where the format is a little bit different. I have once again edited the code. Hope this works for you! *fingers crossed* – Pankaj Saini Feb 02 '22 at 05:25
Even in the excel sheet, I am adding an image in the question box of how it is looking – Smriti Srivastava Feb 02 '22 at 05:31
Thank you so much. This code worked for me. Thumbs up. I appreciate your time and help which you provided to me! – Smriti Srivastava Feb 02 '22 at 05:59
Great! I just added the XLS output as image - the one that I see on my side. I am using python3 – Pankaj Saini Feb 02 '22 at 06:00
I had one more question. I wanted to extract the json file from Gitlab API call url. This url also need Authorization token to run this API call. So how do should I go about it? – Smriti Srivastava Feb 02 '22 at 06:10
Do you want to call this Gitlab API within your python code? – Pankaj Saini Feb 02 '22 at 06:14
Yeah. I have accepted and upvoted for you. Yes I want it within the python code. I want to extract the json code directly from the url. The link won't run individually. It is through postman where I call the Gitlab API and pull the json file after adding my Access token. – Smriti Srivastava Feb 02 '22 at 07:34
Actually solution is documented here https://stackoverflow.com/questions/60243129/how-can-i-extract-contents-from-a-file-stored-in-gitlab-repos. You will need Gitlab URL, project name, token and filename. Let me know if it doesn't work for you or you need additional help. – Pankaj Saini Feb 02 '22 at 07:39
Ok I will check that. Thanks. I had one more question, like for the description, if it comes across such format of data, then it would successfully be able to split it, but if there is a data in the sequence of word format(sentence), then how can I put that in else statement? – Smriti Srivastava Feb 02 '22 at 10:21
Sorry I am struggling to understand your question. Perhaps you can provide an example of this description or sentence format. That will help me understand and write code for parsing it appropriately – Pankaj Saini Feb 02 '22 at 11:23
Sure. I shall attach the drive link: https://drive.google.com/file/d/16lhstbrrqouI--TVg1HOLSaLWvRY1taG/view?usp=sharing – Smriti Srivastava Feb 03 '22 at 03:52
Got it. You need another if statement after kv is obtained. Something like this: `if len(kv) > 0: then d[kv[0]] = kv[1] else: your logic here` – Pankaj Saini Feb 03 '22 at 04:21
1

Thank you so much Pankaj for your help! This solution seems working. – Smriti Srivastava Feb 03 '22 at 04:54
Hi Pankaj. Could you help me out with one more thing. I have a column which is list of dict. I want to split it into diff columns. But the thing is, I am reading the json from a url. And then trying to get assignees column from there and later explode it, but the column is not splitting. What should I do? Plz help me out. I have inserted the json code in ques box. – Smriti Srivastava Feb 07 '22 at 16:08
For this you should use apply not explode. Something like this `df['assignees'].apply(pd.Series)`. Pls write to me psaini at gmail if you need further help. – Pankaj Saini Feb 08 '22 at 01:11
Sure. I am mailing you, but this apply(pd.Series) didn't work for me. – Smriti Srivastava Feb 08 '22 at 05:28

How to remove delimeted pipe from my json column and split them to different columns and their respective values

1 Answers1