0

(This is not real data)

I requested similar data from a Rest API. Then, I was able to convert some of the data to a .ndJSON format ( lines = True ); however, the address column is still shown in the ndjson format structure similar to a Python dictionary. My goal is to have the following columns: Column 1 | Street Address | City | State | Postal Code | Zip Code | Birthdate |

This is the first row:

& address & birthDate & deceasedBoolean & \ 0 & {[}{'city': 'MURFREESBORO', 'line': {[}'9999 Candy Cane Island'{]}, 'postalCode': '39999', 'state': '56'}{]} & 11/10/2081 & 0 & \

import pandas as pd
import json
from io import StringIO

data = response.text

newdf = pd.read_json(StringIO(data),lines = True)

newdf.tail(10)

newdf.to_csv('file.csv')

enter image description here

user17629522
  • 105
  • 8

1 Answers1

1
import pandas as pd

# dummy df
df = pd.DataFrame({'address': [{'city': 'MURFREESBORO', 'line': ['9999 Candy Cane Island'], 'postalCode': '39999', 'state': '56'}], 
                               'birthdate': ['11/20/1977']})
# remove the [] from our address colum
df['address'] = df['address'].apply(str).str.replace('[', '').str.replace(']', '')
# turn string dict to dict
df['address']= df['address'].map(eval)
# explode the dict into multiple cols
df2 = pd.DataFrame(df['address'].values.tolist(), index=df.index)
# join other col(s) 
df3 = df2.join(df['birthdate'])

output df

    city            line                   postalCode   state   birthdate
0   MURFREESBORO    9999 Candy Cane Island  39999        56    11/20/1977
Matthew Borish
  • 3,016
  • 2
  • 13
  • 25
  • I tried that and I got the following error: AttributeError: Can only use .str accessor with string values! – user17629522 Feb 17 '22 at 01:12
  • If you can copy and paste some of your actual data it will help a lot. It's hard to replicate your exact data structure from a screenshot. – Matthew Borish Feb 17 '22 at 01:15
  • address [{'city': 'MURFREESBORO', 'line': ['9999 Candy Cane Island'], 'postalCode': '39999', 'state': '56'}] – user17629522 Feb 17 '22 at 01:25
  • Thanks for posting this, but I am not able to recreate the .str accessor error on my machine. I recommend using `newdf.to_dict()` and pasting the printed output into your question as an edit to make sure we're working with the same data. Additionally, you might have success with `df['address'] = df['address'].apply(str).str.replace('[', '').str.replace(']', '')` instead. – Matthew Borish Feb 17 '22 at 06:37
  • Hi Matt, I think the reason why I am not getting that answer is because my data does not have a string around the [{ }]. It just looks like this [{'city':'NYC' and not this "[{'city':'NYC'. The first character is not ". – user17629522 Feb 17 '22 at 16:25
  • I edited my answer to match your quote format. I also changed this line `df['address'] = df['address'].apply(str).str.replace('[', '').str.replace(']', '')` to account for your Can only use .str accessor with string values error and the code above is working for me. – Matthew Borish Feb 17 '22 at 17:46