Trying to convert a csv to json, and group objects by ID, there are multiple items need to be grouped into a nested objects, for example a company can be under multiple categories or sub categories, Longitude and Latitude should be grouped into geolocation object etc. Here is sample of the csv file:
id,name,address,Latitude,Longitude,category,Sub_Category,description,Phone ,Website 50001,Fifth Avenue Company,"123 Address rd, city, state, country, postal",34.0522,118.2437,Consultants,Project Marketing," description text here, description text here description text here",111-111-1111,https://example.com 50001,Fifth Avenue Company,"123 Address rd, city, state, country, postal",34.0522,118.2437,Consultants,Project Manager,"description text here, description text here description text here",111-111-1111,https://example.com 50003,Metrics LTD,"123 Address rd, city, state, country, postal",47.6062,122.3321,Vendor,Adhesive,"description text here, description text here description text here",111-111-1111,https://example.com 50004,Metrics LTD,"123 Address rd, city, state, country, postal",47.6062,122.3321,Vendor,Sealant,"description text here, description text here description text here",111-111-1111,https://example.com 50005,International Company,"123 Address rd, city, state, country, postal",37.7749,122.4194,Manager,Construction Manager,"description text here, description text here description text here",111-111-1111,https://example.com
and here is what I am trying to accomplish :
[{
"id":50001,
"name":"Fifth Avenue Company",
"city":"Seattle",
"address":[{
"street":"123 Address rd",
"city": "cityName",
"state/province": "stateName",
"country": "countryName",
"Zip/postal": "ZipValue"
}],
"category": [
{
"cat_name": Consultant,
}
],
"Subcategory": [
{
"Subcat_name or id": Project Marketing,
"Subcat2_name or id": Project Manager,
}
],
"_geolocation": [
{
"Latitude": 34.0522,
"Longitude": 118.2437
}
],
"type":"multi-family new",
"description":"description text here",
// otherColumns ..88
]
What I've tried so far in a google colab notebook:
[0] from google.colab import files
uploaded = files.upload()
[3] import pandas as pd
import io
[4]df = pd.read_csv(io.StringIO(uploaded['companyfile.csv'].decode('utf-8')))
df
//simplified code
[5] j = (df.groupby(['id']).groups
.apply(lambda x:x[['Latitude','Longitude']].to_dict('records'))
.reset_index()
.rename(columns={ 0:'_geolocation'})
.to_json(orient='records'))
This code works but only return the ids, except the fact there are about 100 column of data for each id need to be displayed after grouping, so if I do the following:
j = (df.groupby(['id', 'name', 'address','Phone','Website',
'the rest of the columns'])
.apply(lambda
x:x[['Latitude','Longitude']].to_dict('records'))
.reset_index()
.rename(columns={ 0:'_geolocation'})
.to_json(orient='records'))
I get an error :
ValueError: cannot insert applicant, already exists
There are no duplicate column heads, however there are name driven columns for example applicant, applicant_id, applicant_fullName, applicant_phone
I tried other suggestions but Im stuck on this one hope anyone can help