0

I have below dataframe in Pandas:

Name Branch Class Details
Vicky CSE IV [ {“id” : “1234”,“entityType”:{ Name:”Parent” ,Type:”entity”},“name”:”Vikas”},{ “id” : “8974”, “entityType”:{Name:”Parent1”,Type:”entity1”},“name”:”Sachin”},{“id” : 5678”,“entityType”:{Name:”Parent2”,Type:”entity2” },“name”:”Sehwag”}]

Now, The 4th Column has nested JSON String, i am trying to fetch just the Name field from the nested JSON , as shown below:

Name Branch Class Details.0.Name Details.1.Name Details.2.Name
Vicky CSE IV Vikas Sachin Sehwag

Is there any way to do this using pandas ?

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
RK12
  • 23
  • 3
  • Welcome to [Stack Overflow.](https://stackoverflow.com/ "Stack Overflow"). In order for us to help you, it is necessary that you show your effort and submit data to be used to reproduce your problem. While providing an image is helpful, it doesn't allow for reproducing the issue. Please edit your question to show a minimal reproducible set. See [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example "Minimal Reproducible Example") for details. – itprorh66 Dec 24 '21 at 14:45
  • Hi @itprorh66, thank you for pointing that out....i have edited my question, i hope this is better in terms of details – RK12 Dec 24 '21 at 15:54

1 Answers1

0

-------Updated Answer after Question Revised-----------

Knowing that you want to extract the name field for all of the objects in the Details column, you could try the below:

df["Detail_Names"] = df["Details"].apply(lambda x: [o["name"] for o in x])
df2 = pd.DataFrame(df["Detail_Names"].tolist())
df.merge(df2, left_index=True, right_index=True, how="outer").drop(["Details", "Detail_Names"], axis=1)

And if your "Details" column actually contains a string of JSON, then the first line would change to the following in order to convert the JSON string into a list first:

df["Details"] = df["Details"].apply(lambda x: [o["name"] for o in json.loads(x)])

(Note that I didn't bother rename the new columns.)

-----------Original Answer-----------------

If your "Details" column is a string data type that contains a JSON encoded string, then you might be able to use the following to extract into a new Series the "name" property of the 3rd object, using the built-in json library to decode the JSON string in each row to an object:

df["Details"].apply(lambda x: json.loads(x)[2]["name"])

On the other hand, if your column already contains a dict object from decoding of the json string, then you could do this:

df["Details"].apply(lambda x: x[2]["name"])

Here is a prior SO thread that might also provide some alternative ways to work with JSON in a DataFrame that use Pandas built-in methods, but I'm not sure if this works with a JSON array - it might only work with a JSON object in the column/Series. Looks like this is focused on splitting all the JSON elements into new columns, rather than extracting a specific one, but maybe there is a way to use this. If so, it might be more performant than my above suggestions using the apply function, particularly if you have a very large DataFrame.

BioData41
  • 862
  • 9
  • 15