Trying to select a nested key from a JSONL column with .loc, but getting KeyError even though the key exists

Question

This is my data structure in JSONL:

"content": "Not yall gassing up a gay boy with no rhythm", "place": {"_type": "snscrape.modules.twitter.Place", "fullName": "Manhattan, NY", "name": "Manhattan", "type": "city", "country": "United States", "countryCode": "US"}

I tried to select countryCode from the place column with this code:

country_df = test_df.loc[test_df['place'].notnull(), ['content', 'place']]
countrycode_df = country_df["place"].loc["countryCode"]

but it gave me this error:

KeyError: 'countryCode'

How do I fix this?

I tried this method, but it doesn't fit my situation.

score 1 · Accepted Answer · answered May 29 '21 at 14:17


You can access it with the .str accessor:

country_df['place'].str['countryCode']

Output:

0    US
Name: place, dtype: object
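To make the difference concrete, here is a minimal sketch. The sample rows are hypothetical and only mirror the structure shown in the question. It assumes the `place` column holds Python dicts (or None), which is what reading the JSONL into a DataFrame typically produces. `.loc["countryCode"]` searches the Series index for a row labelled `countryCode`, while `.str[...]` indexes into each element.

```python
import pandas as pd

# Hypothetical sample rows mirroring the question's structure.
test_df = pd.DataFrame({
    "content": ["tweet a", "tweet b", "tweet c"],
    "place": [
        {"name": "Manhattan", "countryCode": "US"},
        None,  # a row without place information
        {"name": "London", "countryCode": "GB"},
    ],
})

country_df = test_df.loc[test_df["place"].notnull(), ["content", "place"]]

# .loc looks up a ROW LABEL; the index here holds integers (0 and 2),
# so there is no row called "countryCode" and pandas raises KeyError.
try:
    country_df["place"].loc["countryCode"]
    loc_failed = False
except KeyError:
    loc_failed = True

# .str[...] applies the lookup to each element (each dict) instead.
country_codes = country_df["place"].str["countryCode"]
print(country_codes.tolist())
```

The `.str` accessor is mainly meant for strings, so an explicit `country_df["place"].map(lambda d: d["countryCode"])` is an alternative if you prefer to spell out the per-row lookup.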


perl


I was able to get the same result, but I can't use `.groupby("countryCode").size()` without getting `KeyError: 'countryCode'`. Is there any way to solve this? – someone u don't know May 29 '21 at 14:25
Sure, you can do `df.groupby(df['place'].str['countryCode']).size()` (or just `df['place'].str['countryCode'].value_counts()` if you only want to know how many records with each `countryCode` you have) – perl May 29 '21 at 14:30
Or you can use `json_normalize` to convert `place` to a DataFrame and then you can work with `countryCode` as a column: `pd.json_normalize(df['place'])` – perl May 29 '21 at 14:32
Sorry if this is off topic, but how do I merge the result with my content column? – someone u don't know May 29 '21 at 14:35
Something like `pd.json_normalize(df.to_dict(orient='records'))` to normalize all columns – perl May 29 '21 at 14:37
I tried to normalize my data like this: `countrycode_df = pd.json_normalize(data=country_df)`, but I got this error instead: "AttributeError: 'str' object has no attribute 'values'" – someone u don't know May 29 '21 at 14:45
@someoneudon'tknow You need `.to_dict(orient='records')` – perl May 29 '21 at 14:46
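The suggestions from the comments above can be sketched together as follows. The sample data is hypothetical, and the sketch assumes every row has a `place` dict (rows with a missing `place` were already filtered out in the question). It groups by the extracted Series rather than by a column name, and then flattens all records so that `content` and `countryCode` end up side by side.

```python
import pandas as pd

# Hypothetical rows; every row has a place dict in this sketch.
df = pd.DataFrame({
    "content": ["tweet a", "tweet b", "tweet c"],
    "place": [
        {"name": "Manhattan", "countryCode": "US"},
        {"name": "Brooklyn", "countryCode": "US"},
        {"name": "London", "countryCode": "GB"},
    ],
})

codes = df["place"].str["countryCode"]

# "countryCode" is not a column of df, so group by the Series itself.
counts_by_group = df.groupby(codes).size()

# value_counts produces the same counts, ordered by frequency.
counts = codes.value_counts()

# Convert rows to a list of dicts, then flatten nested keys into
# dotted column names such as "place.countryCode".
flat_df = pd.json_normalize(df.to_dict(orient="records"))
print(flat_df[["content", "place.countryCode"]])
```

Once the frame is flattened, `place.countryCode` is an ordinary column, so it can be passed to `.groupby` by name.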

Racooneer · Answer 2 · 2021-05-29T15:12:57.370

0

Since "place" is a dict nested inside the record, you can index into it the same way as the outer dict:

country = {"content": "Not yall gassing up a gay boy with no rhythm", "place": {"_type": "snscrape.modules.twitter.Place", "fullName": "Manhattan, NY", "name": "Manhattan", "type": "city", "country": "United States", "countryCode": "US"}}
country["place"]["countryCode"]

output:

'US'

However, it might be better for your purpose to use pandas json_normalize():

import pandas as pd

country_df = pd.json_normalize(data=country)
print(country_df)

output:

content	place._type	place.fullName	place.name	place.type	place.country	place.countryCode
Not yall gassing up a gay boy with no rhythm	snscrape.modules.twitter.Place	Manhattan, NY	Manhattan	city	United States	US

edited May 29 '21 at 15:12

answered May 29 '21 at 14:33


Is the best approach to just normalize all my data before using `.groupby`? – someone u don't know May 29 '21 at 14:38
I would suggest so. `.groupby()` is a DataFrame method, and `json_normalize()` converts the JSON to a DataFrame – Racooneer May 29 '21 at 14:42
I tried to normalize my data like this: `countrycode_df = pd.json_normalize(data=country_df)`, but I got this error instead: "AttributeError: 'str' object has no attribute 'values'" – someone u don't know May 29 '21 at 14:45
Solved it: I needed to add `df.to_dict(orient='records')` – someone u don't know May 29 '21 at 14:49
The data passed to `json_normalize` needs to be JSON-like (a dict or a list of dicts). Sounds like your data is already partially a DataFrame – Racooneer May 29 '21 at 14:58
Maybe the reason is that my data was already processed once with this code: `country_df = test_df.loc[test_df['place'].notnull(), ['content', 'place']]`, which for some reason only partially converted it to a DataFrame – someone u don't know May 29 '21 at 15:08
indeed, I adjusted my answer to make the flow more clear – Racooneer May 29 '21 at 15:13
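As a rough end-to-end sketch of the flow discussed in these comments: parse the JSONL lines into a list of dicts, pass that list to `json_normalize`, then group by the flattened column name. The JSONL content below is hypothetical, and `io.StringIO` stands in for the real file.

```python
import io
import json

import pandas as pd

# Hypothetical JSONL content: one JSON object per line.
jsonl_file = io.StringIO(
    '{"content": "tweet a", "place": {"name": "Manhattan", "countryCode": "US"}}\n'
    '{"content": "tweet b", "place": {"name": "Brooklyn", "countryCode": "US"}}\n'
    '{"content": "tweet c", "place": {"name": "London", "countryCode": "GB"}}\n'
)

# json_normalize expects a dict or a list of dicts, not a DataFrame.
records = [json.loads(line) for line in jsonl_file if line.strip()]

flat_df = pd.json_normalize(records)

# After flattening, the nested key is a regular column.
counts = flat_df.groupby("place.countryCode").size()
print(counts)
```

If the data is already in a DataFrame, `df.to_dict(orient='records')` produces the same list-of-dicts shape, as noted in the comments.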
