
This is my data structure, stored as JSONL:

"content": "Not yall gassing up a gay boy with no rhythm", "place": {"_type": "snscrape.modules.twitter.Place", "fullName": "Manhattan, NY", "name": "Manhattan", "type": "city", "country": "United States", "countryCode": "US"}

I tried to select countryCode from the place column with this code:

country_df = test_df.loc[test_df['place'].notnull(), ['content', 'place']]
countrycode_df = country_df["place"].loc["countryCode"]

But it gave me this error:

KeyError: 'countryCode'

How do I fix this?

I tried this method, but it doesn't fit my situation.

2 Answers


Each value in the place column is a dict, so you can look up a key for every row with the `.str` accessor:

country_df['place'].str['countryCode']

Output:

0    US
Name: place, dtype: object
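
The original `country_df["place"].loc["countryCode"]` fails because `.loc` looks up labels in the row index, not keys inside each dict, whereas `.str[...]` applies the lookup to every element. A minimal sketch of the difference, assuming made-up sample rows (the tweets and the `loc_failed` flag are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical sample data shaped like the question's records.
test_df = pd.DataFrame({
    "content": ["first tweet", "second tweet", "third tweet"],
    "place": [
        {"name": "Manhattan", "countryCode": "US"},
        None,  # a tweet without place information
        {"name": "London", "countryCode": "GB"},
    ],
})

# Same filter as in the question: keep rows that have a place.
country_df = test_df.loc[test_df["place"].notnull(), ["content", "place"]]

# .loc searches the row index (0, 2, ...) for the label "countryCode".
try:
    country_df["place"].loc["countryCode"]
    loc_failed = False
except KeyError:
    loc_failed = True

# .str[...] performs the key lookup inside each dict instead.
codes = country_df["place"].str["countryCode"]
print(loc_failed)
print(codes)
```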
perl
  • I was able to get the same result, but I can't use `.groupby("countryCode").size()` without getting `KeyError: 'countryCode'`. Is there any way to solve this? – someone u don't know May 29 '21 at 14:25
  • Sure, you can do `df.groupby(df['place'].str['countryCode']).size()` (or just `df['place'].str['countryCode'].value_counts()` if you only want to know how many records you have for each `countryCode`) – perl May 29 '21 at 14:30
  • Or you can use `json_normalize` to convert `place` to a DataFrame, and then you can work with `countryCode` as a column: `pd.json_normalize(df['place'])` – perl May 29 '21 at 14:32
  • Sorry if this is off topic, but how do I merge the result with my content column? – someone u don't know May 29 '21 at 14:35
  • Something like `pd.json_normalize(df.to_dict(orient='records'))` to normalize all columns – perl May 29 '21 at 14:37
  • I tried to normalize my data like this: `countrycode_df = pd.json_normalize(data=country_df)`, but I got this error instead: "AttributeError: 'str' object has no attribute 'values'" – someone u don't know May 29 '21 at 14:45
  • @someoneudon'tknow You need `.to_dict(orient='records')` – perl May 29 '21 at 14:46
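
The suggestions from the comments above can be put together roughly like this. It is only a sketch: the sample rows and the variable names `per_country`, `counts`, and `flat` are made up for illustration. Note that `groupby("countryCode")` fails because no column has that name; grouping by the derived Series avoids the problem.

```python
import pandas as pd

# Hypothetical rows; every row has a place dict here.
df = pd.DataFrame({
    "content": ["tweet a", "tweet b", "tweet c"],
    "place": [
        {"name": "Manhattan", "countryCode": "US"},
        {"name": "London", "countryCode": "GB"},
        {"name": "Boston", "countryCode": "US"},
    ],
})

# Group by a Series of country codes rather than by a column name.
codes = df["place"].str["countryCode"]
per_country = df.groupby(codes).size()

# Equivalent counts without groupby.
counts = codes.value_counts()

# Convert rows to a list of dicts first, then flatten; the nested keys
# become dotted columns such as "place.countryCode" next to "content".
flat = pd.json_normalize(df.to_dict(orient="records"))
print(per_country)
print(flat[["content", "place.countryCode"]])
```

After flattening, `flat.groupby("place.countryCode").size()` works with a plain column name, because the country code is now a real column.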

Since "place" is itself a dict nested inside the record, you can index into it the same way you index the outer dict:

country = {"content": "Not yall gassing up a gay boy with no rhythm", "place": {"_type": "snscrape.modules.twitter.Place", "fullName": "Manhattan, NY", "name": "Manhattan", "type": "city", "country": "United States", "countryCode": "US"}}
country["place"]["countryCode"]

Output:

'US'

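However, for your purpose it might be better to use pandas' json_normalize(), which flattens the nested dict into separate columns:

import pandas as pd

country_df = pd.json_normalize(data=country)  # nested keys become dotted column names
print(country_df)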

Output:

content                                       place._type                     place.fullName  place.name  place.type  place.country  place.countryCode
Not yall gassing up a gay boy with no rhythm  snscrape.modules.twitter.Place  Manhattan, NY   Manhattan   city        United States  US
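
Since the data is stored as JSONL, the same idea extends to many records by parsing each line and passing the resulting list to json_normalize(). A minimal sketch, assuming invented sample lines held in a string (`jsonl_text`, `records`, and `flat_df` are illustrative names, and a real file would be read line by line instead):

```python
import json
import pandas as pd

# Hypothetical JSONL content: one JSON object per line.
jsonl_text = "\n".join([
    '{"content": "tweet a", "place": {"name": "Manhattan", "countryCode": "US"}}',
    '{"content": "tweet b", "place": {"name": "London", "countryCode": "GB"}}',
    '{"content": "tweet c", "place": {"name": "Boston", "countryCode": "US"}}',
])

# Parse every non-empty line into a dict.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# Flatten all records at once; "place" keys become dotted columns.
flat_df = pd.json_normalize(records)

# countryCode is now an ordinary column next to content.
print(flat_df[["content", "place.countryCode"]])
print(flat_df["place.countryCode"].value_counts())
```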
Racooneer