Count Occurrences in JSON Array within Objects

Question

I have the JSON below, and I am trying to count the occurrences of each tag (such as "Latin America") in Python. Because "Latin America" appears twice, the result should be 2 for "Latin America" and 1 each for "Mexico", "Health" and "Costa Rica".

{
  "AlJazeera_data": [
    {
      "name": "Mexico City hospitals reaching breaking point",
      "url": "https://www.aljazeera.com/news/",
      "tags": [
        "Latin America",
        "Mexico",
        "Health"
      ],
      "author": "Manuel Rapalo"
    },
    {
      "name": "Football matches resume in Costa Rica as virus curbs ease",
      "url": "https://www.aljazeera.coml",
      "tags": [
        "Latin America",
        "Costa Rica"
      ],
      "author": "Manuel Rapalo"
    }
  ]
}

Using this code:

import json
from collections import Counter

with open('../../Resources/Aljazeera.json') as f:
   data = json.load(f)

for item in data['AlJazeera_data']:
    for t in item['tags']:
        print(t)

I get the output of the list of all tags, but I am stuck at calculating the count for all of the tags.

If you have the items, why don't you put them in a list or, better yet, a `collections.Counter`? – JBernardo, May 23 '20 at 12:21
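A minimal sketch of what this comment seems to suggest: flatten all the tag lists into one stream and hand it to `collections.Counter` in a single call. The JSON is inlined here (trimmed to the `name` and `tags` fields from the question) so the snippet runs without the file; the names `raw` and `tag_counts` are only illustrative.

```python
import json
from collections import Counter
from itertools import chain

# Sample data from the question, inlined (assumption: only "name" and "tags" are kept).
raw = """
{"AlJazeera_data": [
  {"name": "Mexico City hospitals reaching breaking point",
   "tags": ["Latin America", "Mexico", "Health"]},
  {"name": "Football matches resume in Costa Rica as virus curbs ease",
   "tags": ["Latin America", "Costa Rica"]}
]}
"""

data = json.loads(raw)

# chain.from_iterable flattens the per-article tag lists into one iterable,
# which Counter then tallies in a single pass.
tag_counts = Counter(
    chain.from_iterable(item["tags"] for item in data["AlJazeera_data"])
)

print(tag_counts["Latin America"])  # 2
print(tag_counts["Mexico"])         # 1
```

This gives the same counts as calling `.update()` once per article; which form to use is mostly a matter of taste.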

MindOfMetalAndWheels · Accepted Answer · 2020-05-23T12:57:18.453

0

You could do something like

import json
from collections import Counter

with open('../../Resources/Aljazeera.json') as f:
   data = json.load(f)

all_tags = Counter()

for item in data['AlJazeera_data']:
    all_tags.update(item['tags'])

print(all_tags)

Edit: As the other poster points out, the second call to Counter was not needed.

edited May 23 '20 at 12:57

answered May 23 '20 at 12:25

MindOfMetalAndWheels


Thank you very much for your guidance, this helped me a lot. – Ehtesham Abad May 23 '20 at 12:45
It depends on what you want to calculate a percentage of. If it's a simple percentage of all of the tags, you can find the total number with `sum(all_tags.values())` and then use that to find the percentages. – MindOfMetalAndWheels May 23 '20 at 18:10

azro · Answer 2 · 2020-05-23T17:12:45.800

0

You need to call `.update()` on the counter with each list of tags:

tags = Counter()
for item in data['AlJazeera_data']:
    tags.update(item['tags'])

print(tags) # Counter({'Latin America': 2, 'Mexico': 1, 'Health': 1, 'Costa Rica': 1})
print(tags.most_common(1)) # [('Latin America', 2)]

total = sum(tags.values())
print(total) # 5

tags_percentage = {k: v/total for k,v in tags.items()}
print(tags_percentage) # {'Latin America': 0.4, 'Mexico': 0.2, 'Health': 0.2, 'Costa Rica': 0.2}

edited May 23 '20 at 17:12

answered May 23 '20 at 12:27

azro


Oh, my bad... just a simple mistake on my part. Thank you very much for your guidance. – Ehtesham Abad May 23 '20 at 12:45
@EhteshamAbad Do three "?" make the question more important than just one? – azro May 23 '20 at 16:56
