Goal: I want to get a list of unique values. Duplicates should be removed and, ideally, the result should be sorted numerically.

I have the following JSON file.

{ "words" : [{"id":598,"tags":[104,65],"langs":[1]},{"id":597,"tags":[104,65],"langs":[1]},{"id":596,"tags":[20,30],"langs":[1]}]}

Within my script I have the following code:

count = 0
tags = []
for index in data['words']: 
    count += 1
    tags.append(index["tags"])

Printing `tags` gives the following output:

print(tags)
# result: [[104, 65], [104, 65], [20, 30]]

What I want to achieve is a list of unique numbers, so that I end up with this:

# result [20,30,65,104]

In this case the duplicate values are removed. Can someone point me in the right direction?

Rotan075
  • So basically you're asking 2 questions (both asked here many times before): 1) [How to flatten a list of lists](https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists) to one list of all elements. 2) [How to remove duplicates from that list](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists) – Tomerikoo Oct 08 '20 at 17:58
  • Please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). We expect you to research your problem before posting here. – Prune Oct 08 '20 at 18:00
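
The two steps named in the first comment (flatten the nested lists, then remove duplicates) can be sketched roughly as follows. This is only an illustration, assuming `data` holds the parsed JSON from the question; the names `flat_tags` and `unique_tags` are made up for this example and `itertools.chain.from_iterable` is just one of several ways to flatten.

```python
from itertools import chain

# Sample data copied from the question (already parsed into a dict).
data = {"words": [
    {"id": 598, "tags": [104, 65], "langs": [1]},
    {"id": 597, "tags": [104, 65], "langs": [1]},
    {"id": 596, "tags": [20, 30], "langs": [1]},
]}

# Step 1: flatten the per-word tag lists into one flat list.
flat_tags = list(chain.from_iterable(word["tags"] for word in data["words"]))

# Step 2: a set drops the duplicates; sorted() returns them as an ordered list.
unique_tags = sorted(set(flat_tags))

print(flat_tags)    # [104, 65, 104, 65, 20, 30]
print(unique_tags)  # [20, 30, 65, 104]
```

Note that a set does not preserve insertion order, which is why the final `sorted()` call is needed to get a predictable result.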

1 Answer

A simple way is to use a set to collect the distinct tags, for example:

tag_set = set()
for index in data['words']:
    for t in index["tags"]:
        # A set ignores values it already contains, so adding a
        # duplicate tag 't' leaves 'tag_set' unchanged.
        tag_set.add(t)

print(len(tag_set))     # number of unique tags
print(sorted(tag_set))  # sorted() already returns a list
Ralf
  • You can skip the second loop and just do `tag_set.update(index["tags"])`... Or a set-comp `tag_set = {t for index in data['words'] for t in index['tags']}` – Tomerikoo Oct 08 '20 at 18:03
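
For reference, the two shorter variants suggested in the comment above might look like this. It is a minimal sketch that assumes the JSON from the question is available as a string (here called `raw`, a name chosen for this example) and is parsed with the standard `json` module; `tag_set_comp` is likewise just an illustrative name.

```python
import json

# JSON text copied from the question; in a real script it would be read from the file.
raw = ('{ "words" : [{"id":598,"tags":[104,65],"langs":[1]},'
       '{"id":597,"tags":[104,65],"langs":[1]},'
       '{"id":596,"tags":[20,30],"langs":[1]}]}')
data = json.loads(raw)

# Variant 1: set.update() adds every element of an iterable at once,
# so the inner loop over the tags is not needed.
tag_set = set()
for index in data["words"]:
    tag_set.update(index["tags"])

# Variant 2: a set comprehension builds the same set in one expression.
tag_set_comp = {t for index in data["words"] for t in index["tags"]}

print(tag_set == tag_set_comp)  # True
print(sorted(tag_set))          # [20, 30, 65, 104]
```

Both variants produce the same set; which one reads better is largely a matter of taste.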