Remove the duplicate id based on status without pandas?

Question

I need to write a output from aws account to excel sheet. I am using graphql and using jmespath.search to map the expressions and store in a excel sheet. I am facing issue with duplicate Id getting stored. I am using filter to merge two columns values into single columns values such as "active" or ""inactive". While storing the data I am getting duplicate Ids as well. I need to remove the duplicate id based on "inactive" status and add active Ids alone into a sheet.

For example let us take that response is coming in list format as below.

data = [
    {"id": 1, "deregistered ": True, "deactivated":True, "location": true},
    {"id": 1, "deregistered ": False, "deactivated": False, "location": true},
    {"id": 2, "deregistered ": False, "deactivated":False, "location": true},
]

Now, I need to write this into excel sheet by removing the duplicate id based on status. I need only below values from the data. i.e remove the id 1 if it is duplicate and store only active ids.

output={'id':1, 'status' : 'active', 'location': true},{'id':2, 'status' : 'inactive', 'location': true}

How to achieve this using python but without pandas. and I am using jmespath.search and mapping the values.

I tried as below but not getting the logic.

for val in data:
    loc_enabled = val.get("location")
    if loc_enabled:
        search = """
               {
               "id": id, 
               "status": ((deregistered == `true` || deactivated == `true`) && `Inactive`) ||                 `Active`,
                "location":location
                 }"""
    test = jmespath.search(search, val)

    if test:
        loc_enabled.append(test)

    print(loc_enabled)

I need to know where to handle the logic here without pandas. and get the desired results `

`from functools import reduce`. Then `sorted(reduce(lambda acc, it: ({**acc, **{it['id']: it}} if (it["id"] not in acc.keys()) or (acc[it["id"]]['status'] == 'inactive' and it['id'] == 'active') else acc), data, {}).values(), key=lambda x: x['id'])` — Wakeme UpNow, Feb 13 '23 at 14:18
What does _without pandas_ mean to you? Grouping in not something JMESPath is good at, so I am not totally sure this would be something you can achieve with it. — β.εηοιτ.βε, Feb 13 '23 at 14:24
if (it["id"] not in acc.keys()) or (acc[it["id"]]['status'] == 'inactive' and it['id'] == 'active') AttributeError: 'set' object has no attribute 'keys' @Wakeme UpNow — akdgp, Feb 13 '23 at 15:52
Change to `print(sorted(reduce(lambda acc, it: ({**acc, **{it['id']: it}} if (it["id"] not in acc.keys()) or (acc[it["id"]]['status'] == 'inactive' and it['id'] == 'active') else acc), data, dict()).values(), key=lambda x: x['id']))` then (Change `{}` to `dict()`) — Wakeme UpNow, Feb 13 '23 at 15:55
cool ! It worked. Thanks a lot ! print(sorted(reduce(lambda acc, it: ({**acc, **{it['id']: it}} if (it["id"] not in acc.keys()) or (acc[it["id"]]['status'] == 'inactive' and it['id'] == 'active') else acc), data, dict()).values(), key=lambda x: x['id'])) this worked, but not getting your point then (Change {} to dict ? — akdgp, Feb 13 '23 at 15:57
I use python 3.9, but in older version `{}` create a new `set` instead of `dict`. So I have to switch out to named object initialization — Wakeme UpNow, Feb 13 '23 at 16:16
@WakemeUpNow, Small clarification only, is it possible to add this code in my way using jmespath? can you please help me in adding your code to jmespath.search way. I have edited the data now pls check? — akdgp, Feb 13 '23 at 16:21
The simplest method is to breakdown into `2` steps. First, extract data using JMESPath to list of data. Then apply the above code to deduplicate entry — Wakeme UpNow, Feb 13 '23 at 16:29
But if you insists, try query twice (map entry to id). First query only `active` status, and then `inactive`. Then merge `active` (right) onto `inactive` (left) — Wakeme UpNow, Feb 13 '23 at 16:34
@WakemeUpNow, Is it possible to put your code as an answer so it will be helpful for me to learn the code. the way you said to query twice using lambda methods. — akdgp, Feb 13 '23 at 16:39
@WakemeUpNow, Please let me know could you help me on this part? — akdgp, Feb 13 '23 at 16:59
I can;t do it with jmespath, but I can do it with jq `jq.compile('((. | map({(.id | tostring): select(.status == "inactive")}) | add) + (. | map({(.id | tostring): select(.status == "active")}) | add)) | to_entries[] | .value').input(data).all()` — Wakeme UpNow, Feb 13 '23 at 17:53

Remove the duplicate id based on status without pandas?

0 Answers0