
I have some Apache access logs I want to parse using IPWhois.

I want to group the IPWhois results based on the asn_description field.

Isn't it odd that the set and itertools.groupby() in the following snippet yield different outcomes?

from itertools import groupby

descs = set()

with open(RESULTSFILE, 'a+') as r:
    for description, items in groupby(results, key=lambda x: x['asn_description']):
        print('ASN Description: ' + description)
        descs.add(description)

print(descs)

e.g.

ASN Description: GOOGLE - Google LLC, US
ASN Description: AVAST-AS-DC, CZ
ASN Description: FACEBOOK - Facebook, Inc., US
ASN Description: AVAST-AS-DC, CZ
ASN Description: AMAZON-AES - Amazon.com, Inc., US
ASN Description: FACEBOOK - Facebook, Inc., US
ASN Description: AMAZON-02 - Amazon.com, Inc., US
ASN Description: AMAZON-02 - Amazon.com, Inc., US
ASN Description: GOOGLE - Google LLC, US
ASN Description: GOOGLE-2 - Google LLC, US
ASN Description: AMAZON-02 - Amazon.com, Inc., US
{'FACEBOOK - Facebook, Inc., US', 'AVAST-AS-DC, CZ', 'AMAZON-AES - Amazon.com, Inc., US', 'GOOGLE-2 - Google LLC, US', 'GOOGLE - Google LLC, US', 'AMAZON-02 - Amazon.com, Inc., US',
pkaramol

1 Answer


Change your code to the following and try it. If you do not need items, you can drop it from the for loop by using _ in its place.

import itertools
descs = dict()

with open(RESULTSFILE, 'a+') as r:
    for i, (description, items) in enumerate(itertools.groupby(results, key=lambda x: x['asn_description'])):
        print('ASN Description: ' + description)
        descs.update({i: description})

print(descs)
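
For background on why the two outputs differ: itertools.groupby only groups consecutive items that share a key, so on unsorted input the same description can start several groups, while a set keeps each value once. Here is a minimal sketch, assuming results is a list of dicts with an asn_description key (the sample records and the by_description helper are made up for illustration), that sorts by the key before grouping:

```python
from itertools import groupby

# Hypothetical sample records standing in for the real IPWhois results.
results = [
    {"ip": "192.0.2.1", "asn_description": "GOOGLE - Google LLC, US"},
    {"ip": "192.0.2.2", "asn_description": "AVAST-AS-DC, CZ"},
    {"ip": "192.0.2.3", "asn_description": "GOOGLE - Google LLC, US"},
]


def by_description(item):
    """Key function shared by sorted() and groupby() (illustrative helper)."""
    return item["asn_description"]


# Unsorted input: groupby starts a new group every time the key changes,
# so a description that reappears later produces a second group.
unsorted_keys = [desc for desc, _ in groupby(results, key=by_description)]

# Sorting by the same key first makes equal descriptions adjacent,
# so each description ends up in exactly one group.
grouped = {
    desc: list(items)
    for desc, items in groupby(sorted(results, key=by_description), key=by_description)
}

print(unsorted_keys)
print(sorted(grouped))
```

With the sort in place, the keys of grouped should match what the set collects. Note that the items iterator has to be consumed (here with list()) before the loop advances to the next group.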

CypherX