I've tried multiple ways of de-duplicating the final output, but the result is the same every time and I don't understand why. My latest attempt is below:

#!/usr/bin/python3

import re
import sys
import json

## Variables
fileOfHosts = "/path/to/file"

## Define lists and dicts
my_dic = {}
resultOut = []

## Functions:

# Filter duplicated lines in list
def f1(seq):
    newlist = []
    for i in seq:
        if i not in newlist:
            newlist.append(i)
    return newlist

# Json converter
def f2(inout):
    out = json.dumps(inout)
    return out

## Open file as list of lines (use a separate name for the file handle)
with open(fileOfHosts) as hostsFile:
    result = list(hostsFile)

## Parse lines and generate dictionary
for f in f1(result):
    sortByWord = re.findall(r"[\w+\.']+", f)
    listOfTwo = sortByWord[:2]
    if len(listOfTwo) == 2:
        my_dic[listOfTwo[0]] = listOfTwo[1]
        resultOut.append(my_dic.copy())

## Display list of dictionaries as JSON
print(f2(resultOut))

I also tried filtering the list of dictionaries at the end, but I always get the same duplicated entries.

Can someone give a better solution for filtering out the duplicates?

Edit:

First of all, this is not a duplicate of the question mentioned in the comments. I tried the solution mentioned there before posting.

It seems the problem isn't the de-duplication method after all; the duplication happens when the dictionaries are created.

Code:

for f in result:
    sortByWord = re.findall(r"[\w+\.']+", f)
    # print(sortByWord)
    listOfTwo = sortByWord[:2]
    # print(listOfTwo)
    if len(listOfTwo) == 2:
        print(listOfTwo)
        my_dic[listOfTwo[0]] = listOfTwo[1]
        resultOut.append(my_dic)

Output (print(listOfTwo)):

['define', 'host']
['host_name', 'HOST_name']
['alias', 'HOST_name']
['address', '127.0.0.1']
['register', '1']
['timezone', 'Europe']
['use', 'user']
['_SNMPCOMMUNITY', 'public']
['_SNMPVERSION', '3']
['_HOST_ID', '184']
['define', 'host']
['host_name', 'HOST_name']
['alias', 'HOST_name']
['address', '127.0.0.1']
['register', '1']
['timezone', 'Europe']
['use', 'user']
['_SNMPCOMMUNITY', 'public']
['_SNMPVERSION', '3']
['_HOST_ID', '185']

Output (print(f2(resultOut))):

[{"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": 
"host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", "host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}, {"_HOST_ID": "185", "address": "127.0.0.1", "_SNMPCOMMUNITY": "public", "timezone": "Europe", "define": "host", 
"host_name": "host", "_SNMPVERSION": "3", "alias": "host", "register": "1", "use": "user"}]

I don't understand why.

Sergiu
  • Without reading all your code, the function `set` returns only unique elements, so `list(set(...))` maybe helps? – Sosel Oct 04 '17 at 10:44
  • Possible duplicate of [How do you remove duplicates from a list in whilst preserving order?](https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-in-whilst-preserving-order) – bgfvdu3w Oct 04 '17 at 10:46
  • Please add some sample lines to your question. – Martin Evans Oct 04 '17 at 10:47
  • @Sosel, but set may not preserve list order – Oleh Rybalchenko Oct 04 '17 at 10:47
  • `with open(fileOfHosts) as fileOfHosts: result = list(fileOfHosts)`: don't use the same name for the string variable containing the file path and for the opened file; use different names. – nacho Oct 04 '17 at 10:50
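
The comments above raise two points: `set` removes duplicates but may not keep the original order, and (not mentioned there) a `set` cannot hold dictionaries because they are unhashable. A minimal sketch of order-preserving de-duplication for both cases follows. The function names `dedupe_lines` and `dedupe_dicts` are made up for illustration, and using a sorted JSON string as the comparison key is one possible choice that assumes the dictionaries are JSON-serializable.

```python
import json


def dedupe_lines(lines):
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so this drops repeats while preserving the first occurrence.
    return list(dict.fromkeys(lines))


def dedupe_dicts(dicts):
    # Dictionaries are unhashable, so compare them through a canonical
    # JSON string (keys sorted) and remember which strings were seen.
    seen = set()
    unique = []
    for d in dicts:
        key = json.dumps(d, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique


lines = ["b", "a", "b", "c", "a"]
records = [{"x": 1, "y": 2}, {"y": 2, "x": 1}, {"x": 3}]

print(dedupe_lines(lines))    # ['b', 'a', 'c']
print(dedupe_dicts(records))  # [{'x': 1, 'y': 2}, {'x': 3}]
```

Note that this only removes entries that are exact duplicates; it would not help if the duplicates come from appending the same dictionary object repeatedly.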

1 Answer

I resolved my issue by using the following code:

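# Uses re, result and resultOut from the script in the question
list1 = []  # keys collected for the current host block
list2 = []  # values collected for the current host block

for f in result:
    sortByWord = re.findall(r"[\w+\.']+", f)
    listOfTwo = sortByWord[:2]
    if len(listOfTwo) == 2:
        list1.append(listOfTwo[0])
        list2.append(listOfTwo[1])
    # A line with no matches (e.g. a blank line or a closing brace) ends the block
    if not listOfTwo:
        my_dic = dict(zip(list1, list2))
        if my_dic:
            resultOut.append(my_dic)
        # Start fresh lists for the next block
        list1 = []
        list2 = []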
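
As a sketch of why the earlier loop produced identical entries: `resultOut.append(my_dic)` appends a reference to the same dictionary object on every line, so every list element shows that dictionary's final state. With `my_dic.copy()` the entries are independent, but each one is a cumulative snapshot that still contains keys from earlier lines. The example below reproduces the aliasing and contrasts it with starting a new dictionary per block. The `sample` text is a hypothetical input shaped like the printed pairs in the question, and `parse_aliased` and `parse_blocks` are illustrative names, not part of the original script.

```python
import re

# Hypothetical sample input; the real file format is an assumption.
sample = """define host {
    host_name  HOST_name
    _HOST_ID   184
}
define host {
    host_name  HOST_name
    _HOST_ID   185
}
"""


def parse_aliased(lines):
    """Reproduces the problem: one dict object is appended many times."""
    shared = {}
    out = []
    for line in lines:
        pair = re.findall(r"[\w+\.']+", line)[:2]
        if len(pair) == 2:
            shared[pair[0]] = pair[1]
            out.append(shared)  # same object every time
    return out


def parse_blocks(lines):
    """Starts a new dict for every block, so entries stay independent."""
    out = []
    current = {}
    for line in lines:
        pair = re.findall(r"[\w+\.']+", line)[:2]
        if len(pair) == 2:
            current[pair[0]] = pair[1]
        elif not pair and current:
            # A line with no matches (here the closing brace) ends the block.
            out.append(current)
            current = {}
    if current:
        # Keep a final block that is not followed by a terminating line.
        out.append(current)
    return out


lines = sample.splitlines()
aliased = parse_aliased(lines)
blocks = parse_blocks(lines)

# Six appended entries, all pointing at the same dictionary.
print(len(aliased), all(d is aliased[0] for d in aliased))
# One dictionary per host block.
print(blocks)
```

The final `if current:` check is a small addition over the loop above: it keeps the last block even when the input does not end with a blank line or closing brace.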
Sergiu