I have a list that looks like this:

relationShipArray = []

relationShipArray.append([340859419124453377, 340853571828469762])
relationShipArray.append([340859419124453377, 340854579195432961])
relationShipArray.append([340770796777660416, 340824159120654336])
relationShipArray.append([340509588065513473, 340764841658703872])
relationShipArray.append([340478540048916480, 340671891540934656])
relationShipArray.append([340853571828469762, 340854579195432961])
relationShipArray.append([340842710057492480, 340825411573399553])
relationShipArray.append([340825411573399553, 340770796777660416])
relationShipArray.append([340825411573399553, 340824159120654336])
relationShipArray.append([340824159120654336, 340770796777660416])
relationShipArray.append([340804620295221249, 340825411573399553])
relationShipArray.append([340684236191313923, 340663388122279937])
relationShipArray.append([340663388122279937, 340684236191313923])
relationShipArray.append([340859507280318464, 340859419124453377])
relationShipArray.append([340859507280318464, 340853571828469762])
relationShipArray.append([340859507280318464, 340854579195432961])
relationShipArray.append([340854599697178624, 340845885439229952])
relationShipArray.append([340836561937641472, 340851694759972864])
relationShipArray.append([340854579195432961, 340853571828469762])
relationShipArray.append([340844519832580096, 340854599697178624])
relationShipArray.append([340814054610305024, 340748443670683648])
relationShipArray.append([340851694759972864, 340836561937641472])
relationShipArray.append([340748443670683648, 340814054610305024])
relationShipArray.append([340739498356912128, 340825992832638977])

As you can see, some entries are duplicates, e.g.

[340853571828469762, 340854579195432961] 

is the same pair as (just reversed)

[340854579195432961, 340853571828469762]

What is the best way to remove these duplicates from the list? Efficiency would be nice, but I can live without it if need be. In this case I would keep [340853571828469762, 340854579195432961] and remove [340854579195432961, 340853571828469762].

vidit
user2091936
  • Does the order matter? (i.e. does it matter which one you keep, and is it OK if the two elements of a pair without a duplicate get swapped?) – James Jun 02 '13 at 16:34

3 Answers

Use an OrderedDict if you need to keep the order:

>>> from collections import OrderedDict
>>> L = [[1, 2], [4, 5], [1, 2], [2, 1]]
>>> [[x, y] for x, y in OrderedDict.fromkeys(frozenset(x) for x in L)]
[[1, 2], [4, 5]]
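Unpacking a frozenset with `for x, y in ...` does not guarantee the original element order within a pair, and a pair like `[3, 3]` collapses to a one-element frozenset that cannot be unpacked into two names. A minimal sketch that keeps the first-seen orientation (as the question asks) uses the frozenset only as a lookup key; `dedupe_pairs` is a hypothetical helper name:

```python
def dedupe_pairs(pairs):
    """Drop pairs whose reversed form was already seen, keeping first occurrences as-is."""
    seen = set()
    result = []
    for pair in pairs:
        key = frozenset(pair)  # [a, b] and [b, a] map to the same key
        if key not in seen:
            seen.add(key)
            result.append(list(pair))  # keep the original orientation
    return result


print(dedupe_pairs([[1, 2], [4, 5], [1, 2], [2, 1], [3, 3]]))
# [[1, 2], [4, 5], [3, 3]]
```

This is a single pass with O(1) average set lookups, so it stays linear in the number of pairs.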

EDIT 1

If the order is not important, you can get away with a set (the order of the resulting list is then arbitrary):

>>> [[x, y] for x, y in set(frozenset(x) for x in L)]
[[1, 2], [4, 5]]

EDIT 2

A more generic solution that works for lists of varying length, not only those with two elements:

[list(entry) for entry in set(frozenset(x) for x in L)]                  # order not preserved
[list(entry) for entry in OrderedDict.fromkeys(frozenset(x) for x in L)]  # order preserved
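For entries longer than two elements, a frozenset key also ignores repeated values, so `[1, 1, 2]` and `[1, 2]` would be treated as duplicates. If that matters, a sorted tuple works as an order-insensitive key that keeps multiplicity. A minimal sketch, assuming the elements are mutually comparable (required by `sorted`) and Python 3.7+ (where `dict` preserves insertion order); `dedupe_entries` is a hypothetical name:

```python
def dedupe_entries(entries):
    """Remove entries containing the same elements (with multiplicity) as an earlier one."""
    # tuple(sorted(...)) is hashable and identical for every permutation of an entry
    unique = dict.fromkeys(tuple(sorted(entry)) for entry in entries)
    return [list(key) for key in unique]  # dict keys keep first-seen order


print(dedupe_entries([[1, 2], [2, 1], [1, 1, 2], [1, 2, 1], [3]]))
# [[1, 2], [1, 1, 2], [3]]
```

Entries come back in sorted form rather than in their original orientation.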
Mike Müller

If the order of relationShipArray is not important, sort each pair so that reversed duplicates become identical tuples, and collect them in a set:

result = {tuple(sorted(item)) for item in relationShipArray}
MostafaR
  • What is `item = tuple(item)` supposed to do? It does not change `relationShipArray` in any way. – Mike Müller Jun 02 '13 at 17:11
  • @MikeMüller items are `list`s, which are not hashable, so I convert them to `tuple`s to be able to build a `set`. – MostafaR Jun 02 '13 at 17:46
  • I know this. But check what is in `relationShipArray` after the loop. Assigning to a loop variable does not change the list. BTW, you don't want a tuple anyway because the OP wants to treat also reversed items as duplicates. – Mike Müller Jun 02 '13 at 17:51
  • @MikeMüller Oh, thanks! I was in a hurry. OK, I've edited my answer; as for the reversed items, `item.sort()` is there to handle them. – MostafaR Jun 02 '13 at 18:01
  • OK. The sort does the trick. How about a one-liner? `result = {tuple(sorted(item)) for item in relationShipArray}` – Mike Müller Jun 02 '13 at 20:22
  • @MikeMüller For sure it's a better idea. – MostafaR Jun 03 '13 at 03:50
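The point raised in the comments about loop variables can be checked directly: rebinding `item` inside the loop leaves the list untouched, whereas calling `item.sort()` mutates each inner list in place. A small illustration on a toy list:

```python
pairs = [[2, 1], [1, 2]]

for item in pairs:
    item = tuple(item)  # rebinds the name `item` only; pairs is unchanged
print(pairs)  # [[2, 1], [1, 2]]

for item in pairs:
    item.sort()  # sorts the inner list object that pairs still refers to
print(pairs)  # [[1, 2], [1, 2]]

# Once every pair is sorted, a set of tuples removes the reversed duplicates
print({tuple(item) for item in pairs})  # {(1, 2)}
```

Note that the `item.sort()` variant modifies `relationShipArray` itself, while the `sorted(item)` one-liner leaves the original list intact.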

One-liner solution

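Sort each pair so that a pair and its reversed peer become identical rows, then use np.unique with axis=0 to drop the repeated rows:

import numpy as np
# Sort within each row so [a, b] and [b, a] become identical, then let
# np.unique (with axis=0) drop repeated rows instead of flattening to scalars
Y = np.unique(np.sort(np.array(relationShipArray), axis=1), axis=0).tolist()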
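Without the `axis` argument, `np.unique` flattens its input and returns the distinct scalar values rather than distinct rows, so row-wise deduplication needs `axis=0` (a parameter added in NumPy 1.13). A small comparison on a toy array:

```python
import numpy as np

pairs = np.array([[1, 2], [2, 1], [3, 4]])

# No axis: the array is flattened, so only the distinct IDs survive
flat = np.unique(pairs)
print(flat.tolist())  # [1, 2, 3, 4]

# Sort within each row so [2, 1] becomes [1, 2], then compare whole rows
rows = np.unique(np.sort(pairs, axis=1), axis=0)
print(rows.tolist())  # [[1, 2], [3, 4]]
```

The rows come back in lexicographic order, not in their original order, so this fits the case where the order of the list does not matter.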
kiriloff