I have a Python list like this one: [2, 5, 26, 37, 45, 12, 23, 37, 45, 12, 23, 37, 45, 12, 23, 37]. The real list is much longer. The list repeats itself after a certain point, in this case after 37. I have no problem finding the number at which it repeats, but I need to truncate the list at the second occurrence of that number. In this case the result would be [2, 5, 26, 37, 45, 12, 23, 37]. To find the number (37 in this case) I use a function firstDuplicate() that I found on Stack Overflow. Can someone help me?

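def firstDuplicate(a):
    aset = set()
    for i in a:
        if i in aset:
            return i
        aset.add(i)

# Current attempt: firstDuplicate() returns the value (37), not its index,
# so this slice does not give the expected result.
LIST = LIST[1:firstDuplicate(LIST)]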
4rt1c0
  • Please update your question with how `firstDuplicate` is defined and how you use it. – blhsing May 15 '20 at 16:12
  • Is there never a case where a duplicate number occurs before the periodic cycle starts? For example, [2, 5, 2, 37, 45, 12, 23, 37, 45, 12, 23, 37]. If this pattern is possible, then merely finding the first duplicate will not cover it. – Alain T. May 15 '20 at 16:59

2 Answers

You can use the same basic idea of firstDuplicate() and create a generator that yields values until the dupe is found. Then pass it to list(), a loop, etc.

l = [2, 5, 26, 37, 45, 12, 23, 37, 45, 12, 23, 37, 45, 12, 23, 37]

def partitionAtDupe(l):
    seen = set()
    for n in l:
        yield n
        if n in seen:    
            break
        seen.add(n)


list(partitionAtDupe(l))
# [2, 5, 26, 37, 45, 12, 23, 37]

It's not clear what should happen if there are no dupes. The code above will yield the whole list in that case.
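If a plain list is wanted rather than a generator, the same idea can be written with an index and a slice. This is only a sketch: the name `truncate_at_first_duplicate` is made up for illustration, and it assumes that a list without duplicates should come back unchanged.

```python
def truncate_at_first_duplicate(items):
    """Return items up to and including the first repeated value."""
    seen = set()
    for index, value in enumerate(items):
        if value in seen:
            # Slice by position, not by value, so the duplicate itself is kept.
            return items[:index + 1]
        seen.add(value)
    # Assumption: with no duplicate, return an unchanged copy of the list.
    return items[:]


l = [2, 5, 26, 37, 45, 12, 23, 37, 45, 12, 23, 37, 45, 12, 23, 37]
print(truncate_at_first_duplicate(l))
# [2, 5, 26, 37, 45, 12, 23, 37]
```

Like the generator, this stops at the first repeated value, so it only works if nothing repeats before the cycle starts.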

Mark
A function that finds the period and the length of the repeated part should scan from the end of the sequence. This makes it easier to ensure that the cycle continues up to the end of the list, and it avoids any concern over non-periodic repetitions at the beginning of the list.

For example:

def getPeriod(seq):
    # Last index of each value, and the latest index before that last one.
    lastPos = {n: p for p, n in enumerate(seq)}
    prevPos = {n: p for p, n in enumerate(seq) if p < lastPos[n]}
    period = 1
    # Walk backwards while each value still has an earlier occurrence.
    for n in reversed(seq):
        if n not in prevPos:
            break
        delta = lastPos[n] - prevPos[n]
        if delta % period == 0 or period % delta == 0:
            period = max(delta, period)
        else:
            break
    # Count trailing elements that equal the element one period earlier.
    nonPeriodic = (i for i, (n, p) in enumerate(zip(seq[::-1], seq[-period-1::-1])) if n != p)
    periodLength = next(nonPeriodic, 0)
    return period, periodLength

output:

seq     = [2, 5, 26, 37, 45, 12, 23, 37, 45, 12, 23, 37, 45, 12, 23, 37]

period, periodLength = getPeriod(seq)

print(period,periodLength) # 4 9
print(seq[:-periodLength]) # [2, 5, 26, 37, 45, 12, 23]
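
To get the exact truncation asked for in the question (prefix, one full cycle, plus the repeated first value of the cycle), the same scan-from-the-end idea can be sketched more directly. This is a brute-force illustration rather than a tuned solution: `truncate_after_first_cycle` is a hypothetical name, it assumes the cycle runs all the way to the end of the list, and it is quadratic in the worst case, so it may be slow on a very long list.

```python
def truncate_after_first_cycle(seq):
    """Return the prefix, one full cycle, and the first repeated cycle value."""
    n = len(seq)
    best_start, best_period = n, 0
    for period in range(1, n):
        # Extend the periodic tail backwards as far as seq[i] == seq[i + period].
        start = n - period
        while start > 0 and seq[start - 1] == seq[start - 1 + period]:
            start -= 1
        # Require that the first cycle value is seen at least twice, and
        # prefer the earliest start (smallest period wins on ties).
        if n - start >= period + 1 and start < best_start:
            best_start, best_period = start, period
    if best_period == 0:
        # Assumption: with no cycle found, return the list unchanged.
        return list(seq)
    return list(seq[:best_start + best_period + 1])


seq = [2, 5, 26, 37, 45, 12, 23, 37, 45, 12, 23, 37, 45, 12, 23, 37]
print(truncate_after_first_cycle(seq))
# [2, 5, 26, 37, 45, 12, 23, 37]

# A duplicate before the cycle starts does not confuse it:
print(truncate_after_first_cycle([2, 5, 2, 37, 45, 12, 23, 37, 45, 12, 23, 37]))
# [2, 5, 2, 37, 45, 12, 23, 37]
```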
Alain T.