1

I have a list and I want to shuffle a portion of it in-place.

I am aware of random.shuffle() and that it works in-place, but if I slice the list, it shuffles the sliced copy of the original input, leaving the original list untouched:

import random

l = list(range(20))
print(l)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

random.shuffle(l[:10])  # I wish it was shuffling the first half
print(l)  # but does nothing to `l`
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

What options do I have?


EDIT: This is not truly a duplicate of this question, since there was no requirement for it to be in-place. Eventually, it seems that it would be possible to shuffle in-place a portion of a list only manually (which is exactly what I was trying to avoid), as suggested by one of the answers posted there.

x80486
  • 6,627
  • 5
  • 52
  • 111
norok2
  • 25,683
  • 4
  • 73
  • 99
  • 1
    See if this link helps - [Python - shuffle only some elements of a list](https://stackoverflow.com/questions/9557182/python-shuffle-only-some-elements-of-a-list) – tidakdiinginkan Apr 26 '20 at 09:55
  • 1
    After looking at [the Source](https://github.com/python/cpython/blob/3.8/Lib/random.py) I ended up with ... [this exact answer](https://stackoverflow.com/a/31416673): a patched `shuffle` where you have to provide start and end as arguments. – Jongware Apr 26 '20 at 10:08
  • @usr2564301 Thanks! While OP in the question was after something else, obviously that answer addresses (a bit disappointgly, though) my issues. – norok2 Apr 26 '20 at 10:51

3 Answers3

3

Not really in-place but with the desired result:

import random

l = list(range(20))

lpart = l[:10]
random.shuffle(lpart)

l[:10] = lpart

print(l)
Michael Butscher
  • 10,028
  • 4
  • 24
  • 25
2

Modifying the list n-place won't work on just a slice of the list. You could use random.sample instead to take random samples without replacement, and slice assign back:

k = 10
l[:k] = random.sample(l[:k], k=k)

print(l)
# [1, 7, 6, 0, 2, 3, 4, 9, 8, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
yatu
  • 86,083
  • 12
  • 84
  • 139
1

One could use Fisher-Yates shuffle to fundamentally re-implement random.shuffle() to accept a first and last index as arguments, e.g.:

import random


def valid_index(i, n):
    assert(-n <= i < n)
    return i % n


def shuffle(seq, first=0, last=-1, rand_int_gen=None):
    n = len(seq)
    first = valid_index(first, n)
    last = valid_index(last, n)
    # use Fisher-Yates shuffle (Durstenfeld method)
    if callable(rand_int_gen):
        for i in range(first, last):
            j = rand_int_gen(i, last)
            seq[i], seq[j] = seq[j], seq[i]
    else:
        getrandbits = random.getrandbits
        for i in range(first, last + 1):
            size = last - i + 1
            j = getrandbits(size.bit_length()) % size + i
            seq[i], seq[j] = seq[j], seq[i]
    return seq

to be used like:

l = list(range(20))
print(l)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

random.seed(0)  # just to show reproducible results
shuffle(l, 0, 9)
print(l)
# [6, 7, 2, 5, 8, 4, 9, 3, 0, 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Time-wise this is actually even a few percent faster than random.shuffle() for shuffling the whole sequence.

This is faster essentially because it gets the random values directly from random.getrandbits() which is the closest method exposed by random for random integer generation, the others, e.g. randint() and randrange() eventually reducing to this. These last two eventually use internally _getrandbelow() which could be calling getrandbits() more often the necessary.

for k in range(1, 7): 
    n = 10 ** k 
    print(n) 
    %timeit l = list(range(n)); random.shuffle(l) 
    %timeit l = list(range(n)); shuffle(l) 
    print() 
10
100000 loops, best of 3: 6.16 µs per loop
100000 loops, best of 3: 3.85 µs per loop

100
10000 loops, best of 3: 54.3 µs per loop
10000 loops, best of 3: 28 µs per loop

1000
1000 loops, best of 3: 585 µs per loop
1000 loops, best of 3: 341 µs per loop

10000
100 loops, best of 3: 6.01 ms per loop
100 loops, best of 3: 3.56 ms per loop

100000
10 loops, best of 3: 71.7 ms per loop
10 loops, best of 3: 44.1 ms per loop

1000000
1 loop, best of 3: 815 ms per loop
1 loop, best of 3: 582 ms per loop

This approach was also suggested here, as pointed out by @usr2564301. Unfortunately, I think there is no better approach for doing this operation in-place.

norok2
  • 25,683
  • 4
  • 73
  • 99