What is the Big-O complexity of `random.choice(list)` in Python 3, where n is the number of elements in the list?
Edit: Thank you all for giving me the answer; now I understand.
O(1). Or to be more precise, it's equivalent to the big-O random access time for looking up a single index in whatever sequence you pass it, and `list` has O(1) random access indexing (as does `tuple`). Simplified, all it does is `seq[random.randrange(len(seq))]`, which is obviously equivalent to a single index lookup operation.

An example where it would be O(n) is `collections.deque`, where indexing in the middle of the `deque` is O(n) (with a largish constant divisor though, so it's not that expensive unless the `deque` reaches the thousands-of-elements range or higher). So basically, don't use a `deque` if it's going to be large and you plan to select random elements from it repeatedly; stick to `list`, `tuple`, `str`, `bytes`/`bytearray`, `array.array` and other sequence types with O(1) indexing.
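To make the contrast concrete, here is a quick timing sketch (illustrative only, not a rigorous benchmark; the size and repetition count are arbitrary choices of mine) comparing `random.choice` on a `list` versus a `deque` of the same length:

```python
import random
import timeit
from collections import deque

n = 1_000_000
lst = list(range(n))
dq = deque(lst)  # same elements, but indexing the middle is O(n)

# random.choice only needs len() and indexing, so it accepts both sequences
t_list = timeit.timeit(lambda: random.choice(lst), number=10_000)
t_deque = timeit.timeit(lambda: random.choice(dq), number=10_000)

print(f"list:  {t_list:.4f}s")   # roughly constant regardless of n
print(f"deque: {t_deque:.4f}s")  # grows with n
```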
Though the question is about `random.choice` and previous answers give several explanations of it, when I searched for the complexity of `np.random.choice` I didn't find an answer, so I decided to explain `np.random.choice` here.
`choice(a, size=None, replace=True, p=None)`. Assume `a.shape=(n,)` and `size=m`.

When with replacement:

The complexity of `np.random.choice` is O(m) if `p` is not specified (assuming a uniform distribution), and O(n + m log n) if `p` is specified.
The GitHub code can be found here: np.random.choice.
When `p` is not specified, `choice` generates an index array via `randint` and returns `a[index]`, so the complexity is O(m). (I assume the operation of generating a random integer by `randint` is O(1).)
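A minimal sketch of that path (my own re-implementation of the behaviour described above, not NumPy's actual code; the function name and the use of `default_rng` are my choices):

```python
import numpy as np

def choice_with_replacement_uniform(a, m, rng=None):
    """Sketch of np.random.choice(a, size=m) with replace=True and p=None."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    idx = rng.integers(0, len(a), size=m)  # m random indices: O(m)
    return a[idx]                          # fancy-indexing gather: O(m)
```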
When `p` is specified, the function first computes the prefix sum of `p`. It then draws m samples from [0, 1) and uses binary search to find the corresponding interval in the prefix sums for each drawn sample. The evidence for the use of binary search can be found here. So this process is O(n + m log n). If you need a faster method in this situation, you can use the Alias Method, which needs O(n) time for preprocessing and O(m) time to sample m items.
When without replacement: (It's kind of complicated, and maybe I'll finish it in the future.)
If `p` is not specified, the complexity is the same as `np.permutation(n)`, even when m is only 1. See more here.
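That behaviour can be pictured with this sketch (an assumption based on the description above and the legacy np.random code path; the newer Generator.choice may take a different route):

```python
import numpy as np

def choice_without_replacement_uniform(a, m, rng=None):
    """Sketch: permute all n elements, keep the first m -- O(n) even for m=1."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    return a[rng.permutation(len(a))[:m]]
```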
If `p` is specified, the expected complexity is at least $n \log n \log\frac{n}{n + 1 - m}$ (a bound that is not tight).
The complexity of `random.choice(list)` is O(log n), where n is the number of elements in the list.

The CPython implementation uses `_randbelow(len(seq))` to get a pseudo-random index and then returns the item at that index.

The bottleneck is the `_randbelow()` function, which uses rejection sampling to generate a number in the range [0, n). It generates k pseudo-random bits with a call to `getrandbits(k)`, where k = ceil(log2 n). These bits represent a number in the range [0, 2**k). This process is repeated until the generated number is less than n. Each call to the pseudo-random number generator runs in O(k), where k is the number of bits generated, which is O(log n).
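For reference, here is a simplified sketch of that rejection-sampling loop (paraphrasing CPython's `Random._randbelow_with_getrandbits`, not a verbatim copy):

```python
import random

def randbelow(n):
    """Return a uniform random int in [0, n) via rejection sampling."""
    k = n.bit_length()             # number of bits needed to cover [0, n)
    r = random.getrandbits(k)      # uniform draw from [0, 2**k)
    while r >= n:                  # reject out-of-range values and redraw
        r = random.getrandbits(k)
    return r
```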
I think the above answer is incorrect. I empirically verified that the complexity of this operation is O(n). Here is my code and a little plot; I am not sure about the theory, though.
```python
from time import time

import matplotlib.pyplot as plt
import numpy as np

# Sizes from 1e2 up to 1e7; the original upper bound of 1e10 elements
# would exhaust memory long before finishing.
N = np.logspace(2, 7, 40)
output = []
for i, n in enumerate(N):
    print(i)
    n = int(n)
    start = time()
    # Note: the timed region also includes list(range(n)) construction and
    # its conversion to an ndarray, both of which are O(n) by themselves.
    A = np.random.choice(list(range(n)), n // 2)
    output.append(time() - start)
plt.plot(N, output)
plt.show()
```