TL;DR - this is an optimization that doesn't have much effect at small lru_cache sizes, but (see Raymond's reply) has a larger effect as your lru_cache size gets bigger.
So this piqued my interest and I decided to see if this was actually true.
First I went and read the source for the LRU cache. The CPython implementation is here: https://github.com/python/cpython/blob/master/Lib/functools.py#L723 and nothing jumped out at me as something that would operate better at power-of-two sizes.
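For reference, the pure-Python implementation is basically a dict keyed on the call arguments plus a doubly linked list that tracks recency. Here's a rough sketch of the same idea using an OrderedDict - this is not the actual functools code, just an illustration of why nothing in the data structure obviously cares whether maxsize is a power of two:

from collections import OrderedDict

def simple_lru(maxsize):
    # Toy single-argument LRU cache: a mapping plus recency order.
    # Nothing here treats power-of-two maxsize values specially.
    def decorator(func):
        cache = OrderedDict()
        def wrapper(arg):
            if arg in cache:
                cache.move_to_end(arg)      # mark as most recently used
                return cache[arg]
            result = func(arg)
            cache[arg] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)   # evict the least recently used
            return result
        return wrapper
    return decorator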
So I wrote a short Python program that creates LRU caches of various sizes and then exercises each of them several times. Here's the code:
from functools import lru_cache
from collections import defaultdict
from statistics import mean
import time

def run_test(i):
    # We create a new decorated perform_calc
    @lru_cache(maxsize=i)
    def perform_calc(input):
        return input * 3.1415

    # let's run the test 5 times (so that we exercise the caching)
    for j in range(5):
        # Calculate the value for a range larger than our largest cache
        for k in range(2000):
            perform_calc(k)

values = defaultdict(list)
for t in range(10):
    print(t)
    for i in range(1, 1025):
        start = time.perf_counter()
        run_test(i)
        t = time.perf_counter() - start
        values[i].append(t)

for k, v in values.items():
    print(f"{k}\t{mean(v)}")
I ran this on a MacBook Pro under light load, with Python 3.7.7.
Here are the results:
https://docs.google.com/spreadsheets/d/1LqZHbpEL_l704w-PjZvjJ7nzDI1lx8k39GRdm3YGS6c/preview?usp=sharing

The random spikes are probably due to GC pauses or system interrupts.
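If you wanted to rule out garbage collection as the cause (I didn't do this here), one option is to disable the collector around the timed region, roughly like this (using run_test and i from the script above):

import gc

gc.disable()                      # suspend automatic garbage collection while timing
try:
    start = time.perf_counter()
    run_test(i)
    elapsed = time.perf_counter() - start
finally:
    gc.enable()                   # always turn collection back on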
At this point I realized that my code always generated cache misses, and never cache hits. What happens if we run the same thing, but always hit the cache?
I replaced the inner loop with:
    # let's run the test 5 times (so that we exercise the caching)
    for j in range(5):
        # After the first pass over range(i), every call is a cache hit
        for k in range(i):
            perform_calc(k)
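As a sanity check (not something I did in the original runs), lru_cache exposes cache_info(), which counts hits and misses, so you can confirm that only the first pass generates misses:

from functools import lru_cache

@lru_cache(maxsize=64)
def perform_calc(input):
    return input * 3.1415

for j in range(5):
    for k in range(64):
        perform_calc(k)

# 64 misses from the first pass, 4 * 64 = 256 hits from the remaining passes
print(perform_calc.cache_info())
# CacheInfo(hits=256, misses=64, maxsize=64, currsize=64)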
The data for this is in the same spreadsheet as above, second tab.
Let's see:

Hmm, but we don't really care about most of these numbers. Also, each test now does a different amount of work (the inner loop runs i times), so the raw timings aren't directly comparable.
What if we run it for just 2^n, 2^n + 1, and 2^n - 1? Since testing far fewer sizes speeds things up, we'll average over 100 runs instead of just 10.
We'll also generate a large shuffled list of inputs to run on, so that we can expect a mix of cache hits and cache misses.
from functools import lru_cache
from collections import defaultdict
from statistics import mean
import time
import random

# Eight copies of 0..127, shuffled, so repeated values give us cache hits
rands = list(range(128)) * 8
random.shuffle(rands)

def run_test(i):
    # We create a new decorated perform_calc
    @lru_cache(maxsize=i)
    def perform_calc(input):
        return input * 3.1415

    # let's run the test 5 times (so that we exercise the caching)
    for j in range(5):
        for k in rands:
            perform_calc(k)

values = defaultdict(list)
for t in range(100):
    print(t)
    # Interesting cache sizes: powers of two and their immediate neighbours
    for i in [15, 16, 17, 31, 32, 33, 63, 64, 65, 127, 128, 129,
              255, 256, 257, 511, 512, 513, 1023, 1024, 1025]:
        start = time.perf_counter()
        run_test(i)
        t = time.perf_counter() - start
        values[i].append(t)

for k, v in values.items():
    print(f"{k}\t{mean(v)}")
Data for this is in the third tab of the spreadsheet above.
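To turn those totals into a per-element time, divide each run's total by the number of perform_calc calls made in that run (5 passes over the 1024-element rands list) - roughly like this, using values and rands from the script above:

calls_per_run = 5 * len(rands)    # every run makes the same number of calls

for size, timings in sorted(values.items()):
    per_element = mean(timings) / calls_per_run
    print(f"{size}\t{per_element:.3e} s per call")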
Here's a graph of the average time per element against LRU cache size:

Time, of course, decreases as our cache size gets larger, since we spend less time performing calculations. The interesting thing is that there does seem to be a dip from 15 to 16 and 17, and from 31 to 32 and 33. Let's zoom in on the higher numbers:

Not only do we lose that pattern in the higher numbers, but we actually see that performance decreases for some of the powers of two (511 to 512, 513).
Edit: The note about power-of-two was added in 2012, but the algorithm for functools.lru_cache looks the same at that commit, so unfortunately that disproves my theory that the algorithm has changed and the docs are out of date.
Edit: Removed my hypotheses. The original author replied above - the problem with my code is that I was working with "small" caches - meaning that the O(n) resize on the dicts was not very expensive. It would be cool to experiment with very large lru_caches and lots of cache misses to see if we can get the effect to appear.
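For anyone who wants to try that, a rough sketch of such an experiment might look like the following - the sizes and input range here are made up, the idea is just a cache large enough that growing its internal dict is expensive, fed with inputs spread wide enough to keep generating misses:

from functools import lru_cache
import random
import time

def run_big_test(size, n_calls=2_000_000):
    @lru_cache(maxsize=size)
    def perform_calc(x):
        return x * 3.1415

    # Draw from a range much larger than the cache so misses keep happening
    # and the cache keeps filling up toward maxsize.
    inputs = [random.randrange(size * 4) for _ in range(n_calls)]
    start = time.perf_counter()
    for x in inputs:
        perform_calc(x)
    return time.perf_counter() - start

for size in (2**20 - 1, 2**20, 2**20 + 1):
    print(size, run_big_test(size))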