Why Is Shuffling List(range(n)) Slower Than Shuffling [0]*n?
Solution 1:
(note: I didn't have time to finish this answer, so here's a start -- this definitely doesn't fit in a comment, hopefully it can help someone else finish this out!)
this appears to be due to locality of reference (and perhaps a CPython implementation detail -- I don't see the same results in PyPy, for example)

a few data points before an attempt at an explanation:
- `random.shuffle` is implemented in pure Python and works for any mutable sequence type -- it is not specialized for lists. this means that every swap involves `__getitem__` (increasing the refcount of the item) and `__setitem__` (decreasing the refcount of the item)
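to see this indirection directly, here's a small instrumented wrapper (a made-up illustration, not from the original answer -- `CountingList` is a hypothetical name):

```python
import random

class CountingList:
    """wraps a list and counts sequence-protocol calls, to show that
    random.shuffle goes through __getitem__/__setitem__ on every swap"""
    def __init__(self, items):
        self.items = list(items)
        self.gets = 0
        self.sets = 0

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        self.gets += 1
        return self.items[i]

    def __setitem__(self, i, value):
        self.sets += 1
        self.items[i] = value

seq = CountingList(range(1000))
random.shuffle(seq)
# Fisher-Yates performs len(x) - 1 swaps; each swap does two reads
# and two writes, so both counters end up at 2 * 999 = 1998
print(seq.gets, seq.sets)
```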
- `list.reverse` is implemented in C and only works for `list` (using implementation details of a list). this means that every swap happens without invoking `__getitem__` or changing reference counts; the internal items of the list are rearranged directly
the important bit being the reference counts: in CPython, the reference count is stored with the object itself, and nearly all objects are stored on the heap. in order to adjust the reference count (even temporarily), a write to `ob_refcnt` pages the `PyObject` structure into cache/memory/etc.
(here's where I ran out of time -- I would probably do some memory fault analysis to confirm this hypothesis)
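as a starting point, the observation itself is easy to reproduce with a rough timing sketch (a made-up benchmark, not from the original answer -- absolute numbers depend heavily on machine and build):

```python
import random
import timeit

n = 10**6
unique = list(range(n))  # a million distinct int objects on the heap
same = [0] * n           # a million pointers to one and the same int

t_unique = timeit.timeit(lambda: random.shuffle(unique), number=1)
t_same = timeit.timeit(lambda: random.shuffle(same), number=1)
# on CPython, the list of distinct objects typically shuffles noticeably
# slower, which is what the question observed
print(f"list(range(n)): {t_unique:.3f}s   [0] * n: {t_same:.3f}s")
```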
Solution 2:
The difference is that `list.reverse`, as a `list` method, has access to the underlying pointer array. So it can indeed rearrange the pointers without looking at the objects in any way (source):
```c
static void
reverse_slice(PyObject **lo, PyObject **hi)
{
    assert(lo && hi);

    --hi;
    while (lo < hi) {
        PyObject *t = *lo;
        *lo = *hi;
        *hi = t;
        ++lo;
        --hi;
    }
}
```
The `random.shuffle` and `numpy.random.shuffle` functions, on the other hand, only have an outsider's view and go through the list's interface, which involves briefly loading the objects to swap them:
```python
def shuffle(self, x, random=None):
    ...
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = randbelow(i + 1)
        x[i], x[j] = x[j], x[i]
```
```cython
def shuffle(self, object x, axis=0):
    ...
    for i in reversed(range(1, n)):
        j = random_interval(&self._bitgen, i)
        x[i], x[j] = x[j], x[i]
```
So there's at least the potential for a lot of cache misses. But as a test, let's try a `reverse` in Python:
```python
def my_reverse(x):
    lo = 0
    hi = len(x) - 1
    while lo < hi:
        x[lo], x[hi] = x[hi], x[lo]
        lo += 1
        hi -= 1
```
Benchmarking that, reversing `list(range(n))` was just as fast as reversing `[0] * n`
, despite loading the objects. The reason is that Python creates the objects pretty much sequentially in memory. Here's a test with a million objects. Almost all were located 16 bytes after the previous one:
```
>>> mylist = list(range(10**6))
>>> from collections import Counter
>>> ctr = Counter(id(b) - id(a) for a, b in zip(mylist, mylist[1:]))
>>> for distance, how_often in ctr.most_common():
...     print(distance, how_often)
16 996056
48 3933
-1584548240 1
-3024 1
2416 1
-2240 1
2832 1
-304 1
-96 1
-45005904 1
6160432 1
38862896 1
```
So no wonder it's fast, as that's very cache-friendly.
But now let's use our Python reversal on a shuffled list (like in the question with `list.reverse`, where it didn't make a difference):
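The timing output isn't shown here, but the comparison can be sketched self-containedly (restating `my_reverse` so it runs on its own; exact timings are machine-dependent):

```python
import random
import timeit

def my_reverse(x):
    # same pure-Python reversal as above
    lo, hi = 0, len(x) - 1
    while lo < hi:
        x[lo], x[hi] = x[hi], x[lo]
        lo += 1
        hi -= 1

n = 10**6
sequential = list(range(n))  # objects allocated mostly in address order
scattered = list(range(n))
random.shuffle(scattered)    # same objects, but neighbours are now far apart

t_seq = timeit.timeit(lambda: my_reverse(sequential), number=1)
t_shuf = timeit.timeit(lambda: my_reverse(scattered), number=1)
print(f"sequential: {t_seq:.3f}s   shuffled: {t_shuf:.3f}s")
```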
Big difference: now `my_reverse` loads objects from all over the place, which is the opposite of cache-friendly.
And of course that's what happens with the `shuffle` functions as well. While `list(range(n))` is initially cache-friendly, the shuffling picks random indices `j` to swap with, which is very cache-unfriendly. And while `i` just moves sequentially, it's going to encounter a lot of already randomly swapped objects, so that's cache-unfriendly as well.
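Running the same id-distance test on a shuffled list makes the scattering visible (a follow-up sketch; exact addresses are allocator-dependent):

```python
import random
from collections import Counter

mylist = list(range(10**6))
random.shuffle(mylist)
ctr = Counter(id(b) - id(a) for a, b in zip(mylist, mylist[1:]))
# instead of one dominant +16 step there are now a huge number of
# different distances between neighbouring elements
print(len(ctr), "distinct distances; most common:", ctr.most_common(1))
```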