
Extract Hash Seed In Unit Testing

I need to get the random hash seed used by Python to replicate failing unit tests. If PYTHONHASHSEED is set to a non-zero integer, sys.flags.hash_randomization provides it reliably.

Solution 1:

No, the random value is assigned to the uc field of the _Py_HashSecret union, but this is never exposed to Python code. That's because the number of possible values is far greater than what setting PYTHONHASHSEED can produce.

When you don't set PYTHONHASHSEED or set it to random, Python generates a random 24-byte value to use as the seed. If you set PYTHONHASHSEED to an integer, that number is passed through a linear congruential generator to produce the actual 24-byte seed (see the lcg_urandom() function). The problem is that PYTHONHASHSEED is limited to a 4-byte integer, so it can only select 256 ** 4 of the 256 ** 24 possible seeds. There are 256 ** 20 times more possible seed values than you could set via PYTHONHASHSEED alone.
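The expansion step and the size mismatch can be sketched in a few lines of Python. This is an illustrative port, not CPython's code: the multiplier and increment below are assumptions taken from one reading of lcg_urandom() in the CPython source, so verify them against the version you run.

```python
def lcg_urandom(x0: int, size: int = 24) -> bytes:
    """Expand a 32-bit integer seed into `size` bytes with a simple LCG."""
    x = x0 & 0xFFFFFFFF
    out = bytearray()
    for _ in range(size):
        # Assumed constants; the mask emulates wraparound of a 32-bit
        # C unsigned int.
        x = (x * 214013 + 2531011) & 0xFFFFFFFF
        # One output byte per step, taken from bits 16..23 of the state.
        out.append((x >> 16) & 0xFF)
    return bytes(out)


# Seed-space comparison from the paragraph above.
random_seeds = 256 ** 24    # any 24-byte value
settable_seeds = 256 ** 4   # any 4-byte PYTHONHASHSEED integer
ratio = random_seeds // settable_seeds  # equals 256 ** 20
```

Because the generator is deterministic, each integer seed maps to exactly one 24-byte value, which is why only a tiny fraction of the 24-byte seed space is reachable through PYTHONHASHSEED.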

You can read the internal seed bytes from the _Py_HashSecret union using ctypes:

from ctypes import (
    c_size_t,
    c_ubyte,
    c_uint64,
    pythonapi,
    Structure,
    Union,
)


class FNV(Structure):
    _fields_ = [
        ('prefix', c_size_t),
        ('suffix', c_size_t)
    ]


class SIPHASH(Structure):
    _fields_ = [
        ('k0', c_uint64),
        ('k1', c_uint64),
    ]


class DJBX33A(Structure):
    _fields_ = [
        ('padding', c_ubyte * 16),
        ('suffix', c_size_t),
    ]


class EXPAT(Structure):
    _fields_ = [
        ('padding', c_ubyte * 16),
        ('hashsalt', c_size_t),
    ]


class _Py_HashSecret_t(Union):
    _fields_ = [
        # ensure 24 bytes
        ('uc', c_ubyte * 24),
        # two Py_hash_t for FNV
        ('fnv', FNV),
        # two uint64 for SipHash24
        ('siphash', SIPHASH),
        # a different (!) Py_hash_t for small string optimization
        ('djbx33a', DJBX33A),
        ('expat', EXPAT),
    ]


hashsecret = _Py_HashSecret_t.in_dll(pythonapi, '_Py_HashSecret')
hashseed = bytes(hashsecret.uc)

However, you can't actually do anything with this information. You can't set _Py_HashSecret.uc in a new Python process, because doing so would invalidate the hashes of most dictionary keys created before your Python code gets to run (Python internals rely heavily on dictionaries), and the chance of the seed being equal to one of the 256 ** 4 possible LCG outputs is vanishingly small.

Your idea to set PYTHONHASHSEED to a known value everywhere is a far more feasible approach.
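One way to do that is to launch the tests from a small wrapper that picks a seed, logs it, and passes it to the child interpreter. The following is a minimal sketch under that assumption; run_with_known_hash_seed is a hypothetical helper name, not part of unittest or any other library.

```python
import os
import random
import subprocess
import sys


def run_with_known_hash_seed(args, seed=None, **run_kwargs):
    """Run `python <args>` with an explicit, logged PYTHONHASHSEED.

    Hypothetical helper: returns (seed, CompletedProcess).
    """
    if seed is None:
        # PYTHONHASHSEED accepts integers in [0, 4294967295];
        # 0 disables randomization, so it is skipped here.
        seed = random.randint(1, 4294967295)
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    # Log the seed so a failing run can be replayed with the same value.
    print(f"PYTHONHASHSEED={seed}", file=sys.stderr)
    proc = subprocess.run([sys.executable, *args], env=env, **run_kwargs)
    return seed, proc
```

For example, run_with_known_hash_seed(["-m", "unittest", "discover"]) runs the suite under a fresh logged seed, and passing seed=<logged value> on a later run reproduces the same hash ordering.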
