A more naive but less complex way to avoid overflows would be to use the original factorial expression but work with logs - in a rough experiment with python for k=10^8 and N=10^17 (numbers plucked out of thin air), the calculation takes ~20s on an M1 MacBook Pro.
This didn't feel too egregious until I looked at the space of inputs and outputs for SHA-256, which is 2^256 out and an absolutely stonking 2^(2^64) - 1 possible inputs.
It feels obvious but I'm still surprised by the efficiency of approximations.
This didn't feel too egregious until I looked at the space of inputs and outputs for SHA-256, which is 2^256 out and an absolutely stonking 2^(2^64) - 1 possible inputs.
It feels obvious but I'm still surprised by the efficiency of approximations.