Der_Einzige
This belief (that LLMs are deterministic except for samplers) is very wrong and will get you into hilariously large amounts of trouble if you assume it's true.

Also, greedy sampling considered harmful: https://arxiv.org/abs/2506.09501

From the abstract:

"For instance, under bfloat16 precision with greedy decoding, a reasoning model like DeepSeek-R1-Distill-Qwen-7B can exhibit up to 9% variation in accuracy and 9,000 tokens difference in response length due to differences in GPU count, type, and evaluation batch size. We trace the root cause of this variability to the non-associative nature of floating-point arithmetic under limited numerical precision. This work presents the first systematic investigation into how numerical precision affects reproducibility in LLM inference. Through carefully controlled experiments across various hardware, software, and precision settings, we quantify when and how model outputs diverge. Our analysis reveals that floating-point precision—while critical for reproducibility—is often neglected in evaluation practices."


marcinzm
Does this apply to TPUs or just GPUs?
recursivecaveat
It's more a system-level property. Even if you used CPUs, you'd get variance unless your design carefully controls how results are distributed and combined.
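
A hypothetical single-machine illustration of that point: the same float32 data combined through different "worker" counts (i.e., different reduction trees) yields different totals, no GPU required. The worker counts below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.standard_normal(2**20).astype(np.float32)

    def distributed_sum(data, n_workers):
        # Each "worker" sums its shard; the coordinator sums the partials.
        shards = np.array_split(data, n_workers)
        partials = np.array([s.sum() for s in shards], dtype=np.float32)
        return partials.sum()

    for n in (1, 8, 64, 1024):
        # Different worker counts -> different combination order ->
        # different low-order bits in the final sum.
        print(n, distributed_sum(x, n))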
sgt101
Great reference - thanks.
