I also tried this:
bool is_divisible_by_15(int x) {
    return x % 3 == 0 && x % 5 == 0;
}
bool is_divisible_by_15_optimal(int x) {
    return x % 15 == 0;
}
is_divisible_by_15 still has a branch, while is_divisible_by_15_optimal does not:
is_divisible_by_15(int):
imul eax, edi, -1431655765
add eax, 715827882
cmp eax, 1431655764
jbe .LBB0_2
xor eax, eax
ret
.LBB0_2:
imul eax, edi, -858993459
add eax, 429496729
cmp eax, 858993459
setb al
ret
is_divisible_by_15_optimal(int):
imul eax, edi, -286331153
add eax, 143165576
cmp eax, 286331153
setb al
ret
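For anyone who doesn't read x86, here is a rough C++ sketch of what the three instructions in is_divisible_by_15_optimal compute. The constants are copied from the listing; the function name divisible_by_15_mul is made up for illustration, and the sketch assumes a 32-bit int as in the listing.

```cpp
#include <cstdint>

// Hypothetical name; mirrors the imul / add / cmp+setb sequence above.
constexpr bool divisible_by_15_mul(std::int32_t x) {
    // 0xEEEEEEEF is the bit pattern of -286331153, and 15 * 0xEEEEEEEF == 1
    // modulo 2^32, so this multiply acts like "divide by 15" for multiples of 15.
    std::uint32_t t = static_cast<std::uint32_t>(x) * 0xEEEEEEEFu; // imul
    t += 143165576u;             // add: shifts the signed range so one compare suffices
    return t < 286331153u;       // cmp + setb: unsigned "below"
}

static_assert(divisible_by_15_mul(15), "15 is a multiple of 15");
static_assert(divisible_by_15_mul(-30), "-30 is a multiple of 15");
static_assert(!divisible_by_15_mul(5), "5 is not a multiple of 15");
```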
But note the branch in the first function! The original code uses the && operator, which is short-circuiting -- so from the compiler's perspective, perhaps the programmer expects that x % 2 == 0 will usually be false, and so we can skip the more expensive % 3 check most of the time. The "suboptimal" version is potentially quite a bit faster in the best case, but also potentially quite a bit slower in the worst case (since that branch could be mispredicted). There's not really a way for the compiler to know which version is "better" without more context, so deferring to "what the programmer wrote" makes sense.
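If the short-circuit isn't actually wanted, one way to say so in the source is to combine the two tests with & instead of &&. This is only a sketch based on the is_divisible_by_15 function from the post; the name is hypothetical, and I have not checked what any particular compiler emits for it.

```cpp
// Hypothetical variant of is_divisible_by_15.
constexpr bool is_divisible_by_15_no_short_circuit(int x) {
    // Each comparison yields a bool; & combines them without asking for
    // short-circuit evaluation, so both tests are nominally always evaluated.
    return (x % 3 == 0) & (x % 5 == 0);
}

static_assert(is_divisible_by_15_no_short_circuit(45), "45 is a multiple of 15");
static_assert(!is_divisible_by_15_no_short_circuit(9), "9 is not a multiple of 15");
```

The result is the same as the && version for every x; only the hint about evaluation order changes.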
That being said, I don't know that this is really a case of "the compiler knows best" rather than just not having that kind of optimization implemented. If we write 'x % 6 && x % 3', the compiler pointlessly generates both operations. And GCC generates branchless code for 'is_divisible_by_6', which is just worse than 'is_divisible_by_6_optimal' in all cases.
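To spell out why the second operation in 'x % 6 && x % 3' is pointless: any x that is not a multiple of 3 is also not a multiple of 6, so x % 3 alone decides the result. A quick check, with function names made up for illustration:

```cpp
// Hypothetical names, for illustration only.
constexpr bool both_checks(int x) { return x % 6 && x % 3; } // as written
constexpr bool one_check(int x)   { return x % 3 != 0; }     // what it reduces to

static_assert(both_checks(4) == one_check(4), "4: not a multiple of 3");
static_assert(both_checks(3) == one_check(3), "3: multiple of 3 but not of 6");
static_assert(both_checks(6) == one_check(6), "6: multiple of both");
static_assert(both_checks(-4) == one_check(-4), "negative input");
```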