The current version invokes undefined behavior when the count is negative, zero, or greater than or equal to the width of the operand. The new version masks the count to avoid these cases. Although the masked form is branchless, it is relatively inefficient if the compiler does not recognize the idiom and translate it to a single rol or ror instruction. Empirical tests show that both clang and gcc emit the rotate instruction for constant counts, and that recent versions of clang (but not gcc) do so for variable counts as well. Note that our current code base has no instances of rolN / rorN with a variable count.
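
For illustration, the masking approach described above might look roughly like the following for the 32-bit case. This is a minimal sketch, not the actual patch: the names `rol32` and `ror32`, the `unsigned` count parameter, and the exact form of the expression are assumptions chosen to show the technique.

```c
#include <stdint.h>

/* Hypothetical 32-bit rotate-left with a masked count (names assumed).
 * Both shift amounts are reduced to the range 0..31, so no shift is
 * ever performed by the full operand width or by a negative amount. */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
    const unsigned mask = 31;            /* operand width - 1 */
    n &= mask;                           /* count mod 32 */
    return (x << n) | (x >> (-n & mask));
}

/* Hypothetical 32-bit rotate-right, the mirror image of rol32. */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
    const unsigned mask = 31;
    n &= mask;
    return (x >> n) | (x << (-n & mask));
}
```

With a count of zero, both shifts are by zero and the result is simply `x`. A negative count passed through the unsigned parameter wraps modulo 2^32, which is a multiple of 32, so after masking it behaves like a rotation in the opposite direction.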