In my eagerness to eliminate a branch which is taken once per 2^38 bytes of keystream, I forgot that the state words are in host order. Thus, the counter increment code worked fine on little-endian machines, but not on big-endian ones. Switch to a simpler (branchful) solution.
The current version invokes undefined behavior when the count is negative, zero, or equal to or greater than the width of the operand. The new version masks the count to avoid these situations. Although branchless, it is relatively inefficient if the compiler does not recognize it and translate it to a rol or ror instruction. Empirical tests show that both clang and gcc get it right for constant counts, and recent versions of clang (but not gcc) get it right for variable counts as well. Note that our current code base has no instances of rolN / rorN with a variable count.
We failed to clear the negative flag when handling trivial cases, so if one of the terms was 0 and the other was negative, the result would be an exact copy of the non-zero term instead of its absolute value.
- Use the new vector byte-order conversion functions where appropriate.
- Use memset_s() instead of memset() where appropriate.
- Use consistent names and types for function arguments.
- Reindent, rename and reorganize to conform to Cryb style and idiom.
SHA224 and SHA256 were left mostly unchanged. MD2 and MD4 were completely rewritten as the previous versions (taken from XySSL) seem to have been copied from RSAREF.
This breaks the ABI as some context structures have grown or shrunk and some function arguments have been changed from int to size_t.
Single-DES is now a special case of triple-DES with all three keys being the same. This is significantly slower than a pure single-DES implementation, but that's fine since nobody should be using it anyway.