add in parallel
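add one bit to each of 64 3-bit numbers in parallel using mmx intrinsics. the numbers are stored as three 64-bit bit planes, so bit i of planeN is bit N of number i: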
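#include <mmintrin.h> // MMX intrinsics
__m64 plane0, plane1, plane2; // source bit planes (plane0 = least significant bit)
__m64 out0, out1, out2; // result bit planes
__m64 x; // bits to add: bit i is added to number i
__m64 carry; // temp
// ripple-carry chain of half adders, one per plane
out0 = _mm_xor_si64(x, plane0); // sum bit 0
carry = _mm_and_si64(x, plane0); // carry into bit 1
out1 = _mm_xor_si64(carry, plane1); // sum bit 1
carry = _mm_and_si64(carry, plane1); // carry into bit 2
out2 = _mm_xor_si64(carry, plane2); // sum bit 2; the carry out is dropped, so each number wraps mod 8
_mm_empty(); // clear MMX state before any x87 floating-point code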
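
a portable sketch of the same carry chain on plain uint64_t, in case you want to check the logic without mmx. this swaps the intrinsics for the ordinary ^ and & operators. the names planes3, add_bits and lane_value are made up for this example and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Three bit planes: bit i of pK is bit K of the i-th 3-bit number.
   (planes3 is a hypothetical helper type for this sketch.) */
typedef struct { uint64_t p0, p1, p2; } planes3;

/* Add bit i of x to the i-th 3-bit number, for all 64 lanes at once.
   Same half-adder chain as the mmx version; results wrap mod 8. */
static planes3 add_bits(planes3 in, uint64_t x)
{
    planes3 out;
    uint64_t carry;
    out.p0 = x ^ in.p0;       /* sum bit 0 */
    carry  = x & in.p0;       /* carry into bit 1 */
    out.p1 = carry ^ in.p1;   /* sum bit 1 */
    carry  = carry & in.p1;   /* carry into bit 2 */
    out.p2 = carry ^ in.p2;   /* sum bit 2, carry out is dropped */
    return out;
}

/* Read back the 3-bit number stored in one lane (0..63). */
static unsigned lane_value(planes3 v, unsigned lane)
{
    assert(lane < 64);
    return (unsigned)(((v.p0 >> lane) & 1u)
                    | (((v.p1 >> lane) & 1u) << 1)
                    | (((v.p2 >> lane) & 1u) << 2));
}
```

calling add_bits repeatedly with different x masks turns the three planes into 64 independent counters, and lane_value pulls a single counter back out.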
mmx makes me happy :)