add a bit to each of 64 3-bit numbers in parallel using mmx intrinsics:


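#include <mmintrin.h>

__m64 plane0, plane1, plane2; // source bit planes (bit i of planeN = bit N of number i)
__m64 out0, out1, out2;       // result bit planes
__m64 x;                      // bits to add (one per number)
__m64 carry;                  // temp
// ripple-carry chain of half adders, one per plane
out0  = _mm_xor_si64(x, plane0);
carry = _mm_and_si64(x, plane0);
out1  = _mm_xor_si64(carry, plane1);
carry = _mm_and_si64(carry, plane1);
out2  = _mm_xor_si64(carry, plane2); // carry out of plane2 is dropped, so each number wraps mod 8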
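
if you want to sanity-check the logic without mmx, here is a rough portable sketch of the same half-adder chain using plain uint64_t in place of __m64. the names planes3, planes3_add and planes3_get are made up for this example and are not part of any library; planes3_get just reads one of the 64 numbers back out as an ordinary integer so you can compare against normal arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper type: three bit planes holding 64 independent
   3-bit numbers. Bit i of p0/p1/p2 is bit 0/1/2 of number i. */
typedef struct {
    uint64_t p0, p1, p2;
} planes3;

/* Add bit i of x to number i, for all 64 numbers at once.
   Same XOR/AND chain as the mmx version; the carry out of p2 is
   dropped, so each number wraps modulo 8. */
static planes3 planes3_add(planes3 in, uint64_t x)
{
    planes3 out;
    uint64_t carry;

    out.p0 = x ^ in.p0;       /* sum bit of plane 0 */
    carry  = x & in.p0;       /* carry into plane 1 */
    out.p1 = carry ^ in.p1;   /* sum bit of plane 1 */
    carry  = carry & in.p1;   /* carry into plane 2 */
    out.p2 = carry ^ in.p2;   /* sum bit of plane 2 */
    return out;
}

/* Read number `lane` (0..63) back out as an integer in 0..7. */
static unsigned planes3_get(planes3 p, unsigned lane)
{
    return (unsigned)(((p.p0 >> lane) & 1u)
                    | (((p.p1 >> lane) & 1u) << 1)
                    | (((p.p2 >> lane) & 1u) << 2));
}
```

starting from all-zero planes and calling planes3_add with x = 0x5 three times should leave numbers 0 and 2 at 3 and every other number at 0; a ninth call wraps those two back around to 1.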

mmx makes me happy :)