MMX is great. SSE is not. MMX fits everything I try it with just fine (64-bit stuff, RGBA pixels, etc.). Every time I use SSE I end up wasting a ton of cycles shuffling data in and out of memory and between registers. A horizontal add instruction would help a lot (which SSE3 has, as haddps/haddpd).
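A minimal sketch of the kind of shuffling in question, assuming the GCC/Clang/MSVC intrinsics from `xmmintrin.h` on an x86 target. The function name `hsum_sse1` is made up for illustration. Summing the four floats in one register with plain SSE takes two data-movement instructions on top of the two adds:

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper: sum the four floats in v using only SSE1.
 * Lanes are written lowest first, so v = [a, b, c, d]. */
static float hsum_sse1(__m128 v)
{
    /* Copy the upper two lanes down: hi = [c, d, c, d] */
    __m128 hi = _mm_movehl_ps(v, v);

    /* Lane 0 = a+c, lane 1 = b+d (upper lanes are don't-care) */
    __m128 pairs = _mm_add_ps(v, hi);

    /* Broadcast lane 1 so b+d lands in lane 0 */
    __m128 odd = _mm_shuffle_ps(pairs, pairs, _MM_SHUFFLE(1, 1, 1, 1));

    /* Scalar add in lane 0: (a+c) + (b+d) */
    __m128 total = _mm_add_ss(pairs, odd);

    /* Extract lane 0 as a plain float */
    return _mm_cvtss_f32(total);
}
```

With SSE3, the `_mm_movehl_ps`/`_mm_shuffle_ps` pair could instead be written as two `_mm_hadd_ps(v, v)` calls from `pmmintrin.h`. Whether that is actually faster depends on the CPU.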