microsoft's HLSL compiler is nice. it's amazing that you can do the math
behind the ps1.1 texm3x3vspec instruction and the compiler recognizes it and
emits a single instruction. when it does that you get the feeling that
the compiler is a sentient genius carefully studying and optimizing your
code. but of course cruel reality eventually steps in and spits in your
face! here are some examples of the HLSL compiler making baby jesus cry:
you've gotta do some extra key punching to get ps1.1 lrp emitted. for

    float4 a = tex2D(sampler0, texCoord0);
    float4 b = tex2D(sampler1, texCoord1);
    return lerp(a, b, Blend);

the compiler emits

    def c1, 0, 0, 0, -1
    def c2, 0, 0, 0, 1
    mul r0.w, c0.w, c1.w
    add r0.w, r0.w, c2.w
    mul r1, t1, c0.w
    mad r0, t0, r0.w, r1

egads! not what we wanted. easy enough, just change the last line to lerp(a, b,

    mov_sat r0.w, c0.w
    lrp r0, r0.w, t1, t0

mov_sat r0.w, c0.w certainly sucks when we know the constant is already between 0 and 1.
i haven't found any way to hint to the HLSL compiler that that's the case yet. hopefully this
doesn't turn into much extra work for the driver or the shader unit. i haven't tried measuring it yet.
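
to see why the clamp matters, here's a rough python sketch that emulates the two listings above on plain floats. the function names are made up for illustration, and it ignores ps1.1 fixed-point precision and register range limits, so treat it as a model of the arithmetic only, not of real hardware.

```python
def saturate(x):
    """Clamp to [0, 1], like the _sat instruction modifier."""
    return min(max(x, 0.0), 1.0)


def lerp_expanded(t0, t1, blend):
    """Hypothetical emulation of the mul/add/mul/mad listing."""
    r0_w = blend * -1.0                             # mul r0.w, c0.w, c1.w
    r0_w = r0_w + 1.0                               # add r0.w, r0.w, c2.w
    r1 = [c * blend for c in t1]                    # mul r1, t1, c0.w
    return [a * r0_w + b for a, b in zip(t0, r1)]   # mad r0, t0, r0.w, r1


def lerp_lrp(t0, t1, blend):
    """Hypothetical emulation of the mov_sat/lrp listing."""
    r0_w = saturate(blend)                          # mov_sat r0.w, c0.w
    # lrp r0, r0.w, t1, t0  ->  r0.w * t1 + (1 - r0.w) * t0
    return [r0_w * b + (1.0 - r0_w) * a for a, b in zip(t0, t1)]


if __name__ == "__main__":
    a = [1.0, 0.0, 0.5, 1.0]
    b = [0.0, 1.0, 0.5, 0.0]
    print(lerp_expanded(a, b, 0.25), lerp_lrp(a, b, 0.25))
    print(lerp_expanded(a, b, 1.5), lerp_lrp(a, b, 1.5))
```

in this model the two sequences agree whenever the blend factor is already in [0, 1], and they only diverge outside that range. that's presumably why the compiler wants the clamp before it will use lrp.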
there's no way to get the HLSL compiler to emit the texbem instruction! sure, i understand
why but that doesn't make the sting go away! hmm sure would be nice to support only
HLSL... no stop!! **bam** kick to the nuts!
i've got a shader where i blend between two premultiplied alpha textures. the details
are unimportant but here's a small snippet of what the HLSL compiler emitted:

    mov r0.w, r0.w
    +lrp r0.xyz, t0.w, r0, r1

mov r0.w, r0.w?!!! sure at least it co-issued... but is this supposed to be a joke?
baby jesus isn't laughing.
here's the HLSL compiler getting all sloppy on a vertex shader. this

    oPos.x = dot(position, transpose(WorldViewProjection));
    oPos.y = dot(position, transpose(WorldViewProjection));
    oPos.zw = dot(position, transpose(WorldViewProjection));

compiles to

    dp4 r0.x, v0, c0
    dp4 r0.y, v0, c1
    dp4 r0.z, v0, c2
    mov oPos, r0.xyzz

when it could have been

    dp4 oPos.x, v0, c0
    dp4 oPos.y, v0, c1
    dp4 oPos.zw, v0, c2

come on HLSL compiler, you can do better! i saw what you did with texm3x3vspec.
i believe in you!!
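
for the vertex shader case above, here's a small python sketch of the two dp4 listings: the four-instruction version that goes through the r0 temp, and the three-instruction version that writes oPos directly. the helper names (write_masked, swizzle and so on) are invented for this illustration, and registers are modelled as plain 4-element lists, so this only shows that the write mask and the swizzle give the same result.

```python
def dp4(a, b):
    """4-component dot product."""
    return sum(x * y for x, y in zip(a, b))


def write_masked(reg, mask, value):
    """Write a scalar into the components named by mask, e.g. 'zw'."""
    out = list(reg)
    for ch in mask:
        out["xyzw".index(ch)] = value
    return out


def swizzle(reg, pattern):
    """Read components in the order given by pattern, e.g. 'xyzz'."""
    return [reg["xyzw".index(ch)] for ch in pattern]


def via_temp(v0, c0, c1, c2):
    """Hypothetical emulation of the listing that uses r0 plus a mov."""
    r0 = [0.0] * 4
    r0 = write_masked(r0, "x", dp4(v0, c0))   # dp4 r0.x, v0, c0
    r0 = write_masked(r0, "y", dp4(v0, c1))   # dp4 r0.y, v0, c1
    r0 = write_masked(r0, "z", dp4(v0, c2))   # dp4 r0.z, v0, c2
    return swizzle(r0, "xyzz")                # mov oPos, r0.xyzz


def direct(v0, c0, c1, c2):
    """Hypothetical emulation of the listing that writes oPos directly."""
    o_pos = [0.0] * 4
    o_pos = write_masked(o_pos, "x", dp4(v0, c0))    # dp4 oPos.x, v0, c0
    o_pos = write_masked(o_pos, "y", dp4(v0, c1))    # dp4 oPos.y, v0, c1
    o_pos = write_masked(o_pos, "zw", dp4(v0, c2))   # dp4 oPos.zw, v0, c2
    return o_pos


if __name__ == "__main__":
    v0 = [1.0, 2.0, 3.0, 1.0]
    rows = ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 5.0])
    print(via_temp(v0, *rows), direct(v0, *rows))
```

both functions return the same oPos in this model, so the extra mov buys nothing.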