microsoft's HLSL compiler is nice.  it's amazing that you can do the math
behind the ps1.1 texm3x3vspec instruction and the compiler recognizes it and
emits a single instruction.  when it does that you get the feeling that
the compiler is a sentient genius carefully studying and optimizing your
code.  but of course cruel reality eventually steps in and spits in your
face! here are some examples of the HLSL compiler making baby jesus cry:

  1. you've gotta do some extra key punching to get ps1.1 lrp emitted. for
    example...

    float4 a = tex2D(sampler0, texCoord0);
    float4 b = tex2D(sampler1, texCoord1);
    return lerp(a, b, Blend);

    produces:

    ps_1_1
    def c1, 0, 0, 0, -1
    def c2, 0, 0, 0, 1
    tex t0
    tex t1
    mul r0.w, c0.w, c1.w
    add r0.w, r0.w, c2.w
    mul r1, t1, c0.w
    mad r0, t0, r0.w, r1

    egads! not what we wanted. easy enough, just change the last line to lerp(a, b,
    saturate(Blend))...

    ps_1_1
    tex t0
    tex t1
    mov_sat r0.w, c0.w
    lrp r0, r0.w, t1, t0

    mov_sat r0.w, c0.w certainly sucks when we know the constant is already between 0 and 1.
    i haven't found any way to hint to the HLSL compiler that that's the case yet. hopefully this
    doesn't turn into much extra work for the driver or the shader unit. i haven't tried measuring it yet.


  2. there's no way to get the HLSL compiler to emit the texbem instruction! sure, i understand
    why, but that doesn't make the sting go away! hmm, sure would be nice to support only
    HLSL... no, stop!! **bam** kick to the nuts!


  3. i've got a shader where i blend between two premultiplied alpha textures. the details
    are unimportant, but here's a small snippet of what the HLSL compiler emitted:

    ...
    mov r0.w, r0.w
    +lrp r0.xyz, t0.w, r0, r1
    ...

    mov r0.w, r0.w?!!! sure, at least it co-issued... but is this supposed to be a joke?
    baby jesus isn't laughing.


  4. here's the HLSL compiler getting all sloppy on a vertex shader:

    oPos.x  = dot(position, transpose(WorldViewProjection)[0]);
    oPos.y  = dot(position, transpose(WorldViewProjection)[1]);
    oPos.zw = dot(position, transpose(WorldViewProjection)[2]);

    produced:

    dp4 r0.x, v0, c0
    dp4 r0.y, v0, c1
    dp4 r0.z, v0, c2
    mov oPos, r0.xyzz

    instead of

    dp4 oPos.x,  v0, c0
    dp4 oPos.y,  v0, c1
    dp4 oPos.zw, v0, c2

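to see what the first listing in item 1 actually computes, here's a little python model of both ps_1_1 listings, one color channel at a time. it's a sketch, not an emulator: the function names are made up, and it ignores ps1.x precision and register range limits entirely. what it shows is that the unsaturated mul/add/mul/mad sequence really is HLSL's lerp (extrapolation and all), while mov_sat + lrp only matches when Blend lands in [0, 1]. that's presumably why the compiler won't reach for lrp until saturate() is in there.

```python
# rough per-channel model of the two ps_1_1 listings in item 1.
# hypothetical helper names; no attempt to model ps1.x precision or range clamping.

def hlsl_lerp(a, b, s):
    """lerp() as HLSL defines it: a + s*(b - a), with no clamping of s."""
    return a + s * (b - a)

def unsaturated_listing(t0, t1, blend):
    c1w, c2w = -1.0, 1.0   # def c1, 0, 0, 0, -1  /  def c2, 0, 0, 0, 1
    r0w = blend * c1w      # mul r0.w, c0.w, c1.w  -> -Blend
    r0w = r0w + c2w        # add r0.w, r0.w, c2.w  -> 1 - Blend
    r1 = t1 * blend        # mul r1, t1, c0.w      -> t1 * Blend
    return t0 * r0w + r1   # mad r0, t0, r0.w, r1  -> t0*(1 - Blend) + t1*Blend

def saturated_listing(t0, t1, blend):
    r0w = min(max(blend, 0.0), 1.0)      # mov_sat r0.w, c0.w
    return r0w * t1 + (1.0 - r0w) * t0   # lrp r0, r0.w, t1, t0

if __name__ == "__main__":
    for blend in (0.0, 0.25, 1.0, 1.5):
        print(blend,
              hlsl_lerp(0.2, 0.8, blend),
              unsaturated_listing(0.2, 0.8, blend),
              saturated_listing(0.2, 0.8, blend))
```

with t0 = 0.2, t1 = 0.8 and Blend = 1.5, the unsaturated model gives about 1.1 (same as lerp) while the mov_sat/lrp model clamps to 0.8.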

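and for item 4, a similar toy model (python again, hypothetical names, each register a plain 4-element list) shows that the two vertex shader listings put the same values in oPos. dp4 writes its scalar result to every component in the write mask, so dp4 oPos.zw puts the third dot product in both z and w, which is exactly what mov oPos, r0.xyzz gets by replicating z. the compiler's version just spends a temp register and an extra instruction getting there.

```python
# toy model of the two vertex shader listings in item 4.
# registers are 4-element lists [x, y, z, w]; names are made up for illustration.

def dp4(a, b):
    """4-component dot product; the result is a single scalar."""
    return sum(x * y for x, y in zip(a, b))

def compiler_listing(v0, c0, c1, c2):
    r0 = [0.0, 0.0, 0.0, 0.0]
    r0[0] = dp4(v0, c0)                   # dp4 r0.x, v0, c0
    r0[1] = dp4(v0, c1)                   # dp4 r0.y, v0, c1
    r0[2] = dp4(v0, c2)                   # dp4 r0.z, v0, c2
    return [r0[0], r0[1], r0[2], r0[2]]   # mov oPos, r0.xyzz (z replicated into w)

def hand_listing(v0, c0, c1, c2):
    o_pos = [0.0, 0.0, 0.0, 0.0]
    o_pos[0] = dp4(v0, c0)                # dp4 oPos.x, v0, c0
    o_pos[1] = dp4(v0, c1)                # dp4 oPos.y, v0, c1
    zw = dp4(v0, c2)                      # dp4 oPos.zw, v0, c2
    o_pos[2] = o_pos[3] = zw              # the scalar lands in every masked component
    return o_pos
```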

come on, HLSL compiler, you can do better! i saw what you did with texm3x3vspec.
i believe in you!!