hlsl compiler

microsoft’s HLSL compiler is nice.  it’s amazing that you can do the math
behind the ps1.1 texm3x3vspec instruction and the compiler recognizes it and
emits a single instruction.  when it does that you get the feeling that
the compiler is a sentient genius carefully studying and optimizing your
code.  but of course cruel reality eventually steps in and spits in your
face! here are some examples of the HLSL compiler making baby jesus cry:

  1. you’ve gotta do some extra key punching to get ps1.1 lrp emitted. for
    example…

    float4 a = tex2D(sampler0, texCoord0);
    float4 b = tex2D(sampler1, texCoord1);
    return lerp(a, b, Blend);

    produces:

    ps_1_1
    def c1, 0, 0, 0, -1
    def c2, 0, 0, 0, 1
    tex t0
    tex t1
    mul r0.w, c0.w, c1.w
    add r0.w, r0.w, c2.w
    mul r1, t1, c0.w
    mad r0, t0, r0.w, r1

    egads! not what we wanted. easy enough, just change the last line to lerp(a, b,
    saturate(Blend))…

    ps_1_1
    tex t0
    tex t1
    mov_sat r0.w, c0.w
    lrp r0, r0.w, t1, t0

    mov_sat r0.w, c0.w certainly sucks when we know the constant is already between 0 and 1.
    i haven’t found any way to hint to the HLSL compiler that that’s the case yet. hopefully this
    doesn’t turn into much extra work for the driver or the shader unit. i haven’t tried measuring it yet.

  2. there’s no way to get the HLSL compiler to emit the texbem instruction! sure, i understand
    why but that doesn’t make the sting go away! hmm sure would be nice to support only
    HLSL… no stop!! **bam** kick to the nuts!

  3. i’ve got a shader where i blend between two premultiplied alpha textures. the details
    are unimportant but here’s a small snippet of what the HLSL compiler emitted:

    ...
    mov r0.w, r0.w
    +lrp r0.xyz, t0.w, r0, r1
    ...

    mov r0.w, r0.w?!!! sure at least it co-issued… but is this supposed to be a joke?
    baby jesus isn’t laughing.

  4. here’s the HLSL compiler getting all sloppy on a vertex shader

    oPos.x  = dot(position, transpose(WorldViewProjection)[0]);
    oPos.y  = dot(position, transpose(WorldViewProjection)[1]);
    oPos.zw = dot(position, transpose(WorldViewProjection)[2]);

    produced

    dp4 r0.x, v0, c0
    dp4 r0.y, v0, c1
    dp4 r0.z, v0, c2
    mov oPos, r0.xyzz

    instead of

    dp4 oPos.x,  v0, c0
    dp4 oPos.y,  v0, c1
    dp4 oPos.zw, v0, c2
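
a quick sanity check of the arithmetic in items 1 and 4, as a minimal python sketch rather than anything the compiler or the hardware actually runs. it treats the shader instructions as plain float math (an assumption: no ps1.1 range clamping or reduced precision is modelled), and every helper name in it is made up for illustration. it suggests that the long mul/add/mul/mad sequence and the mov_sat/lrp pair give the same blend whenever Blend is between 0 and 1, and that the mov through r0.xyzz gives the same oPos as writing the dp4 results straight into oPos.

```python
# toy model of the instructions quoted above, as plain float math.
# assumption: no ps1.1 range clamping or reduced precision is modelled.
# all function names here are hypothetical helpers, not a real API.

def saturate(x):
    """clamp to 0..1, like the _sat modifier."""
    return max(0.0, min(1.0, x))

def lrp(s0, s1, s2):
    """lrp dest, s0, s1, s2 -> s0*s1 + (1 - s0)*s2 per component."""
    return [s0 * a + (1.0 - s0) * b for a, b in zip(s1, s2)]

def long_sequence(t0, t1, blend):
    """the four instructions emitted for lerp(a, b, Blend)."""
    w = blend * -1.0                            # mul r0.w, c0.w, c1.w  (c1.w = -1)
    w = w + 1.0                                 # add r0.w, r0.w, c2.w  (c2.w = 1)
    r1 = [x * blend for x in t1]                # mul r1, t1, c0.w
    return [a * w + b for a, b in zip(t0, r1)]  # mad r0, t0, r0.w, r1

def short_sequence(t0, t1, blend):
    """the two instructions emitted for lerp(a, b, saturate(Blend))."""
    w = saturate(blend)                         # mov_sat r0.w, c0.w
    return lrp(w, t1, t0)                       # lrp r0, r0.w, t1, t0

def dp4(a, b):
    """4-component dot product."""
    return sum(x * y for x, y in zip(a, b))

def opos_via_temp(v0, c0, c1, c2):
    """what was emitted: three dp4s into r0, then mov oPos, r0.xyzz."""
    # r0.w is never written by the shader; 0.0 is only a placeholder
    r0 = [dp4(v0, c0), dp4(v0, c1), dp4(v0, c2), 0.0]
    return [r0[0], r0[1], r0[2], r0[2]]         # .xyzz swizzle

def opos_direct(v0, c0, c1, c2):
    """the hoped-for version: dp4 straight into oPos.x, oPos.y, oPos.zw."""
    z = dp4(v0, c2)                             # one scalar, written to both z and w
    return [dp4(v0, c0), dp4(v0, c1), z, z]

if __name__ == "__main__":
    t0 = [0.2, 0.4, 0.6, 1.0]
    t1 = [0.9, 0.1, 0.5, 0.25]
    for blend in (0.0, 0.25, 1.0, 1.5):
        print(blend, long_sequence(t0, t1, blend), short_sequence(t0, t1, blend))

    v0 = [1.0, 2.0, 3.0, 1.0]
    rows = ([1.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.0, -0.5], [0.0, 0.0, 1.0, 2.0])
    print(opos_via_temp(v0, *rows), opos_direct(v0, *rows))
```

the two pixel shader versions only disagree when Blend falls outside 0..1 (the 1.5 row), which is the one case where saturate changes the input.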

come on HLSL compiler, you can do better! i saw what you did with texm3x3vspec.
i believe in you!!

2 Comments

  • Mike Smith says:

    Thanks for the confirmation about lerp in 1.1. We were wondering what the heck was going on. Does Microsoft know about this? Do they care?

  • blackpawn says:

    my thoughts: know about it? yes. care about it? no. they seem to think everybody is using ps2.0 or better these days and provide as little support for ps1.* as they can get away with.
