sorting alpha blended objects back to front gets transparency something like 95% correct
in games. that last 5% is a real pain though. it happens when you have
objects whose bounds overlap so that no matter what order you draw them in
you're going to get errors. sorting the triangles within a single transparent object
isn't difficult, but sorting triangles across objects that
have different materials and interpenetrate each other
is awful and doesn't map efficiently to current graphics APIs and hardware.
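
to see why draw order matters at all, here's a tiny python sketch of the usual "over" blend applied to two half-transparent fragments on one pixel. the names (`blend_over`, `red`, `blue`) are made up for illustration, and it assumes non-premultiplied colors. when two objects interpenetrate, some pixels need one order and other pixels need the opposite one, so no single object order is right everywhere.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """standard alpha blend of src over dst (non-premultiplied color)."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

background = (0.0, 0.0, 0.0)
red = ((1.0, 0.0, 0.0), 0.5)    # (rgb, alpha), assumed to be the farther fragment
blue = ((0.0, 0.0, 1.0), 0.5)   # assumed to be the nearer fragment

# correct: far fragment first, near fragment last
correct = blend_over(*blue, blend_over(*red, background))
# wrong: same two fragments, drawn in the opposite order
wrong = blend_over(*red, blend_over(*blue, background))

print(correct, wrong)  # the two results differ
```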

a couple years back the nice folks at nVidia came up with a technique called depth peeling
in an attempt to solve this. it uses an extra depth buffer so that each rendering pass can pull
off the next nearest set of pixels from the scene. in this way you can peel off many depth layers
into separate color buffers and then as a final step blend all the layers back together in the
proper order. it's very expensive though: in addition to the extra depth buffer you need an extra
color buffer and a full render pass for each layer of transparency.
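
here's a rough cpu-side sketch of the peeling idea for a single pixel, just to make the cost visible. it's a toy under stated assumptions, not how the real gpu technique is coded: `peel_layers`, `composite` and the `(depth, rgb, alpha)` tuples are hypothetical, smaller depth means nearer, and `peeled_depth` stands in for the extra depth buffer. note that every layer costs one full loop over all the fragments.

```python
import math

def blend_over(src_rgb, src_alpha, dst_rgb):
    """standard alpha blend of src over dst (non-premultiplied color)."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

def peel_layers(fragments, max_layers):
    """fragments: (depth, rgb, alpha) tuples for one pixel, in any draw order."""
    layers = []
    peeled_depth = -math.inf              # plays the role of the extra depth buffer
    for _ in range(max_layers):           # one iteration = one full render pass
        nearest = None
        for frag in fragments:
            depth = frag[0]
            # keep the nearest fragment strictly behind the last peeled layer
            # (fragments at exactly equal depth would be lost in this toy)
            if depth > peeled_depth and (nearest is None or depth < nearest[0]):
                nearest = frag
        if nearest is None:
            break                         # nothing left to peel
        layers.append(nearest)
        peeled_depth = nearest[0]
    return layers                         # nearest layer first

def composite(layers, background):
    color = background
    for _, rgb, alpha in reversed(layers):    # blend farthest layer first
        color = blend_over(rgb, alpha, color)
    return color

fragments = [(5.0, (0.0, 0.0, 1.0), 0.5),     # deliberately not sorted
             (2.0, (1.0, 0.0, 0.0), 0.5),
             (8.0, (0.0, 1.0, 0.0), 0.5)]
layers = peel_layers(fragments, max_layers=4)
pixel = composite(layers, (0.0, 0.0, 0.0))
```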

there are other solutions that researchers have proposed, like the A-buffer, F-buffer, R-buffer, etc.,
that require only a single rendering pass. most of these approaches focus on temporarily storing all the alpha
blended fragments (except the ones behind the nearest opaque fragment) on the graphics card. after
everything has been rendered, they run through all the fragments for each pixel
to properly composite them. this has some similarities with current MSAA implementations, but it also requires
storing the fragment depth values and makes resolving the final pixel colors
much more complex. compression techniques like those used with MSAA could
help here too by reducing memory bandwidth during resolve. for example, most pixels will likely have only
one or two fragments, so only those need to be touched, whereas with depth peeling
you wind up touching every pixel of each color buffer regardless of the actual depth complexity of the scene.
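
a minimal python sketch of the store-then-resolve idea looks something like this. it isn't any particular paper's buffer layout: `FragmentBuffer` and its methods are names i made up, smaller depth means nearer, and it assumes opaque geometry is drawn before the blended stuff so hidden fragments can be dropped on arrival.

```python
from collections import defaultdict

def blend_over(src_rgb, src_alpha, dst_rgb):
    """standard alpha blend of src over dst (non-premultiplied color)."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

class FragmentBuffer:
    def __init__(self):
        self.opaque = {}                       # pixel -> (depth, rgb)
        self.transparent = defaultdict(list)   # pixel -> [(depth, rgb, alpha)]

    def write_opaque(self, pixel, depth, rgb):
        current = self.opaque.get(pixel)
        if current is None or depth < current[0]:
            self.opaque[pixel] = (depth, rgb)

    def write_transparent(self, pixel, depth, rgb, alpha):
        opaque = self.opaque.get(pixel)
        if opaque is not None and depth >= opaque[0]:
            return                             # behind the nearest opaque fragment
        self.transparent[pixel].append((depth, rgb, alpha))

    def resolve(self, pixel, background):
        opaque = self.opaque.get(pixel)
        color = opaque[1] if opaque else background
        limit = opaque[0] if opaque else float("inf")
        visible = [f for f in self.transparent.get(pixel, []) if f[0] < limit]
        # sort this pixel's fragments far to near, then composite them
        for _, rgb, alpha in sorted(visible, key=lambda f: f[0], reverse=True):
            color = blend_over(rgb, alpha, color)
        return color

buf = FragmentBuffer()
p = (0, 0)
buf.write_opaque(p, 10.0, (1.0, 1.0, 1.0))
buf.write_transparent(p, 2.0, (1.0, 0.0, 0.0), 0.5)    # near, but drawn first
buf.write_transparent(p, 12.0, (0.0, 1.0, 0.0), 0.5)   # hidden, gets dropped
buf.write_transparent(p, 5.0, (0.0, 0.0, 1.0), 0.5)
resolved = buf.resolve(p, (0.0, 0.0, 0.0))
```

pixels with only one or two stored fragments do almost no work in `resolve`, which is the bandwidth argument above. the catch is that the per-pixel list has no upper bound.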

so a pick-your-favorite-letter-buffer sounds great, but what happens when you have very high levels of
depth complexity? if you depend completely on this system to get proper sorting you'll
need room for a very large number of fragments per pixel. having all this extra
memory seems excessive when you remember that for typical game scenes just sorting
whole objects back to front already gets you so close to the correct result. i think
this is something that's missing in the research, and a good solution would be
a combination of the techniques. you need the hardware support to get those first couple
layers of transparency blended correctly, but after that, blending the fragments in
the order they're drawn will look just fine. so instead of stressing over fancy ways
to cope with cases where dozens of fragments are needed per pixel, i think just having
the nearest two fragments sorted properly and all other fragments blended in the order
they render will work out great for games. for CAD stuff you could optionally allocate
room for many more fragments or just fall back to depth peeling.

so here's my proposed algo:

  • hardware buffer has slots for let's say 3 color and depth samples per pixel.
    2 color and depth samples are used for the nearest 2 fragments and 1 for the
    combination of all fragments behind the nearest 2.
  • app should render all opaque objects sorted by material, then all alpha blended objects
    sorted back to front
    • any opaque fragment that passes the depth test resets the slots for the 2 nearest samples and stores itself in the final slot.
    • any alpha blended fragment compares its depth against the nearest 2.
      if it's behind both it immediately blends with the final color. otherwise
      it is inserted in the proper order and the fragment that previously occupied
      the second slot (if any) is blended into the final color.
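
the steps above can be sketched for a single pixel in python. this is a software toy of the idea, not a hardware design. `HybridPixel` and its methods are made-up names, smaller depth means nearer, and two things are my assumptions rather than part of the list above: blended fragments behind the stored opaque depth are rejected, and the final resolve composites slot 2 and then slot 1 over the final color.

```python
import math

def blend_over(src_rgb, src_alpha, dst_rgb):
    """standard alpha blend of src over dst (non-premultiplied color)."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

class HybridPixel:
    def __init__(self, background):
        self.final_rgb = background   # slot 3: everything behind the nearest 2
        self.final_depth = math.inf   # depth of the nearest opaque fragment
        self.nearest = []             # slots 1-2: (depth, rgb, alpha), nearest first

    def write_opaque(self, depth, rgb):
        if depth >= self.final_depth:
            return                    # fails the depth test
        self.nearest = []             # reset the 2 nearest slots
        self.final_rgb = rgb
        self.final_depth = depth

    def write_blended(self, depth, rgb, alpha):
        if depth >= self.final_depth:
            return                    # assumed: hidden behind opaque geometry
        if len(self.nearest) == 2 and depth >= self.nearest[1][0]:
            # behind both nearest fragments: blend right away, in draw order
            self.final_rgb = blend_over(rgb, alpha, self.final_rgb)
            return
        self.nearest.append((depth, rgb, alpha))
        self.nearest.sort(key=lambda frag: frag[0])
        if len(self.nearest) > 2:
            # whatever got pushed out of slot 2 joins the final color
            _, old_rgb, old_alpha = self.nearest.pop()
            self.final_rgb = blend_over(old_rgb, old_alpha, self.final_rgb)

    def resolve(self):
        """assumed final step: composite slot 2, then slot 1, over slot 3."""
        color = self.final_rgb
        for _, rgb, alpha in reversed(self.nearest):
            color = blend_over(rgb, alpha, color)
        return color

# objects sorted back to front, except the nearest two arrive in the wrong order
drawn = [(9.0, (0.0, 1.0, 0.0), 0.5),
         (7.0, (0.0, 0.0, 1.0), 0.5),
         (3.0, (1.0, 0.0, 0.0), 0.5),   # should have been drawn last
         (5.0, (1.0, 1.0, 1.0), 0.5)]
pixel = HybridPixel((0.0, 0.0, 0.0))
for depth, rgb, alpha in drawn:
    pixel.write_blended(depth, rgb, alpha)
result = pixel.resolve()
```

in this example the result matches a full per-pixel sort because only the front two fragments were out of order. fragments farther back that arrive out of order still blend in draw order, which is the error this scheme deliberately accepts.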