so previously i blogged about my glypherizer experiment to approximate images with a small number of font glyphs. unfortunately it didn't meet with the fantastic success i had hoped for. :P the ideas were still a bit itchy though, so i wound up working on it some more. this time around i used images instead of font glyphs, so they could be brush strokes or other crazy stuff in addition to text. i also rewrote the code in CUDA so it runs on a 240-thread GPU instead of a measly 8-thread CPU. here you can see some of the strokerization tests:

[images: CatSeries, 1,000 to 45,000 strokes; guild wars strokerized]

[images: Cat512Series; CatShapeSeries]

you can see more of the full set on flickr if you like. thanks very much to kirill for the library of cool brush strokes!

this was my first real CUDA app and i was surprised just how quickly i was able to port everything over to GPU kernels. so getting it running on the GPU was pretty easy, but what turned out to be harder was making it fast. i had to optimize my kernels quite a bit to reduce the number of registers they needed, and use nvidia's occupancy calculator spreadsheet thing to work out good block sizes and thread counts. that, along with using the async stream API, helped get the speed to about where i expected, but i still think there are some things holding it back. i can't wait for more performance tools for tracking this stuff down.
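
to make the register/block size tradeoff a bit more concrete, here's a rough sketch of the kind of arithmetic an occupancy calculator does. this is not nvidia's actual spreadsheet: the function name and the per-multiprocessor limits (16384 registers, 1024 threads, 8 blocks, 32-thread warps) are assumptions picked for illustration, and it ignores shared memory and register allocation granularity, so treat the numbers as ballpark only.

```python
def estimate_occupancy(threads_per_block, regs_per_thread,
                       regs_per_sm=16384, max_threads_per_sm=1024,
                       max_blocks_per_sm=8, warp_size=32):
    """Rough fraction of a multiprocessor's warp slots that stay busy.

    All hardware limits are assumed defaults, not values read from a
    real device. Shared memory and allocation granularity are ignored.
    """
    # Threads are scheduled in whole warps, so round the block size up.
    warps_per_block = -(-threads_per_block // warp_size)
    threads_rounded = warps_per_block * warp_size

    # How many blocks fit under each resource limit.
    blocks_by_regs = regs_per_sm // (regs_per_thread * threads_rounded)
    blocks_by_threads = max_threads_per_sm // threads_rounded
    blocks = min(blocks_by_regs, blocks_by_threads, max_blocks_per_sm)

    # Occupancy = active warps / maximum resident warps.
    max_warps = max_threads_per_sm // warp_size
    return (blocks * warps_per_block) / max_warps


if __name__ == "__main__":
    # Same block size, fewer registers per thread: more blocks fit.
    for regs in (32, 16):
        print(regs, "regs/thread ->", estimate_occupancy(256, regs))
```

under these assumed limits, a 256-thread block using 32 registers per thread only fits two blocks per multiprocessor (half the warp slots), while trimming the kernel to 16 registers per thread lets four blocks fit and fills them all. that's why cutting register counts mattered so much.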