I spent some time over the past few weeks tinkering with the GPGPU framework to make it a bit more flexible and more robust. First of all, it got renamed to the acronym Sage (Simple Abstraction for GPU-based Evaluation), mainly so I don't have to type 'simple GPGPU framework' all the time. The main goal of the recent developments was to support the use cases presented by the Navier-Stokes fluid simulation as implemented by Epsicode. After quite a bit of pondering and redesign, we managed to get it to run 100% on the GPU, and I have to say the result looks absolutely stunning.
The main change here is that, since this simulation deals exclusively with texture output, the data isn't needed on the CPU at all, so both the data and the computations can remain solely on the GPU. Version 0.3 turned out to lack the flexibility to support this properly, and truth be told, the GPUBufferUsage.PassThrough flag was far from intuitive. The upcoming version of the framework addresses this by having GPUBuffers own their GPU resources instead of borrowing them from the GPUProcessor. This makes GPUBufferUsage.PassThrough more straightforward, since it is now obvious that a buffer with this flag set uses its own render target (RT) when asked for input, rather than some RT internal to the GPUProcessor. Perhaps renaming GPUBufferUsage.PassThrough to GPUBufferUsage.UseOutputAsInput would be even more obvious, but it just doesn't sound as mysterious :)
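To make the ownership change a little more concrete, here is a rough sketch in Python. It is not Sage's actual XNA API: the flag and class names are modeled on the ones above, but `as_input`, `process`, `readbacks` and the list standing in for a render target are hypothetical. The idea it shows is that a buffer owns its RT, and with the pass-through flag set it hands that RT out as its input, so results never have to be read back to the CPU.

```python
from enum import Flag, auto


class GPUBufferUsage(Flag):
    """Hypothetical usage flags; only the PassThrough idea comes from the post."""
    NONE = 0
    PASS_THROUGH = auto()


class GPUBuffer:
    """Hypothetical buffer that owns its render target instead of borrowing one."""

    def __init__(self, cpu_data, usage=GPUBufferUsage.NONE):
        self.usage = usage
        self.cpu_data = list(cpu_data)
        # The buffer's own "render target"; a plain list stands in for GPU memory.
        self.render_target = [0.0] * len(self.cpu_data)
        self.readbacks = 0  # how often results were copied back to the CPU

    def as_input(self):
        """Data this buffer provides when bound as an input texture."""
        if self.usage & GPUBufferUsage.PASS_THROUGH:
            # Use the buffer's own output as input: no CPU upload needed.
            return self.render_target
        # Otherwise the CPU-side data would be uploaded as a texture.
        return self.cpu_data


def process(src, dst, kernel):
    """Hypothetical processor step: apply kernel per element, write into dst's RT."""
    results = [kernel(x) for x in src.as_input()]
    dst.render_target[:] = results
    if not (dst.usage & GPUBufferUsage.PASS_THROUGH):
        # A CPU-side consumer needs the results, so simulate a readback.
        dst.cpu_data = list(results)
        dst.readbacks += 1


a = GPUBuffer([1.0, 2.0])
b = GPUBuffer([0.0, 0.0], GPUBufferUsage.PASS_THROUGH)
c = GPUBuffer([0.0, 0.0])
process(a, b, lambda x: x * 2)  # result stays in b's own RT
process(b, c, lambda x: x + 1)  # b feeds its RT straight into the next step
```

In this sketch only `c`, the buffer without the flag, pays for a readback; `b` passes its output along as the next step's input without its data ever touching `cpu_data`.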
Another change to the GPUBuffer is that its output (the owned RenderTarget) can now be double-buffered. That means a GPUBuffer can be set as both input and output (texture and RT) simultaneously, while the system transparently takes care of the details, reading from one buffer while writing to the other. With the help of the excellent 3rd party support by MS, these changes mean the data doesn't need to come back to the CPU at all, which avoids costly GPU-to-CPU readbacks and can drastically improve performance. When we started porting the fluid simulation over to Sage, we were happy to have it running on a grid of 64x64 data points. With this upcoming version of Sage, it can simulate a 512x512 grid smoothly on mid-to-high-end Shader Model 3.0 (SM3) hardware, and it reportedly simulates a 256x256 grid at a leisurely 60fps on the Xbox 360.
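Double-buffering a buffer that is both input and output is the classic ping-pong technique. The Python sketch below assumes nothing about Sage's internals: `DoubleBufferedTarget`, `run_pass` and `blur_kernel` are hypothetical names, plain lists stand in for the texture and the RT, and a simple 3-tap box blur stands in for one simulation step.

```python
class DoubleBufferedTarget:
    """Hypothetical double-buffered output: one side is read, the other written."""

    def __init__(self, data):
        self._front = list(data)          # bound as the input texture
        self._back = [0.0] * len(data)    # bound as the render target

    @property
    def texture(self):
        return self._front

    @property
    def render_target(self):
        return self._back

    def swap(self):
        # After a pass, the freshly written RT becomes the next pass's input.
        self._front, self._back = self._back, self._front


def blur_kernel(src, i):
    """3-tap box blur with clamped edges, standing in for one simulation step."""
    n = len(src)
    left = src[max(i - 1, 0)]
    right = src[min(i + 1, n - 1)]
    return (left + src[i] + right) / 3.0


def run_pass(buf, kernel):
    """Use one buffer as both input and output; the swap hides the ping-pong."""
    src, dst = buf.texture, buf.render_target
    for i in range(len(dst)):
        dst[i] = kernel(src, i)
    buf.swap()


buf = DoubleBufferedTarget([0.0, 0.0, 3.0, 0.0, 0.0])
run_pass(buf, blur_kernel)
print(buf.texture)  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Because every element is computed from the previous step's values only, the result is the same as if input and output were separate buffers. Doing the blur in place would let already-updated neighbors leak into later elements, which is exactly what the hidden second RT prevents.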
At the moment, the development focus is on migrating the framework into a library project of its own (for easier maintenance and upgrading) and on reviewing the design of the new features, to make sure there aren't any gotchas like GPUBufferUsage.PassThrough in the upcoming release. The samples from the earlier releases will also be ported over and added to the solution. With some more fancy features in the works, I don't have a particular release date in mind just yet, so if anyone would like a 'CTP release' in the meantime, just drop me an e-mail at feedback@xnainfo.com.