XNAInfo blogs
Ramblings about XNA, .NET and stuff

Another acronym is born (GPGPU framework ramblings)

April 23, 2010 14:17 by Rim

I spent some time over the past few weeks tinkering on the GPGPU framework to make it a bit more flexible and more robust. First of all, it got renamed to the acronym Sage (Simple Abstraction for GPU-based Evaluation), mainly so I don't have to type "simple GPGPU framework" all the time. The main goal of the recent developments was to support the use cases presented by the Navier-Stokes fluid simulation as implemented by Epsicode. After quite a bit of pondering and redesign we managed to get it to run 100% on the GPU, and I have to say the result looks absolutely stunning:

The main change here is that since this simulation deals exclusively with texture output, the data isn't needed on the CPU at all, so both the data and the computations can remain solely on the GPU. The update to 0.3 turned out to lack the flexibility to support this properly, and truth be told, the GPUBufferUsage.PassThrough flag was far from intuitive. The upcoming version of the GPGPU framework will address this by having GPUBuffers own their GPU resources instead of borrowing them from the GPUProcessor. This makes GPUBufferUsage.PassThrough more straightforward, since it's more obvious that a buffer with this flag set will use its own render target (RT) when asked for input, instead of some RT internal to the GPUProcessor. Perhaps renaming GPUBufferUsage.PassThrough to GPUBufferUsage.UseOutputAsInput would be even more obvious, but it just doesn't sound as mysterious :)

Another change to the GPUBuffer is that the output (the owned RenderTarget) can now be double-buffered. That means a GPUBuffer can be set as both input and output (texture and RT) simultaneously, while the system transparently takes care of the details. With the help of the excellent 3rd party support by MS, these changes mean the data doesn't need to come back to the CPU at all, which can drastically improve performance. When we started porting the fluid simulation over to Sage, we were happy to have it running on a grid of 64x64 data points. With this upcoming version of Sage, it can simulate a 512x512 grid smoothly on mid-to-high-end SM3 hardware, and it reportedly simulates a 256x256 grid at a leisurely 60fps on the 360.
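To illustrate why double-buffering makes "input and output at the same time" safe, here is a minimal Python sketch of the ping-pong idea. It is not Sage's API: `DoubleBufferedGrid`, `run_pass` and `smooth` are hypothetical names. Each pass reads the front buffer (standing in for the texture input) and writes the back buffer (standing in for the render target), then swaps them, so no element reads a value that was already overwritten earlier in the same pass. The kernel is a simple Jacobi-style relaxation step, the kind of operation a fluid solver repeats many times per frame.

```python
class DoubleBufferedGrid:
    """Hypothetical stand-in for a double-buffered GPU buffer (not Sage's API).

    The front list plays the role of the texture being sampled; the back
    list plays the role of the render target being written.
    """

    def __init__(self, values):
        self._front = [float(v) for v in values]
        self._back = [0.0] * len(self._front)

    @property
    def current(self):
        """Result of the most recent pass (what the next pass will read)."""
        return list(self._front)

    def run_pass(self, kernel):
        """Evaluate kernel(i, src) for every element into the back buffer, then swap."""
        src = self._front
        for i in range(len(src)):
            self._back[i] = kernel(i, src)
        # Swap roles: this pass's output becomes the next pass's input.
        self._front, self._back = self._back, self._front


def smooth(i, src):
    """Jacobi-style relaxation: blend each cell with its neighbours (clamped at the edges)."""
    left = src[max(i - 1, 0)]
    right = src[min(i + 1, len(src) - 1)]
    return 0.5 * src[i] + 0.25 * (left + right)


grid = DoubleBufferedGrid([0, 0, 4, 0, 0])
grid.run_pass(smooth)
print(grid.current)  # [0.0, 1.0, 2.0, 1.0, 0.0]
```

Updating a single buffer in place instead would let the middle element see its already-updated left neighbour (1.0 rather than 0.0) and come out as 2.25 instead of 2.0, which is exactly the read-after-write hazard the double buffer avoids.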

At the moment the development focus is on migrating the framework into a library project of its own (for easier maintenance and upgrading) and reviewing the design of the new features, to make sure there aren't any gotchas like GPUBufferUsage.PassThrough in the upcoming release. The samples from the earlier releases will also be ported over and added to the solution. With some more fancy features in the works I don't have a particular release date in mind just yet, so if anyone wants a 'CTP release' in the meantime, just drop me an e-mail at feedback@xnainfo.com.



Stuff abound

March 11, 2010 02:40 by Rim

There seem to be a lot of exciting things happening at the moment. XNA 4.0 has been announced, which adds Windows Phone 7 to the list of supported platforms, but you already knew that of course. While Shawn is busy at GDC, I took the liberty of stealing his Fractals code. It doesn't improve upon his 2006 release in any way, but I happened upon it and thought it'd be nice to have it available as a ready-to-run sample. On that page you'll also find my take on handling input (inspired by this discussion), which seems to be all the rage these days.

Other than that, I've started tinkering on a little animation pipeline for XNA together with a 3D artist as a learning experience. I have no idea if this will be anywhere near finished any time soon, or if it'd even be useful for the community, since there seem to be a lot of libraries out there already. Like any good software project, I settled on an acronym for the name before it's anywhere near done, so I lay claim to Uxmal. It's a good fit, since the artist is working in Maya and I'm already on my third rewrite of the codebase. Xna Model Animation Library also fits, but development is currently stalled on figuring out what the U should stand for. Useful would be ideal, but it might also be Unnecessary or Useless.

Time will tell :)


Categories: Frontpage | Ramblings | XNA

Detour Ahead

February 24, 2010 10:01 by Rim

  
I really gotta stop doing things like this. What started out as a Soft Body Physics demo ended up in writing a barebones GPGPU library, which ended up in writing a stand-alone GPU-particle demo for release on XnaInfo (coming soon!), which ended up in a little black hole:


 

I hope to find some time for the write-ups and posting code soon, please bear with me :)


Categories: Ramblings | XNA

Soft body physics

February 21, 2010 09:35 by Rim


I've been working on a soft body physics demo (more details). I haven't had time to write up more about it to post here, but I wanted to try to see if posting YouTube videos works on this infernal blog software. Stay tuned for more on this :)


Categories: Graphics | Ramblings | XNA

Logarithmic Depth Buffer

February 20, 2010 14:28 by Rim

I came across an excellent discussion of setting up a logarithmic depth buffer in Cameni's Journal. I had heard of those before, but I was surprised to find that implementing one is as easy as adding a single line of code to your vertex shader:

output.Position.z = log(C*output.Position.z + 1) / log(C*Far + 1) * output.Position.w;

The C constant lets you define the resolution you want at the near plane, and Far is the value you use for the far plane. More details can be found in the journal linked above. I happened to be tinkering on a to-scale model of our solar system, surrounded by a bunch of stars (from Hipparcos data). The light-year distances to the stars and the AU distances within the system were a source of woes for the depth buffer, but this simple line made my z-fighting stars play nice.
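To see why this one line tames the z-fighting, here is a small Python sketch (not shader code) of the same mapping. It treats the value being remapped as a distance in [0, Far], assumes a 24-bit depth buffer, and uses illustrative values C = 1 and Far = 1e17 m (roughly ten light-years); `log_depth` and `depth_resolution` are hypothetical helper names. Because the shader multiplies by w and the hardware divides by w again, the stored value is just the logarithmic ratio, and the distance covered by one depth-buffer step grows in proportion to the distance itself.

```python
import math

DEPTH_BITS = 24  # assumption: a 24-bit depth buffer such as Depth24Stencil8


def log_depth(z, C, far):
    """Stored depth in [0, 1] for a distance z in [0, far].

    The shader writes log(C*z + 1) / log(C*far + 1) * w; the hardware then
    divides by w, leaving only the logarithmic ratio computed here.
    """
    return math.log(C * z + 1.0) / math.log(C * far + 1.0)


def depth_resolution(z, C, far, bits=DEPTH_BITS):
    """Approximate smallest distance difference the depth buffer can resolve at z.

    One depth-buffer step, 1 / (2^bits - 1), divided by the slope d(depth)/dz.
    """
    step = 1.0 / (2 ** bits - 1)
    slope = C / ((C * z + 1.0) * math.log(C * far + 1.0))
    return step / slope


FAR = 1e17  # metres, about ten light-years (illustrative)
for label, z in [("1 m", 1.0), ("1 AU", 1.496e11), ("1 light-year", 9.461e15)]:
    print(f"{label:>12}: depth={log_depth(z, 1.0, FAR):.6f}, "
          f"resolution ~{depth_resolution(z, 1.0, FAR):.3g} m")
```

For these parameters the resolvable distance is about 2.3 parts per million of the distance itself (log(C*Far + 1) / (2^24 - 1)). That is why objects an AU apart and objects light-years apart both end up with usable precision.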


Categories: Frontpage | Ramblings

Pet Project - StickFight

May 12, 2009 16:38 by Rim

I haven't gotten around to writing interesting samples or uncovering any deep XNA truths lately. Instead I've been tinkering on a little beat-'em-up game with stylized stick men to do the fighting. These little actors are entirely procedural, so the game itself generates the geometry and animations rather than using models created by artists. Obviously nothing can replace a good artist, and this proved painfully true when it came to the animations.

The animations are generated by a particle-based physics system (described here), which works by applying a force to the attacking limb towards the victim, checking collisions and letting the simulation run its course. The base skeleton displayed below is set up easily enough, but without additional constraints the resulting movements are far from natural. As noted in the original article, a lot of tweaking can also be done using the mass of the particles to get some control over how easily particles (i.e. joints) can move.

So if the skeleton and animations are that hard to tweak, you might be wondering what good this procedural technique is. The beauty of this (admittedly simple) physics-based animation setup is that you essentially get inverse kinematics for free. If I want to hit my opponent with a hand, I just apply some force on the hand towards where I want to hit him. With sufficient tweaking, this produces convincing animations for accurately hitting the victim anywhere with any part of the attacking actor. Headbutts, kicks and more exotic attacks are just a matter of picking target and subject particles on either side.
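The post doesn't spell out the simulation, so here is a minimal 2D Python sketch of the general idea. It assumes a Verlet-integration particle system with distance constraints in the style of Jakobsen's "Advanced Character Physics"; whether the linked article uses exactly this scheme is an assumption, and all names (`Particle`, `apply_force`, `verlet_step`, `satisfy`) are hypothetical. A three-particle arm has a pinned shoulder, and the force is applied only to the hand. Relaxing the two distance constraints after every step drags the elbow along, which is the "inverse kinematics for free" effect. An inverse mass of zero pins a particle, and smaller inverse masses make a joint harder to move.

```python
import math


class Particle:
    def __init__(self, x, y, inv_mass=1.0):
        self.pos = [x, y]
        self.prev = [x, y]        # Verlet keeps the previous position instead of a velocity
        self.acc = [0.0, 0.0]
        self.inv_mass = inv_mass  # 0 pins the particle in place


def apply_force(p, fx, fy):
    p.acc[0] += fx * p.inv_mass
    p.acc[1] += fy * p.inv_mass


def verlet_step(particles, dt, damping=0.9):
    """Position Verlet with a simple velocity damping factor."""
    for p in particles:
        for k in range(2):
            cur = p.pos[k]
            p.pos[k] = cur + (cur - p.prev[k]) * damping + p.acc[k] * dt * dt
            p.prev[k] = cur
        p.acc = [0.0, 0.0]


def satisfy(a, b, rest):
    """Move a and b along their connecting line to restore the rest length."""
    dx, dy = b.pos[0] - a.pos[0], b.pos[1] - a.pos[1]
    dist = math.hypot(dx, dy)
    w = a.inv_mass + b.inv_mass
    if dist == 0.0 or w == 0.0:
        return
    s = (dist - rest) / (dist * w)
    a.pos[0] += dx * s * a.inv_mass
    a.pos[1] += dy * s * a.inv_mass
    b.pos[0] -= dx * s * b.inv_mass
    b.pos[1] -= dy * s * b.inv_mass


shoulder, elbow, hand = Particle(0, 0, inv_mass=0.0), Particle(1, 0), Particle(2, 0)
arm = [shoulder, elbow, hand]
bones = [(shoulder, elbow, 1.0), (elbow, hand, 1.0)]
target = (1.0, 1.5)  # hypothetical point on the opponent

for _ in range(300):  # five seconds at 60 steps per second
    # Pull only the hand towards the target; the rest of the arm follows.
    apply_force(hand, 50.0 * (target[0] - hand.pos[0]), 50.0 * (target[1] - hand.pos[1]))
    verlet_step(arm, 1.0 / 60.0)
    for _ in range(10):  # a few relaxation iterations per step
        for a, b, rest in bones:
            satisfy(a, b, rest)

print("hand:", [round(c, 3) for c in hand.pos], "elbow:", [round(c, 3) for c in elbow.pos])
```

No joint angles are ever computed. The elbow position falls out of the constraints, so retargeting a punch is just a matter of changing `target` or picking a different particle to push.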

Another nice benefit of this procedural approach is that the geometry is very accessible to the program and thus can be altered in a variety of ways. With a few minutes of tinkering, style variations like those below are easily implemented.

The project is still a pretty long way off from becoming a playable game, but it's already made its way around the office for passive-aggressive stress relief. I'm afraid I can't put a playable build out anytime soon, but in the meantime here's a little movie (WMV, 7MB) showing some basic pummeling and the style so far. Since a lot of tweaking is involved and much of the style is still up in the air, comments and/or suggestions would be much appreciated.


Free online book about shaders, lighting, shadows, everything!

January 28, 2009 13:05 by Admin

 

Rather than lament my own slacking, I want to point everyone who cares to read this to a great online resource I stumbled upon today:

A free online book called "Programming Vertex, Geometry, and Pixel shaders"

It is geared towards D3D10, but it contains an incredible wealth of information about nearly every graphics topic you'd want to implement. The concise and thorough theory certainly holds for XNA, and most shaders should give you a good idea of how things get implemented, even if they don't work out of the box. It's written by these guys, who deserve a truckload of cookies in my opinion!

Oh and Jack, if you ever should find your way here, we have to discuss your definition of 'not very active' :)


Categories: Frontpage | Graphics

How I Saved 3ms By Unrolling A Loop

September 25, 2008 10:05 by MJP
Last night I decided to sit down and do some nitty-gritty optimization on the 360 version of my game.  The PC version has been running great even when I crank up every setting I have (although that might have something to do with the 4870 I just bought), but as we all know by now, things are never so easy on the Xbox.  For the past few weeks I'd been struggling to keep the framerate above 60Hz, with it slipping down to 55Hz during complex scenes.  Now 55fps wouldn't be much of a problem if I weren't the kind of guy who hates screen tearing, but it just so happens I am.  That means I want VSYNC enabled, which means the game drops to 30fps whenever it's below 60.  Definitely unacceptable.  For a while I avoided the problem by doing the unthinkable... I dropped the resolution from 1280 x 720 to 1024 x 600.  I even considered leaving it this way for the final release... I mean, if Halo 3 and Metal Gear Solid 4 can do it, why can't I?

But no, I'm too finicky to settle for the lowered resolution.  It just looks so much better in 720p!  So I cranked the res back up and decided to see where I could squeeze out some extra performance.  Naturally, the first place I went was my shadow mapping shader.  I already knew this particular shader was giving me trouble, since I'd discovered that decreasing the size of the buffer I render the shadow occlusion to (I do a deferred shadowing pass) resulted in significant performance gains.  I'd already reduced things to 4 PCF samples on the 360 (did I mention I ditched VSM for the 360?  Performance and precision ended up being so awful it wasn't worth it), so I couldn't squeeze that down any more.  At this point my eyes drifted down to a little loop I had for determining which split of the shadow map cascade to use, when I remembered an excellent presentation on shader performance that I'd read a long time ago.  One of the things mentioned in there was that unrolling loops can have a significant impact on general-purpose register usage, ALU usage, and shader compiler optimization.  So I thought, "hey, let's try unrolling this loop and flattening this branch".  I then changed my code from this:

for (int i = 1; i < NUM_SPLITS; i++)
{
    if (vPositionVS.z <= g_vClipPlanes[i].x && vPositionVS.z > g_vClipPlanes[i].y)
    {
        matLightViewProj = g_matLightViewProj[i];
        fOffset = i / (float)NUM_SPLITS;           
    }
}  


to this:

[unroll(NUM_SPLITS)]
for (int i = 1; i < NUM_SPLITS; i++)
{
    [flatten]
    if (vPositionVS.z <= g_vClipPlanes[i].x && vPositionVS.z > g_vClipPlanes[i].y)
    {
        matLightViewProj = g_matLightViewProj[i];
        fOffset = i / (float)NUM_SPLITS;           
    }
}  
      

I ran the game again, and BAM:  I shot up from 55fps to 65fps!  Huge difference!  I was very impressed with myself.  Moral of the story:  experiment with stuff, and make sure you profile it!

Today I decided to read through some other presentations to see what other useful bits I could find.  This one from Gamefest 2007 pointed out that vfetches should be aligned to 32 bytes on the 360.  I did the math on the vertex declaration used for most of my models and found out it's 48 bytes.  Later tonight I'll have to see if I can squeeze it down to 32, and whether that makes performance any better.
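Doing that math is simple enough to script. The Python sketch below assumes "aligned to 32 bytes" means the vertex stride should be a multiple of 32 (an interpretation, not a quote from the presentation). The byte sizes are those of the named XNA VertexElementFormat values, and both declarations are hypothetical examples, since the post doesn't list the actual 48-byte layout.

```python
# Byte sizes of a few XNA VertexElementFormat values.
ELEMENT_SIZES = {
    "Vector2": 8,
    "Vector3": 12,
    "Vector4": 16,
    "Color": 4,
    "HalfVector2": 4,        # two 16-bit floats
    "NormalizedShort4": 8,   # four 16-bit signed normalized integers
}


def vertex_stride(declaration):
    """Sum of the element sizes in a list of (usage, format) pairs."""
    return sum(ELEMENT_SIZES[fmt] for _, fmt in declaration)


def round_up(stride, alignment=32):
    """Smallest multiple of `alignment` that can hold `stride` bytes."""
    return -(-stride // alignment) * alignment


# Hypothetical 48-byte layout: full precision everywhere.
full = [("Position", "Vector3"), ("Normal", "Vector3"),
        ("TextureCoordinate", "Vector2"), ("Tangent", "Vector4")]
# Hypothetical packed layout: keep the position precise, compress the rest.
packed = [("Position", "Vector3"), ("Normal", "NormalizedShort4"),
          ("TextureCoordinate", "HalfVector2"), ("Tangent", "NormalizedShort4")]

for name, decl in [("full", full), ("packed", packed)]:
    stride = vertex_stride(decl)
    print(name, stride, "bytes; padded to a 32-byte multiple:", round_up(stride))
```

Padding the full layout out to 64 bytes would satisfy the rule but makes every vertex a third larger. Packing the normal, texture coordinate and tangent reaches exactly 32 bytes at the cost of some precision in those attributes, and only profiling on the 360 will tell whether either choice actually helps.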

I also came across this one from Gamefest 2008, which is all about how texture and surface formats are handled on the 360.  This gave me some insight into problems I'd already come across.  For example, R32G32F (Vector2) isn't actually a format the GPU can render to!  Apparently it renders to R16G16F and then just expands it upon resolve from eDRAM.  This explained why my VSMs with exponential warp were having such precision problems.  Another thing pointed out in there is that the 360's texture units filter fp16 at 1/4 the rate of INT8!  This would help explain why my VSM performance was so poor.  Definitely good things to keep in mind.
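The precision problem is easy to reproduce on the CPU. This Python sketch assumes the usual exponential warp exp(c * d) with depth d normalized to [0, 1], and it uses the standard library's half-precision `struct` format ('e') to mimic what an R16G16F target would store. `to_fp16` and `max_warp_exponent` are hypothetical helpers. Half precision tops out at 65504 and keeps only 11 significant bits, so the warp exponent has to stay below about 11, and even in-range values are rounded to roughly three decimal digits.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x):
    """Round a float to the nearest half-precision value and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_warp_exponent(max_depth=1.0):
    """Largest c for which exp(c * d) stays finite in fp16 for d in [0, max_depth]."""
    return math.log(FP16_MAX) / max_depth


print("max exponent:", round(max_warp_exponent(), 3))
for c in (5.0, 10.0, 40.0):
    warped = math.exp(c * 1.0)  # warp of the farthest depth
    try:
        stored = to_fp16(warped)
        print(f"c={c}: exp={warped:.4f}, fp16={stored}, "
              f"rel. error={abs(stored - warped) / warped:.2e}")
    except OverflowError:
        print(f"c={c}: exp={warped:.3e} does not fit in fp16")
```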

-MJP

Dutch .NET Magazine

September 15, 2008 06:09 by Rim

 

A bit of an off-topic post for any Dutch readers that may come across our little blog. MS decided to revamp their Dutch .NET Magazine and change the subscription to Opt-In, meaning you'll explicitly need to tell them you want to keep receiving the magazine after the next issue of September 22nd. You can find more details and re-subscribe over at this page:

http://www.microsoft.nl/netjesgeregeld

It's quite a risky step for them to take, but they want to make sure they're reaching their reader base and get a clearer picture of their readers' interests. The magazine remains free and they've drummed up a full-blown editorial team for the revamp, so if you've enjoyed the magazine so far, make sure you re-subscribe!


XNA on the Xbox 360, Part 3: General Practices

August 28, 2008 08:55 by MJP

Over at gamedev.net someone asked for some general performance pointers regarding using XNA on the 360.  After giving a bullet-point list of what I thought were the important issues, I thought to myself "Hey, that was pretty good.  Let's milk it for all it's worth!"  So I've copied and pasted them all here in a glorious display of laziness.

 

-The 360 has 512MB of unified memory, which means it's shared by both the CPU and the GPU. You don't have all of that available to you, since some of it is taken up by the console's "OS" and some will also be taken up by the .NET Compact Framework.  You'll also be working from the managed heap, rather than directly working with native memory.

-You can only execute pure managed code on the 360.  You can't, for example, P/Invoke into a non-managed DLL. 

-As far as GPU shaders go, you're pretty unrestricted.  You can use SM3.0 HLSL, or you can also write portions of your shaders in the GPU's native microcode.  This is really very nice...it lets you do things like un-normalized texture addressing, full texturing capabilities in the vertex shader, or directly fetching an element from a vertex stream.  The microcode set is referred to as xvs_3_0 and xps_3_0. 

-The 360's GPU is different from your average PC GPU in that it has an eDRAM framebuffer.  The eDRAM is 10MB in size and has tremendous bandwidth (256GB/s).  What this means is that writing out to the framebuffer or reading it back for blending is very, very quick.  Multi-sampling is also very quick, since again you don't have the bandwidth problem.  In fact MSAA would be "free" if it weren't for tiled rendering... you see, the downside of eDRAM is that if your render target + z-buffer is too big to fit in eDRAM, you have to render to it in tiles.  This means you render one portion of the target, then another.  This isn't so bad, except for the fact that any geometry that's on the edges of 2 tiles has to be drawn twice.  If you're not doing scenes with hugely complex geometry you probably won't even notice tiling (it happens automatically).  To figure out whether you're going to tile, you need to count the number of bytes per pixel and then multiply by the resolution.  So for example if you're rendering to the Color format, which is 4 bytes per pixel, and you're using Depth24Stencil8, which is also 4 bytes per pixel, you have 8 bytes per pixel total.  When multi-sampling, you multiply this amount by the number of samples (so 4xMSAA would be 32 bytes per pixel).  1280 x 720 with 4xMSAA would be ~28MB, so you'd need 3 tiles.

-Be prepared to get CPU-bound really quickly if you're doing anything non-trivial.  DrawPrimitive calls are extremely expensive on the 360... I've seen my framerate go from about 70 to 30 just from going from 24 DP calls to 34.  Instancing is a must if you need to draw a lot of meshes... there's a good sample on the CC website.  By the same token, if you're doing any really fancy logic on the CPU that's not graphics-related, you'll probably need to run it on another thread on a different core, since your main thread can get bogged down pretty quickly.

-Watch out for performance pitfalls with the .NET Compact Framework.  Things like Garbage Collection compaction and virtual function calls are much more expensive than they are on the PC.  I suggest reading this blog for tips.  Just remember to keep your live object count as low as possible, and you should be okay.

-Floating-point performance is not so great on the 360 CPU.  Most of the fp power is in the vector units, but you have no access to those through XNA. 

-Avoid the surface formats that are larger than 32bpp, mainly HalfVector4 and Vector2.  Their performance is generally pretty terrible, and I've run into all kinds of driver bugs with them.  This means you can't do HDR in a straightforward way, but there are other options.  There's an entry on my XNA blog where I talk about how I got around it.

-Watch your texture sampling bandwidth.  Framebuffer access may be quick, but reads from textures are limited by the 22.4GB/s main memory bandwidth.  This may be quite a bit less than what you're used to if you're prototyping on a higher-end card like an 8800.  This can be especially painful in scenarios where you want to take multiple samples per pixel, like PCF for shadow maps or SSAO.

-Prototyping and developing on the PC is a good idea since you have access to PIX, but make sure you test pretty often on the 360.  You may need to optimize for quite a few scenarios if you need to keep the framerate up.
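The tiling arithmetic from the eDRAM bullet can be scripted, so you can try other resolutions and formats. This Python sketch follows the counting rule given above: color plus depth bytes per pixel, times the MSAA sample count, divided by 10MB. It ignores any alignment or tile-size rules the hardware applies on top of that, and the function names are hypothetical.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # 10MB of eDRAM


def framebuffer_bytes(width, height, color_bytes=4, depth_bytes=4, msaa_samples=1):
    """Bytes needed for color + depth, with every MSAA sample stored separately."""
    return width * height * (color_bytes + depth_bytes) * msaa_samples


def tile_count(width, height, color_bytes=4, depth_bytes=4, msaa_samples=1):
    """Number of eDRAM-sized tiles needed to hold the render target and depth buffer."""
    size = framebuffer_bytes(width, height, color_bytes, depth_bytes, msaa_samples)
    return math.ceil(size / EDRAM_BYTES)


# Color (4 bytes) + Depth24Stencil8 (4 bytes), as in the example above.
for samples in (1, 2, 4):
    size = framebuffer_bytes(1280, 720, msaa_samples=samples)
    print(f"1280x720, {samples}xMSAA: {size / (1024 * 1024):.2f}MB -> "
          f"{tile_count(1280, 720, msaa_samples=samples)} tile(s)")
```

The same function also ties in with the surface-format bullet: an 8-byte HalfVector4 color target at 1280 x 720 with no MSAA already needs 12 bytes per pixel, which comes to just over 10MB and spills into a second tile.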


Categories: XNA