XNAInfo blogs
Ramblings about XNA, .NET and stuff

XNA On The 360, Part 2: HDR

August 14, 2008 14:02 by MJP

Designing an effective and performant HDR implementation for my game's engine was another step that was complicated by the Xbox 360.  As a quick refresher for those who aren't experts on the subject, HDR is most commonly implemented by rendering the scene to a floating-point buffer and then performing a tone-mapping pass to bring the colors back into the visible range.  Floating-point formats (like A16B16G16R16F, AKA HalfVector4) are used because their added precision and floating-point nature allow them to comfortably store linear RGB values in ranges beyond the [0,1] typically used for shader output to the backbuffer, which is crucial as HDR requires having data with a wide dynamic range.  They're also convenient, as this allows values to be stored in the same format they're manipulated in by the shaders.  Newer GPUs also support full texture filtering and alpha-blending with fp surfaces, which removes the need for special-case handling of things like non-opaque geometry.  However, as with most things, what's convenient is not always the best option.  As with my shadows, I once again came up with a list of possible techniques and enumerated their pros and cons:

  • Standard HDR, fp16 buffer
    +Very easy to integrate (no special work needed for the shaders)
    +Good precision
    +Support for blending on SM3.0+ PC GPUs
    +Allows for HDR bloom effects
    -Double the bandwidth and storage requirements of R8G8B8A8
    -Weak support for multi-sampling on SM3.0 GPUs (Nvidia NV40 and G70/G71 can't do it)
    -Hardware filtering not available on ATI SM2.0 and SM3.0 GPUs
    -No blending on the Xbox 360
    -Requires double space in framebuffer on the 360, which increases the number of tiles needed
  • HDR with tone-mapping applied directly in the pixel shader (Valve-style)
    +Doesn't require output to an HDR format, no floating-point or encoding required
    +Multi-sampling and blending are supported, even on old hardware
    -Can't do HDR bloom, since only an LDR image is available for post-processing
    -Luminance can't be calculated directly, need to use fancy techniques to estimate it
    -Increases shader complexity and combinations
  • HDR using an encoded format
    +Allows for a standard tone-mapping chain
    +Allows for HDR bloom effects
    +Most formats offer a very wide dynamic range
    +Same bandwidth and storage as LDR rendering
    +Certain formats allow for multi-sampling and/or linear filtering with reasonable quality
    -Alpha-blending usually isn't an option, since the alpha-channel is used by most formats
    -Linear filtering and multi-sampling usually aren't mathematically correct, although often the results are "good enough"
    -Additional shader math needed for format conversions
    -Adds complexity to shaders
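
For reference, the tone-mapping pass that the first and third options feed into might look roughly like the sketch below. This is only an illustration: the post doesn't say which operator the engine uses, so the simple Reinhard curve, the Rec. 709 luminance weights, and every name here (SceneSampler, g_fAvgLuminance, g_fMiddleGrey, ToneMapPS) are assumptions.

```hlsl
// Hypothetical inputs; none of these names come from the engine described here.
sampler2D SceneSampler;          // HDR scene color (fp16, or already decoded)
float g_fAvgLuminance;           // average scene luminance, computed elsewhere
float g_fMiddleGrey = 0.18f;     // exposure key value (assumed)

// Rec. 709 luminance weights for linear RGB.
static const float3 LUM_WEIGHTS = float3(0.2126f, 0.7152f, 0.0722f);

float4 ToneMapPS(in float2 vTexCoord : TEXCOORD0) : COLOR0
{
    float3 vHDR = tex2D(SceneSampler, vTexCoord).rgb;

    // Scale so that the average luminance maps to the chosen key value.
    float fLum = dot(vHDR, LUM_WEIGHTS);
    float fScaledLum = fLum * g_fMiddleGrey / max(g_fAvgLuminance, 1e-4f);

    // Simple Reinhard curve: compresses luminance from [0, inf) into [0, 1).
    float fMappedLum = fScaledLum / (1.0f + fScaledLum);

    // Rescale the color by the luminance ratio, guarding against divide-by-zero.
    float3 vLDR = vHDR * (fMappedLum / max(fLum, 1e-4f));
    return float4(vLDR, 1.0f);
}
```

The second option folds something like the body of this function into every material shader, which is why it never has an HDR image available for bloom or for measuring luminance.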

 

My early prototyping used a standard tone-mapping chain and I didn't want to ditch that, nor did I want to move away from what I was comfortable with.  This pretty much eliminated the second option for me off the bat...although I was unlikely to choose it anyway due to its other drawbacks (having nice HDR bloom was something I felt was an important part of the look I wanted for my game, and in my opinion Valve's method doesn't do a great job of determining average luminance).  When I tried out the first method I found that it worked as well as it always did on the PC (I've used it before), but on the 360 it was another story.  I'm not sure why exactly, but for some reason it simply does not like the HalfVector4 format.  Performance was terrible, I couldn't blend, I got all kinds of strange rendering artifacts (entire lines of pixels missing), and I'd get bizarre exceptions if I enabled multi-sampling.  Loads of fun, let me tell you.

This left me with option #3.  I wasn't a fan of this approach initially, as my original design plan called for things to be simple and straightforward whenever possible.  I didn't really want to have two versions of my material shaders to support encoding, nor did I want to integrate decoding into the other parts of the pipeline that needed it.  But unfortunately, I wasn't really left with any other options after I found there were no plans to bring support for the 360's special fp10 backbuffer format to XNA (which would have conveniently solved my problems on the 360).  So, I started doing my research.  Naturally the first place I looked was actual released commercial games.  Why?  Because usually when a technique is used in a shipped game, it means it's gone through its paces and has been determined to actually be feasible and practical in a game environment.  Which of course naturally led me to consider NAO32.

NAO32 is a format that gained some fame in the dev community when ex-Ninja Theory programmer Marco Salvi shared some details on the technique over on the Beyond3D forums.  Used in the game Heavenly Sword, it allowed for multi-sampling to be used in conjunction with HDR on a platform (PS3) whose GPU didn't support multi-sampling of floating-point surfaces (the RSX is heavily based on Nvidia's G70).  In this technique, color is stored in the LogLuv format using a standard R8G8B8A8 surface.  Two components are used to store X and Y at 8-bit precision, and the other two are used to store the log of luminance at 16-bit precision.  Having 16 bits for luminance allows for a wide dynamic range to be stored in this format, and storing the log of the luminance allows for linear filtering in multi-sampling or texture sampling.  Since he first explained it, other games have also used it, such as Naughty Dog's Uncharted.  It's likely that it's been used in many other PS3 games as well.
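
To make the filtering claim a little more concrete, here is a short derivation based on the decode function given later in this post. The notation is introduced here purely for illustration: a and b label two neighbouring texels, t is the filter weight, f marks the filtered result, and 8-bit quantization is ignored.

```latex
\begin{align*}
L_e &= 255\,z + w, \qquad Y = 2^{(L_e - 127)/2} \\
L_e^{f} &= (1-t)\,L_e^{a} + t\,L_e^{b}
  \quad \text{(because $L_e$ is linear in the stored $z$ and $w$)} \\
Y^{f} &= \left(Y^{a}\right)^{1-t} \left(Y^{b}\right)^{t}
\end{align*}
```

In other words, filtering the encoded channels blends luminance as a weighted geometric mean rather than the arithmetic mean you would get from filtering linear values. That is not strictly correct, but it stays well-behaved, which fits the "good enough" results mentioned in the list above.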

My actual shader implementation was helped along quite a bit by Christer Ericson's blog post, which described how to derive optimized shader code for encoding RGB into the LogLuv format.  Using his code as a starting point, I came up with the following HLSL code for encoding and decoding:

// M matrix, for encoding
const static float3x3 M = float3x3(
    0.2209, 0.3390, 0.4184,
    0.1138, 0.6780, 0.7319,
    0.0102, 0.1130, 0.2969);

// Inverse M matrix, for decoding
const static float3x3 InverseM = float3x3(
    6.0013,    -2.700,    -1.7995,
    -1.332,    3.1029,    -5.7720,
    .3007,    -1.088,    5.6268);    

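float4 LogLuvEncode(in float3 vRGB)
{
    float4 vResult;
    // Transform RGB, then clamp away from zero so the divide and log2 stay finite
    float3 Xp_Y_XYZp = mul(vRGB, M);
    Xp_Y_XYZp = max(Xp_Y_XYZp, float3(1e-6, 1e-6, 1e-6));
    // Chromaticity: two ratios, one per 8-bit channel
    vResult.xy = Xp_Y_XYZp.xy / Xp_Y_XYZp.z;
    // Log luminance with a bias of 127, split across the two remaining channels
    float Le = 2 * log2(Xp_Y_XYZp.y) + 127;
    vResult.w = frac(Le);    // fractional part
    vResult.z = (Le - (floor(vResult.w*255.0f))/255.0f)/255.0f;    // remainder, scaled by 1/255
    return vResult;
}

float3 LogLuvDecode(in float4 vLogLuv)
{
    // Recombine the two channels into the biased log luminance
    float Le = vLogLuv.z * 255 + vLogLuv.w;
    float3 Xp_Y_XYZp;
    // Undo the bias and the log to recover Y
    Xp_Y_XYZp.y = exp2((Le - 127) / 2);
    // Undo the chromaticity ratios stored in xy
    Xp_Y_XYZp.z = Xp_Y_XYZp.y / vLogLuv.y;
    Xp_Y_XYZp.x = vLogLuv.x * Xp_Y_XYZp.z;
    float3 vRGB = mul(Xp_Y_XYZp, InverseM);
    return max(vRGB, 0);
}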
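
As a rough idea of how these two functions slot into a pipeline, the sketch below encodes at the end of a material shader and decodes at the start of any pass that reads the scene back. It assumes LogLuvEncode and LogLuvDecode from above are in scope; ComputeLighting, MaterialPS, SampleScene, and EncodedSceneSampler are hypothetical placeholders rather than code from my engine.

```hlsl
// Hypothetical names throughout; only LogLuvEncode/LogLuvDecode come from above.
sampler2D EncodedSceneSampler;   // R8G8B8A8 render target holding LogLuv data

// Stand-in for a real lighting model; deliberately returns values above 1.
float3 ComputeLighting(in float3 vNormal)
{
    float fNdotL = saturate(dot(vNormal, float3(0.0f, 1.0f, 0.0f)));
    return float3(4.0f, 3.6f, 3.2f) * fNdotL;
}

// Material pass: encode instead of returning float4(color, alpha).
// All four channels carry data, so alpha is not available for blending.
float4 MaterialPS(in float3 vNormal : TEXCOORD0) : COLOR0
{
    float3 vHDRColor = ComputeLighting(normalize(vNormal));
    return LogLuvEncode(vHDRColor);
}

// Helper for later passes (luminance, bloom, tone mapping):
// decode first, then work in linear RGB as usual.
float3 SampleScene(in float2 vTexCoord)
{
    return LogLuvDecode(tex2D(EncodedSceneSampler, vTexCoord));
}
```

Since hardware filtering of the encoded data isn't exact, point sampling on EncodedSceneSampler is probably the safer default whenever a pass needs precise values.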

 
Once I had this implemented and worked through a few small glitches, results were much improved in the 360 version.  Performance was much, much better, I could multi-sample again, and the results looked great.  So once again things didn't exactly work out in an ideal way, but I'm pleased with the results.



Comments

November 5. 2008 18:57

Dan Glastonbury

Do you have this working on Xbox 360 without color banding? I'm getting nasty color banding when I decode after the render target is resolved to a texture.

cheers
DanG


December 31. 2008 03:05

Sean James

Hi, I've been looking at your sample and I'm wondering how you would slow down the speed of the exposure adjustment. When I run the sample, it happens instantaneously. But when you play something like Halo, for example, it takes a good half second to adjust.

Adjusting fTau here seems to have no effect except to make the image too bright or dark...

const float fTau = 0.5f;
float fAdaptedLum = fLastLum + (fCurrentLum - fLastLum) * (1 - exp(-g_fDT * fTau));

