1

I'm trying to convert the Screen Space Ambient Occlusion example from XNA 3.1 to XNA 4.0. I've fixed all the problems in the source, except this strange problem in a shader file. I've gone through and fixed all the obvious problems with the shader as guided by Shawn Hargreaves' blog, but when it compiles it uses up 620 instruction slots, which is well over the 512 instruction slot limit. How could this have worked in XNA 3.1, but not in XNA 4.0?

The changes from the 3.1 copy of the file are minimal and consisted only of renaming a few functions. Below is the full shader source in its current form. I'd be very grateful for any help in reducing the number of instruction slots this compiles to.


float sampleRadius;
float distanceScale;
float4x4 Projection;

float3 cornerFustrum;

struct VS_OUTPUT
{
    float4 pos              : POSITION;
    float2 TexCoord         : TEXCOORD0;
    float3 viewDirection    : TEXCOORD1;
}; 

VS_OUTPUT VertexShaderFunction(
    float4 Position : POSITION, float2 TexCoord : TEXCOORD0)
{
    VS_OUTPUT Out = (VS_OUTPUT)0;

    Out.pos = Position;
    Position.xy = sign(Position.xy);
    Out.TexCoord = (float2(Position.x, -Position.y) + float2( 1.0f, 1.0f ) ) * 0.5f;
    float3 corner = float3(-cornerFustrum.x * Position.x,
            cornerFustrum.y * Position.y, cornerFustrum.z);
    Out.viewDirection =  corner;

    return Out;
}


texture depthTexture;
texture randomTexture;

sampler2D depthSampler = sampler_state
{
    Texture = <depthTexture>;
    ADDRESSU = CLAMP;
    ADDRESSV = CLAMP;
    MAGFILTER = LINEAR;
    MINFILTER = LINEAR;
};

sampler2D RandNormal = sampler_state
{
    Texture = <randomTexture>;
    ADDRESSU = WRAP;
    ADDRESSV = WRAP;
    MAGFILTER = LINEAR;
    MINFILTER = LINEAR;
};

float4 PixelShaderFunction(VS_OUTPUT IN) : COLOR0
{
    float4 samples[16] =
    {
        float4(0.355512,    -0.709318,  -0.102371,  0.0 ),
        float4(0.534186,    0.71511,    -0.115167,  0.0 ),
        float4(-0.87866,    0.157139,   -0.115167,  0.0 ),
        float4(0.140679,    -0.475516,  -0.0639818, 0.0 ),
        float4(-0.0796121,  0.158842,   -0.677075,  0.0 ),
        float4(-0.0759516,  -0.101676,  -0.483625,  0.0 ),
        float4(0.12493,     -0.0223423, -0.483625,  0.0 ),
        float4(-0.0720074,  0.243395,   -0.967251,  0.0 ),
        float4(-0.207641,   0.414286,   0.187755,   0.0 ),
        float4(-0.277332,   -0.371262,  0.187755,   0.0 ),
        float4(0.63864,     -0.114214,  0.262857,   0.0 ),
        float4(-0.184051,   0.622119,   0.262857,   0.0 ),
        float4(0.110007,    -0.219486,  0.435574,   0.0 ),
        float4(0.235085,    0.314707,   0.696918,   0.0 ),
        float4(-0.290012,   0.0518654,  0.522688,   0.0 ),
        float4(0.0975089,   -0.329594,  0.609803,   0.0 )
    };

    IN.TexCoord.x += 1.0/1600.0;
    IN.TexCoord.y += 1.0/1200.0;

    normalize (IN.viewDirection);
    float depth = tex2D(depthSampler, IN.TexCoord).a;
    float3 se = depth * IN.viewDirection;

    float3 randNormal = tex2D( RandNormal, IN.TexCoord * 200.0 ).rgb;

    float3 normal = tex2D(depthSampler, IN.TexCoord).rgb;
    float finalColor = 0.0f;

    for (int i = 0; i < 16; i++)
    {
        float3 ray = reflect(samples[i].xyz,randNormal) * sampleRadius;

        //if (dot(ray, normal) < 0)
        //  ray += normal * sampleRadius;

        float4 sample = float4(se + ray, 1.0f);
        float4 ss = mul(sample, Projection);

        float2 sampleTexCoord = 0.5f * ss.xy/ss.w + float2(0.5f, 0.5f);

        sampleTexCoord.x += 1.0/1600.0;
        sampleTexCoord.y += 1.0/1200.0;
        float sampleDepth = tex2D(depthSampler, sampleTexCoord).a;

        if (sampleDepth == 1.0)
        {
            finalColor ++;
        }
        else
        {       
            float occlusion = distanceScale* max(sampleDepth - depth, 0.0f);
            finalColor += 1.0f / (1.0f + occlusion * occlusion * 0.1);
        }
    }

    return float4(finalColor/16, finalColor/16, finalColor/16, 1.0f);
}


technique SSAO
{
    pass P0
    {          
        VertexShader = compile vs_3_0 VertexShaderFunction();
        PixelShader  = compile ps_3_0 PixelShaderFunction();
    }
}
Kromster
acp10bda

3 Answers

2

XNA 4.0 enforces the 512-instruction limit (which the Xbox 360 has and which the HiDef profile enforces as a minimum), whereas XNA 3.1 didn't.

On the plus side, any graphics card that can run the XNA HiDef profile shouldn't fall over on your shader, whereas it might have if XNA allowed any number of instructions.

Since you have a loop in your code, you could try forcing the compiler to emit loop instructions if it is currently unrolling the loop (I'm not familiar with doing this myself).

George Duckett
  • I understand that there is a limit; I didn't know that XNA 3.1 didn't enforce it. So chances are it was over the limit the whole time. I know there are implementations of SSAO in XNA 4.0, but they are closed source as far as I have found. So I know it is possible, but perhaps they used some other set of operations that has the same or a similar effect but uses fewer instructions. – acp10bda Mar 30 '11 at 12:14
  • Having removed the for statement from around the loop body (so it only runs once now, of course), the instruction count has dropped to a point where the compiler accepts it. But now I am getting runtime errors... I don't think that's relevant to this, though, so I'll be spawning a new question. I'll just go ahead and mark this as the answer, as it did have something to do with loop unrolling; however, I couldn't find out how to force the compiler to use real loops rather than unrolling them... :/ – acp10bda Mar 30 '11 at 14:55
1

If you are looking for an open-source SSAO implementation for XNA 4, check this link out: Deferred Rendering with SSAO Normals

Neil Knight
  • Oh man, I wish I found this a while back, I just went through the process of translating the Deferred Rendering Example given by Catalin Zima from way back in XNA 2.0. Ugh what a pain. I'm totally just going to nab this. Thank you VERY much. – acp10bda Mar 30 '11 at 15:39
  • You're welcome. If you are looking at extending this, I would recommend replacing the `get/set` functions with `properties`. I did this with great success (http://project-vanquish.co.cc). I also altered the method calls so I don't pass a `List` through and instead have a `public static List` that I can call from my `Engine` class. – Neil Knight Mar 30 '11 at 16:00
  • Well, I just wanted a complete, working example and/or tutorial for it so that I could implement it easily in my Engine (this gives me both!). However, I am a bit underwhelmed by the SSAO effect. I think it's because of the random normals; if I make some nice hard-edged polygon stuff in a scene, I'll be better able to judge it against other SSAO I've seen. Of course, I don't expect it to be super amazing, considering the limitations on the SSAO shader compared to apparently all other implementations. I guess it's a limitation imposed by XNA that I'll just have to work with. – acp10bda Mar 30 '11 at 16:11
  • Well, whichever way you go - enjoy :o) – Neil Knight Mar 30 '11 at 16:15
0

Reduce the number of samples in the shader from 16 to 8.
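
A minimal sketch of that suggestion, applied to the pixel shader from the question. It assumes the rest of the file (VS_OUTPUT, depthSampler, RandNormal, sampleRadius, distanceScale, Projection) is unchanged. SAMPLE_COUNT and samplePos are hypothetical names introduced here, and keeping every other kernel entry is an arbitrary choice, not something taken from the original sample. If the compiler is unrolling the loop, halving the iterations should roughly halve the loop's share of the 620 slots, but the result has not been measured.

```hlsl
// Sketch only: the question's pixel shader with an 8-entry kernel.
// SAMPLE_COUNT is a hypothetical name, not part of the original sample.
#define SAMPLE_COUNT 8

float4 PixelShaderFunction(VS_OUTPUT IN) : COLOR0
{
    // Every other entry of the original 16-entry kernel (arbitrary choice).
    // The unused w component of the original float4 entries is dropped.
    float3 samples[SAMPLE_COUNT] =
    {
        float3( 0.355512,  -0.709318,  -0.102371),
        float3(-0.87866,    0.157139,  -0.115167),
        float3(-0.0796121,  0.158842,  -0.677075),
        float3( 0.12493,   -0.0223423, -0.483625),
        float3(-0.207641,   0.414286,   0.187755),
        float3( 0.63864,   -0.114214,   0.262857),
        float3( 0.110007,  -0.219486,   0.435574),
        float3(-0.290012,   0.0518654,  0.522688)
    };

    // Fixed texture-coordinate offset, kept from the original.
    IN.TexCoord.x += 1.0 / 1600.0;
    IN.TexCoord.y += 1.0 / 1200.0;

    // The original's bare normalize(IN.viewDirection) call discards its
    // result, so it is omitted here; behaviour is unchanged.
    float depth = tex2D(depthSampler, IN.TexCoord).a;
    float3 se = depth * IN.viewDirection;
    float3 randNormal = tex2D(RandNormal, IN.TexCoord * 200.0).rgb;

    float finalColor = 0.0f;

    for (int i = 0; i < SAMPLE_COUNT; i++)
    {
        // Reflect the kernel vector about the random normal, scale by radius.
        float3 ray = reflect(samples[i], randNormal) * sampleRadius;

        // Project the view-space sample position to texture coordinates.
        float4 samplePos = float4(se + ray, 1.0f);
        float4 ss = mul(samplePos, Projection);
        float2 sampleTexCoord = 0.5f * ss.xy / ss.w + float2(0.5f, 0.5f);
        sampleTexCoord.x += 1.0 / 1600.0;
        sampleTexCoord.y += 1.0 / 1200.0;

        float sampleDepth = tex2D(depthSampler, sampleTexCoord).a;

        if (sampleDepth == 1.0)
        {
            // Same as the original: a depth of exactly 1.0 counts as unoccluded.
            finalColor++;
        }
        else
        {
            float occlusion = distanceScale * max(sampleDepth - depth, 0.0f);
            finalColor += 1.0f / (1.0f + occlusion * occlusion * 0.1);
        }
    }

    // Average over the reduced sample count instead of the hard-coded 16.
    float ao = finalColor / SAMPLE_COUNT;
    return float4(ao, ao, ao, 1.0f);
}
```

Fewer samples will usually make the occlusion noisier, so a blur pass over the SSAO output may matter more with this version than with the 16-sample one.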