Optimizing SSAO?


Re: Optimizing SSAO?

Postby REDDemon » Tue Jul 03, 2012 7:09 pm

Ah, OK, sorry :). I was just curious. On Windows the timer can reset, so I have to detect and ignore the cases in which it does. I don't know how it works on other systems.
OpenGL is not hard. What you have to do is just explained in specifications. What is hard is dealing with poor OpenGL implementations.
REDDemon
 
Posts: 831
Joined: Tue Aug 31, 2010 8:06 pm
Location: Genova (Italy)

Re: Optimizing SSAO?

Postby BlindSide » Thu Jul 05, 2012 4:25 am

How does it compare with the SSAO implementation in XEffects? (Link in signature).

Very very old piece of code, but worth taking a look.

Cheers,
ShadowMapping for Irrlicht!: Get it here
Need help? Come on the IRC!: #irrlicht on irc://irc.freenode.net
BlindSide
Admin
 
Posts: 2797
Joined: Thu Dec 08, 2005 9:09 am
Location: NZ!

Re: Optimizing SSAO?

Postby hendu » Thu Jul 05, 2012 10:16 am

The XEffects one is more blurry and "hatchy" - darkening has a hatch-pattern when viewed up close, also seems more complex. Can't comment on the relative speed without integrating it to the same scene, too much work for that ;) Though my intuition says the XE one is slower, at least in its current form of unoptimized full res vs optimized quarter res.

FWIW, I got 25 fps in example3 room scene.

Various points:
- 8 samples, I use 16

- full res, vs a quarter (causing the noise, hatchiness, and the blurriness is from the compensation / too strong blur)
I'm after contact shadows mainly, not the small creases, so the quarter-res SSAO looks rather good for my goals.

- on the complexity, I simply use unjittered 2d positions. XE uses jittered 3d positions, with the additional texture and so on

- XE blur is an average, also it skips some pixels. Mine is gaussian, without skipping close pixels

- XE saves depth in a separate pass to a 64-bit float texture (expensive!), further I don't understand why you divide it by the X pixel position.
I save it in the actual render pass by using MRT, to a regular RGBA8 texture.

- XE doesn't skip impossible pixels in any way (neither skybox nor planar surfaces far enough to not receive darkening for certain)

- XE uses shader multiply, which like REDDemon pointed out is slower than using the blending units
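A rough CPU-side sketch of the ideas in the list above (16 unjittered 2D taps and an early-out for pixels that cannot receive darkening). This is not the actual shader from either implementation: the function name `ssao_cross`, the cross-shaped tap layout, the `bias` value and the "depth 1.0 means sky" convention are all assumptions made for illustration.

```python
def ssao_cross(depth, x, y, radius=4, sky_depth=1.0, bias=0.02):
    """Toy occlusion factor for pixel (x, y) of a 2D depth grid.

    depth is a list of rows of floats in [0, 1]; smaller means closer.
    Returns 1.0 for fully lit and lower values for more occlusion.
    """
    d = depth[y][x]
    if d >= sky_depth:
        # Skybox / far plane: skip the pixel, it can never be darkened.
        return 1.0
    h, w = len(depth), len(depth[0])
    occluded = 0
    taps = 0
    # Fixed cross pattern: 4 directions x radius steps (16 taps by default).
    for step in range(1, radius + 1):
        for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
            sx = min(max(x + dx, 0), w - 1)  # clamp to the grid edges
            sy = min(max(y + dy, 0), h - 1)
            taps += 1
            if depth[sy][sx] < d - bias:     # neighbour is closer to the camera
                occluded += 1
    return 1.0 - occluded / taps


# A flat floor with a closer "wall" to the right of the centre pixel.
grid = [[0.5] * 9 for _ in range(9)]
for col in range(5, 9):
    grid[4][col] = 0.2
print(ssao_cross(grid, 4, 4))  # 4 of the 16 taps are occluded -> 0.75
```

In a real shader the early-out only pays off if whole blocks of neighbouring pixels take it, since GPUs execute fragments in groups.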

OK, just my random thoughts after checking it out 1h after waking up, not sure if it answers your question ;)
hendu
 
Posts: 1554
Joined: Sat Dec 18, 2010 12:53 pm

Re: Optimizing SSAO?

Postby Nadro » Thu Jul 05, 2012 12:39 pm

hendu wrote: I save it in the actual render pass by using MRT, to a regular RGBA8 texture.

Do you encode/decode the depth into RGBA8? Did you check the performance with R16F or R32F? (16 bits of precision should be enough for SSAO.) On DX10 hardware, read/write operations from/to an R32F texture should be even faster than RGBA8 with encode/decode operations in a shader.
NBK Game Studio - Official Site:
http://www.nbkgamestudio.pl/
Nadro
 
Posts: 998
Joined: Sun Feb 19, 2006 9:08 am
Location: Warsaw, Poland

Re: Optimizing SSAO?

Postby hendu » Thu Jul 05, 2012 1:09 pm

Yes, I encode/decode the depth. I need the full precision for light pre-pass rendering. I also target GL2 hardware, r500/gf6 and up, not DX10 and up.

The few extra ALU instructions needed should result in next to no speed difference - the latency is in the texture handling, and it may entirely hide the calculation cost. Even more so on higher end cards, where there are huge amounts of ALU resources but not much more texture resources.
hendu
 
Posts: 1554
Joined: Sat Dec 18, 2010 12:53 pm

Re: Optimizing SSAO?

Postby Nadro » Thu Jul 05, 2012 1:55 pm

Hmmm as we can see here:
http://store.steampowered.com/hwsurvey/
90% of gamers use a DX10-capable GPU, so in my opinion it is better to do extra optimizations for 90% of users than to maintain compatibility for the other 10%. Of course we could have two code paths, for GPUs with and without DX10 support, but that requires too much time for such a small target group.
Nadro
 
Posts: 998
Joined: Sun Feb 19, 2006 9:08 am
Location: Warsaw, Poland

Re: Optimizing SSAO?

Postby hendu » Thu Jul 05, 2012 4:42 pm

As with all surveys, that's only valid for Steam users, who would tend towards more hard-core gamers. Also, I'm not convinced it would be much faster, if at all.

From a market POV, one should target the lowest common denominator, for example XP instead of Win7. If targeting lower-level GPUs gives 5-10% more client base (I think it's more than that, as Steam users would tend to update more often than the general populace), it's quite worth it.


edit: From my initial estimated client base, over half will not have float textures available. That too affects the decision (float textures are a patented feature, and so not enabled in any distro build that does business in the US - ubuntu, fedora..)
hendu
 
Posts: 1554
Joined: Sat Dec 18, 2010 12:53 pm

Re: Optimizing SSAO?

Postby hendu » Thu Jul 05, 2012 4:57 pm

This won't get solved without data, so I benched an 800x600 RTT, both simply passing the depth and packing it.

Averages:
Packing to rgba8: 454.841us
Direct passing to r32f: 453.788us

They are the same, within the deviation, on this hd4350.
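To put the two averages above in proportion, here is a small calculation using only the reported numbers. The helper name `relative_difference_percent` is made up for this sketch, and the raw samples and their deviation are not given in the post, so this says nothing about statistical significance beyond what is stated above.

```python
packed_us = 454.841  # reported average, packing to RGBA8
direct_us = 453.788  # reported average, direct passing to R32F


def relative_difference_percent(a, b):
    """Gap between two timings as a percentage of the faster one."""
    return abs(a - b) / min(a, b) * 100.0


gap_us = abs(packed_us - direct_us)
gap_pct = relative_difference_percent(packed_us, direct_us)
print(f"gap: {gap_us:.3f} us, {gap_pct:.2f}% of the faster path")
```

The gap is roughly 1 microsecond, or about 0.23% of the pass time.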

--

In light of this, I'll gladly take the extra 10% users over no fps increase on less users ;)
hendu
 
Posts: 1554
Joined: Sat Dec 18, 2010 12:53 pm

Re: Optimizing SSAO?

Postby Nadro » Thu Jul 05, 2012 5:07 pm

Of course I don't mean only optimizations for SSAO, but for other parts of the game too. DX10 hardware allows us to do some really nice optimizations. Of course, as you said, the Steam data covers mainly hard-core gamers, but I don't know of a better source. To summarize, everything depends on the target group (I have been targeting DX10 hardware owners for more than a year, so maybe that's why maintaining compatibility with older GPUs seems unimportant from my point of view).

BTW, yep, I know about the patent-related problems in open source drivers; the US is strange in this respect :(

hendu wrote: In light of this, I'll gladly take the extra 10% users over no fps increase on less users ;)

And open source users will be happy :P
Nadro
 
Posts: 998
Joined: Sun Feb 19, 2006 9:08 am
Location: Warsaw, Poland

Re: Optimizing SSAO?

Postby BlindSide » Thu Jul 05, 2012 11:54 pm

hendu wrote:The XEffects one is more blurry and "hatchy" - darkening has a hatch-pattern when viewed up close, also seems more complex. Can't comment on the relative speed without integrating it to the same scene, too much work for that ;) Though my intuition says the XE one is slower, at least in its current form of unoptimized full res vs optimized quarter res.

FWIW, I got 25 fps in example3 room scene.

Various points:
- 8 samples, I use 16

- full res, vs a quarter (causing the noise, hatchiness, and the blurriness is from the compensation / too strong blur)
I'm after contact shadows mainly, not the small creases, so the quarter-res SSAO looks rather good for my goals.

- on the complexity, I simply use unjittered 2d positions. XE uses jittered 3d positions, with the additional texture and so on

- XE blur is an average, also it skips some pixels. Mine is gaussian, without skipping close pixels

- XE saves depth in a separate pass to a 64-bit float texture (expensive!), further I don't understand why you divide it by the X pixel position.
I save it in the actual render pass by using MRT, to a regular RGBA8 texture.

- XE doesn't skip impossible pixels in any way (neither skybox nor planar surfaces far enough to not receive darkening for certain)

- XE uses shader multiply, which like REDDemon pointed out is slower than using the blending units

OK, just my random thoughts after checking it out 1h after waking up, not sure if it answers your question ;)


Thanks for the in depth feedback! Really thanks for taking such a close look. I actually forgot how I implemented it, so long ago lol. Yes 3D positions can give a decent amount of quality (I think I followed the Crytek paper).

The blur has always been a weak side, I think doing it gaussian would produce a much better result.
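For comparison, a minimal sketch of Gaussian versus plain-average blur weights, applied as a 1D pass (a 2D blur would run it once along rows and once along columns). The function names, the edge clamping and the chosen `radius`/`sigma` are assumptions for illustration, not taken from XEffects or from hendu's shader.

```python
import math


def gaussian_weights(radius, sigma):
    """Normalised 1D Gaussian weights for taps -radius..radius."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def box_weights(radius):
    """Plain average: every tap gets the same weight."""
    n = 2 * radius + 1
    return [1.0 / n] * n


def blur_1d(values, weights):
    """Apply a 1D kernel to a list of floats, clamping at the edges."""
    radius = len(weights) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = min(max(i + k - radius, 0), last)
            acc += w * values[j]
        out.append(acc)
    return out


ao_row = [1.0, 1.0, 1.0, 0.2, 1.0, 1.0, 1.0]  # one dark AO pixel
print(blur_1d(ao_row, gaussian_weights(2, 1.0)))
print(blur_1d(ao_row, box_weights(2)))
```

The Gaussian keeps most of the weight near the centre tap, so the dark spot stays more localised than with the flat average.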

Re the separate depth pass: the reason it saves depth in a separate pass is that it needs to support many other effects such as shadow mapping and, more importantly, I don't want the users of the library to have to apply a custom MRT material to their scene nodes, so I do it in a separate pass. Also, I don't think Irrlicht had MRT support at the time. I also can't pack into RGBA8 as two 16-bit ints because that causes artifacts for VSM shadow mapping (even though it was fine for SSAO, believe me, I tested it :P ).

Cheers,
BlindSide
Admin
 
Posts: 2797
Joined: Thu Dec 08, 2005 9:09 am
Location: NZ!

Re: Optimizing SSAO?

Postby hendu » Fri Jul 06, 2012 10:16 am

I use Aras' packing method, it has so far packed 24-bit depth losslessly. I believe it would be able to handle close to 30 bits if needed, but my card only goes up to 24; and 24 bits is quite enough precision for depth.
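As an illustration of depth packing into four 8-bit channels, here is a CPU-side sketch of the widely circulated frac-based float-to-RGBA encoding. It is assumed, not confirmed, that this is the method meant above; the function names `pack_depth` and `unpack_depth` are made up, and the rounding step only imitates what an RGBA8 render target does on write.

```python
import math

SCALES = (1.0, 255.0, 65025.0, 16581375.0)  # 255^0 .. 255^3


def frac(x):
    return x - math.floor(x)


def pack_depth(v):
    """Encode a depth v in [0, 1) as four 8-bit channel values (0..255)."""
    enc = [frac(v * s) for s in SCALES]
    # Remove from each channel the part that the next channel carries.
    enc = [enc[0] - enc[1] / 255.0,
           enc[1] - enc[2] / 255.0,
           enc[2] - enc[3] / 255.0,
           enc[3]]
    # Quantise to 8 bits per channel, as an RGBA8 target would.
    return tuple(int(round(c * 255.0)) for c in enc)


def unpack_depth(rgba):
    """Decode four 8-bit channel values back to a depth in [0, 1)."""
    return sum((c / 255.0) / s for c, s in zip(rgba, SCALES))


for depth in (0.0, 0.25, 0.123456789, 0.999999):
    rgba = pack_depth(depth)
    error = abs(unpack_depth(rgba) - depth)
    print(rgba, error < 2.0 ** -24)
```

In this simulation the round-trip error stays below one 24-bit step, because only the last channel loses information to 8-bit rounding.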

The custom material is a valid point for the XE library vs a standalone app. Though with the materialtype override you could still do that, it would also override the user's custom shaders and not only EMT_SOLID. In addition to that override patch, the only-draw-this-type patch is quite useful, so for depth/shadow/etc you can do smgr->drawAll(ENSRP_SOLID) and only have solids drawn ;)
hendu
 
Posts: 1554
Joined: Sat Dec 18, 2010 12:53 pm

Re: Optimizing SSAO?

Postby devsh » Sun Aug 12, 2012 7:23 pm

Have a look at Battlefield 3's implementation of HBAO

disk to disk SSAO usually gives best quality

my advice here is:
1) Quarter size is RISKY because your SSAO spills over edges; as you can see from the plane/box, it spilled onto the background, which is NOT how SSAO works. You may need an edge-aware blur and upsampling after SSAO; have a look at how upscaling of lighting information is implemented in Inferred Rendering, which uses a 50% LBuffer with DSF (Discontinuity Sensitive Filtering).
You also lose a lot of detail.
2) What is your sampling pattern??? I know you try to make the SSAO radius bigger by taking samples farther apart, but if you overdo it you will see banding.
3) Use disks to sample (random Poisson).
4) Like PCF on shadows, smoothing doesn't solve the banding entirely... the input has to make it easier to filter with a blur (the noise has to be more random).
5) For the noise to be more random, YOU NEED TO ROTATE THE SAMPLING DISK RANDOMLY BETWEEN SAMPLES

You won't have to blur as much, as the eye is much more forgiving of noise than of banding patterns.

6) BattleField uses Temporal reprojection and classifies pixels into stable (to reduce ghosting and trails) and unstable (to reduce flickering from frame to frame)
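A minimal sketch of points 3) and 5): a Poisson-disk sample set built by simple dart throwing, then rotated by a random angle. The function names, the sample count of 8, the minimum distance of 0.35 and the idea of picking one angle per pixel are assumptions for illustration; in a shader the angle would typically come from a small tiled noise texture rather than a CPU random generator.

```python
import math
import random


def poisson_disk(count, min_dist, rng, max_tries=10000):
    """Dart throwing: points in the unit disk, each at least min_dist apart."""
    points = []
    tries = 0
    while len(points) < count:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all samples; lower min_dist")
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y > 1.0:
            continue  # outside the unit disk, throw again
        if all(math.hypot(x - px, y - py) >= min_dist for px, py in points):
            points.append((x, y))
    return points


def rotate(points, angle):
    """Rotate every 2D offset around the origin by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


rng = random.Random(1234)           # fixed seed so the kernel is repeatable
kernel = poisson_disk(8, 0.35, rng)
# One fresh angle per pixel turns regular banding into noise.
pixel_offsets = rotate(kernel, rng.uniform(0.0, 2.0 * math.pi))
```

The kernel itself is computed once and stored as constants; only the rotation varies from pixel to pixel.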
Portfolio (WIP) and Development Blog:
http://indirectlightandmagic.tumblr.com/

Do you want to hire a GLSL graphics programmer cheaply???
Try me!
http://indirectlightandmagic.tumblr.com/contact
devsh
Competition winner
 
Posts: 1303
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK

Re: Optimizing SSAO?

Postby hendu » Sun Aug 12, 2012 9:05 pm

Thanks for your input. What you suggest would make it slower, which while better looking, is not the goal.

1) Yep. It's not accurate, but good enough.
2) Simple cross, not even randomized.

6) In particular, this would be too heavy.
hendu
 
Posts: 1554
Joined: Sat Dec 18, 2010 12:53 pm

Re: Optimizing SSAO?

Postby devsh » Sun Aug 12, 2012 10:14 pm

Actually... you're wrong, using my improvements you could achieve the same quality at lower cost (improving the quality of less intensive version of your shader)

Especially the cross: a randomly rotated Poisson disk would let you use 25 to 50% fewer samples for the same "radius" and 25%+ fewer blur samples (a 5x5 blur instead of 8x8).

Temporal reprojection could enable you to space your samples EVEN further apart (this increases jittering) while maintaining stability
devsh
Competition winner
 
Posts: 1303
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK

Re: Optimizing SSAO?

Postby hendu » Mon Aug 13, 2012 10:19 am

Jittering causes a changing blur when you're standing still. It's a rather disturbing effect once you learn to see it, kinda like bad kerning. ( http://xkcd.com/1015/ )

Temporal reprojection -> one more RTT, more calculations per sampled pixel (!). Poisson disk -> one more texture, one more texture read, more calculations per pixel. I'm really not convinced that doing either with less samples would be faster while getting the same quality, partly because I rather prefer a smooth blur to the pointy, jittered blur. That's worse quality to my eyes.
hendu
 
Posts: 1554
Joined: Sat Dec 18, 2010 12:53 pm
