Idea on how to add Threaded Rendering

REDDemon
Developer
Posts: 1044
Joined: Tue Aug 31, 2010 8:06 pm
Location: Genova (Italy)

Post by REDDemon »

I know the topic may be old, but since changes are planned for Irrlicht 2 anyway, what about finally adding support for threaded rendering?

I've been working on threaded rendering for quite some time as thesis work (not an Irrlicht-related project, by the way), and I just realized it is entirely possible to implement it in Irrlicht too. Actually this could be added right now, but I think the API has some issues that prevent optimal performance. Wouldn't it be better to take the chance to rework the Irrlicht API alongside adding long-requested features? I almost cry when I see great developers here contributing so much to Irrlicht and having to fight against its API; they could probably have developed a new engine in that time.

Here are the main ideas behind my thesis. I'm sharing them here hoping they can help Irrlicht in some way.

1) OpenGL calls are already "deferred": commands are written to a command buffer that is executed by the GPU. However, OpenGL has a heavy validation cost and is designed in a way that makes some calls blocking, or even a bottleneck.

2) The idea I use is to defer the GL calls again, using a command buffer in user space. That way the GL overhead is moved to a different thread. I found an implementation that is both efficient and easy to write: a linked list of command arrays in which the parameters of each command are saved (like a delegate, without the overhead of std::function). It is not that simple in reality, but I'm cutting it short here just to give an idea and see if there's interest. This is possible partly thanks to a different API with some persistent proxy objects that allow implicit synchronization (basically I lock one mutex once per frame, because I managed to reduce the communication needed between the two threads).

3) As long as commands are stored in lists, it is possible to perform a general-purpose state minimization (think of it as a just-in-time OpenGL compiler). Irrlicht is currently a branching monster when it comes to IVideoDriver implementations, and I think the overhead of tracking state changes is almost the same as the cost of the state changes that get optimized out. In my approach the state minimization is performed only once in a while. (Right now I have a single command list, but it would be trivial to split the commands into several lists to reduce the optimization overhead even further by working on smaller items, provided profiling shows it is worth it.) The result is an optimized list that the rendering thread can execute every frame without any branching at all.
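
To make the user-space command buffer from point 2 a bit more concrete, here is a minimal sketch, not the actual thesis code. It assumes a command is a plain function pointer plus two saved integer parameters, stores commands in a linked list of fixed-size arrays, and hands a finished frame over with a single mutex lock. All names (FakeContext, Command, CommandList, FrameExchange, bindTexture, drawRange, demoFrame) are hypothetical, the "GL calls" only append to a log, and thread creation is left out so the example stays deterministic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Stand-in for the GL context: records what would have been issued.
struct FakeContext {
    std::vector<int> issued;
};

// A command is a function pointer plus its saved parameters,
// which avoids the possible heap allocation of std::function.
struct Command {
    void (*run)(FakeContext&, int, int);
    int a;
    int b;
};

// Hypothetical stand-ins for real GL calls.
inline void bindTexture(FakeContext& ctx, int unit, int id) { ctx.issued.push_back(unit * 1000 + id); }
inline void drawRange(FakeContext& ctx, int first, int count) { ctx.issued.push_back(first + count); }

// Linked list of fixed-size command arrays: recording never moves
// commands that were already stored.
class CommandList {
public:
    void record(Command c) {
        assert(c.run != nullptr);
        if (!tail_ || tail_->used == kChunkSize) {
            auto chunk = std::make_unique<Chunk>();
            Chunk* raw = chunk.get();
            if (tail_) tail_->next = std::move(chunk); else head_ = std::move(chunk);
            tail_ = raw;
        }
        tail_->commands[tail_->used++] = c;
    }
    // Executes every stored command in recording order.
    void replay(FakeContext& ctx) const {
        for (const Chunk* c = head_.get(); c; c = c->next.get())
            for (std::size_t i = 0; i < c->used; ++i)
                c->commands[i].run(ctx, c->commands[i].a, c->commands[i].b);
    }
private:
    static constexpr std::size_t kChunkSize = 4;
    struct Chunk {
        std::array<Command, kChunkSize> commands;
        std::size_t used = 0;
        std::unique_ptr<Chunk> next;
    };
    std::unique_ptr<Chunk> head_;
    Chunk* tail_ = nullptr;
};

// One lock per frame: the recording side publishes a finished list,
// the rendering side takes it.
class FrameExchange {
public:
    void publish(std::unique_ptr<CommandList> frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = std::move(frame);
    }
    std::unique_ptr<CommandList> take() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::move(pending_);
    }
private:
    std::mutex mutex_;
    std::unique_ptr<CommandList> pending_;
};

// Records one frame, publishes it, then replays it the way a render thread would.
std::vector<int> demoFrame() {
    FrameExchange exchange;
    auto frame = std::make_unique<CommandList>();
    for (int i = 0; i < 3; ++i) {
        frame->record({bindTexture, 0, i});
        frame->record({drawRange, i * 10, 6});
    }
    exchange.publish(std::move(frame));
    FakeContext ctx;
    if (auto ready = exchange.take()) ready->replay(ctx);
    return ctx.issued;
}
```

In a real setup take() and replay() would run on the rendering thread, probably with a condition variable so that thread can sleep until a frame is published.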

So basically, instead of having a branch monster every frame, we have a branch monster that is executed only rarely and in a different thread. Decoupling the main thread from the rendering thread may therefore give a total overhead that is lower than the overhead of a single thread (plus much more free CPU time on both threads!).
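
As a rough illustration of the state minimization pass, here is a sketch under simplifying assumptions: states are independent integer slots, a command is either "set a slot" or "draw", and nothing is known about the state before the list runs. The names (Op, Cmd, minimizeStates) are hypothetical. The pass drops every state change that is overwritten before the next draw or that sets a slot to the value it already holds, so all the branching happens here and not during replay.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

enum class Op : std::uint8_t { SetState, Draw };

struct Cmd {
    Op op;
    int key;    // state slot for SetState, mesh id for Draw
    int value;  // new value for SetState, unused for Draw
};

// Returns an equivalent list without redundant state changes.
std::vector<Cmd> minimizeStates(const std::vector<Cmd>& in) {
    std::map<int, int> current;  // state as set by the commands emitted so far
    std::map<int, int> pending;  // changes requested since the last draw
    std::vector<Cmd> out;

    // Emit only the pending changes that actually alter the current state.
    auto flush = [&]() {
        for (const auto& kv : pending) {
            auto it = current.find(kv.first);
            if (it == current.end() || it->second != kv.second) {
                out.push_back(Cmd{Op::SetState, kv.first, kv.second});
                current[kv.first] = kv.second;
            }
        }
        pending.clear();
    };

    for (const Cmd& c : in) {
        if (c.op == Op::SetState) {
            pending[c.key] = c.value;  // a later write to the same slot wins
            continue;
        }
        assert(c.op == Op::Draw);
        flush();
        out.push_back(c);
    }
    flush();  // keep the final state identical to the unoptimized list
    return out;
}
```

Because the pass only has to run again when the recorded list changes, its cost is paid rarely, and the optimized output can be replayed as-is every frame.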

While my approach allows keeping the "usual" API and running the asynchronous execution in the background (seriously, I can keep the old API, even if it is not a good idea), I think someone should take the chance and rework the Irrlicht API.

My idea (I have some public code, but it is far from being a complete engine) is to separate responsibilities as much as possible, so I think we need a replacement for IVideoDriver, should get rid of the current reference counting system, and start from there. Stuff like loaders and scene managers can be implemented in a second stage. Having a complete video driver layer should be the priority, because there are already users asking for threaded rendering, and with a powerful and easy-to-use driver it is a breeze for users to start making additions to the engine.

Here are some code snippets to give an idea of the new API I developed. The resources handled by the GPU are managed (implicit reference counting, so we no longer have to grab & drop), while other stuff is injected or created on the stack.

Code:

 
//setup 2 render passes: "solid" draws the geometry, "clear" clears the buffers
RenderPass solid, clear;
solid.ZWrite = false;
solid.RenderTarget = std::move( texture);
clear.clearColor    = true;
clear.clearDepth    = true;
clear.clearStencil= true;
 
RenderSlot slot;
slot.shader = std::move( myMesh); //in my case each shader is a mesh, gpu programs are recycled/reused behind the scenes. 
slot.indices = std::move( myMeshIndices);
slot.mode = DrawMode::Triangles;
 
solid.push_back(slot);
 
// total control over rendering
renderlist.push_back(clear);
renderlist.push_back(solid);
 
renderQueue->compileStates(renderlist); //asynchronously stuff like GPU programs compilation and texture loading already started when we finally submit rendering commands.
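
The "implicit reference count" behind the std::move calls in the snippet could look roughly like the sketch below. It is only an assumption about how such handles might work, not the real implementation: a handle wraps a std::shared_ptr whose custom deleter returns the GPU object to the device. FakeDevice, TextureHandle and RenderPassSketch are hypothetical names, and the device here only logs released ids.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for the driver: hands out ids and logs the ones that get released.
struct FakeDevice {
    unsigned nextId = 1;
    std::vector<unsigned> released;
};

// Managed GPU resource: copying shares ownership, moving transfers it,
// and the last owner going away releases the resource. No grab()/drop().
class TextureHandle {
public:
    TextureHandle() = default;

    // The device must outlive every handle it created.
    static TextureHandle create(FakeDevice& dev) {
        TextureHandle h;
        h.id_ = std::shared_ptr<unsigned>(
            new unsigned(dev.nextId++),
            [&dev](unsigned* p) {
                dev.released.push_back(*p);  // stands in for deleting the GL object
                delete p;
            });
        return h;
    }

    bool valid() const { return id_ != nullptr; }
    unsigned id() const { assert(valid()); return *id_; }

private:
    std::shared_ptr<unsigned> id_;
};

// Hypothetical pass that keeps its render target alive for as long as it exists.
struct RenderPassSketch {
    TextureHandle renderTarget;
};
```

With handles like this, moving a texture into a pass leaves the source handle empty, and the GPU object is released when the last pass or slot that refers to it is destroyed.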
 
Basically I started developing an advanced video driver (I can't really call it that, because it is divided into a bunch of classes, so it is not the same as Irrlicht's driver) that has cutting-edge features like threaded rendering and state minimization, and has no inversion of control (so you have total control). The inversion of control part can in fact be added later or built on top of it easily (in fact Irrlicht's IVideoDriver, AFAIK, has no inversion of control either).

I thought it could be of interest to the Irrlicht community here. I know it would be a big change, but I think it is worth it.