
I am using WebGL to do hardware skinning, but updating my model node hierarchies causes a huge performance hit.

Every node needs to query its current location/rotation/scale keyframes, construct a local matrix with them, and multiply it with its parent's world matrix if it has a parent.
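For reference, a minimal sketch of the per-node update described above. It assumes the local matrices have already been built from the keyframes, that nodes are sorted so every parent comes before its children, and that matrices are column-major as in gl-matrix. The names `parents`, `local`, `world`, `multiplyMat4` and `updateWorldMatrices` are hypothetical, not taken from my actual code.

```javascript
// Multiplies two column-major 4x4 matrices stored inside flat arrays:
// out = a * b. The output range must not overlap the input ranges.
function multiplyMat4(out, outOffset, a, aOffset, b, bOffset) {
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        sum += a[aOffset + k * 4 + row] * b[bOffset + col * 4 + k];
      }
      out[outOffset + col * 4 + row] = sum;
    }
  }
}

// parents[i] is the index of node i's parent, or -1 for a root.
// local and world hold 16 floats per node.
// Assumption: parents always appear before their children.
function updateWorldMatrices(parents, local, world) {
  for (let i = 0; i < parents.length; i++) {
    const p = parents[i];
    if (p < 0) {
      // Root node: the world matrix is just the local matrix.
      world.set(local.subarray(i * 16, i * 16 + 16), i * 16);
    } else {
      // Child node: world = parentWorld * local.
      multiplyMat4(world, i * 16, world, p * 16, local, i * 16);
    }
  }
}
```

Because parents come first, a single linear pass is enough and no recursion is needed.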

The matrix math itself is as optimized as it gets (special variants of matrix construction based on gl-matrix).

Still, if I update many models, each with tens of nodes (some even with hundreds, sadly), this hogs all of the browser's execution time.

I have tried using a dirty flag to skip nodes that don't actually need updating, but simply checking whether their local data changed (mostly just whether the location or rotation changed) costs about as much as calculating the matrices.

WebCL would have been ideal, but it seems to have gone nowhere since 2014.

I am starting to think of running it all in a shader, but I can't quite wrap my head around how to design it (e.g. how to store the keyframes, which are a map of frame->data, or how to write the data back).

Another way is to cache all of the animation transformations in a texture, but this doesn't scale well. For models with few enough keyframes this is OK, but for ones with long animations it grows to hundreds of megabytes very fast. This is mostly because I can't think of any way to store sparse data: right now, I store the transformations for every single frame. If sparse storage were possible, I could store only as many transformations as there are keyframes, which would not take a lot of memory. Granted, this would require matrix interpolation, and I am not sure how reliable that is.
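To make the sparse idea concrete, here is a rough sketch of what I have in mind on the CPU side: keep only the keyframes, find the pair that brackets the current frame with a binary search, and interpolate the components rather than the matrices. The layout (`frames` as a sorted array of frame numbers, `values` with three floats per keyframe) and the function names are assumptions made for illustration. Only the location track is shown; rotations would need quaternion interpolation instead of a plain lerp.

```javascript
// Returns the index of the last keyframe whose frame number is <= frame
// (or 0 if the frame lies before the first keyframe).
function findKeyframe(frames, frame) {
  let lo = 0;
  let hi = frames.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >>> 1;
    if (frames[mid] <= frame) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}

// Samples a vec3 track at an arbitrary frame by linear interpolation
// between the two surrounding keyframes. Frames outside the track clamp
// to the first or last keyframe.
function sampleVec3(frames, values, frame, out) {
  const i = findKeyframe(frames, frame);
  const j = Math.min(i + 1, frames.length - 1);
  const span = frames[j] - frames[i];
  let t = span > 0 ? (frame - frames[i]) / span : 0;
  t = Math.min(Math.max(t, 0), 1);
  for (let c = 0; c < 3; c++) {
    const a = values[i * 3 + c];
    const b = values[j * 3 + c];
    out[c] = a + (b - a) * t;
  }
  return out;
}

// Usage: three keyframes at frames 0, 10 and 40.
const frames = new Uint32Array([0, 10, 40]);
const values = new Float32Array([0, 0, 0, 10, 0, 0, 10, 30, 0]);
const location = sampleVec3(frames, values, 25, [0, 0, 0]); // halfway between frames 10 and 40
```

With this layout the memory cost is proportional to the number of keyframes, not to the animation length.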

Does anyone have any ideas?

user2503048

1 Answer


I don't think it's practical to offload the entire node hierarchy calculation to the GPU. The best you can do is upload the two absolute world transformation keyframes to the GPU and let the GPU interpolate between them. However, I am not sure whether the interpolated world transformation is the same as the one you would get by actually calculating it through the node hierarchy. If it is, that would be a feasible solution. Note that you cannot simply interpolate between matrices either. You need to convert them to a form that supports interpolation, such as quaternions plus additional data.
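As an illustration of interpolating rotations in quaternion form, here is a minimal spherical linear interpolation (slerp) sketch. It assumes unit quaternions stored as `[x, y, z, w]` arrays, which is also the component order gl-matrix uses; the function itself is a hand-written sketch, not a library call.

```javascript
// Spherical linear interpolation between unit quaternions a and b.
// t = 0 returns a, t = 1 returns b (or its negation, which represents
// the same rotation).
function slerp(out, a, b, t) {
  let bx = b[0], by = b[1], bz = b[2], bw = b[3];
  let cosom = a[0] * bx + a[1] * by + a[2] * bz + a[3] * bw;

  // q and -q are the same rotation; flip b to take the shorter arc.
  if (cosom < 0) {
    cosom = -cosom;
    bx = -bx; by = -by; bz = -bz; bw = -bw;
  }

  let s0, s1;
  if (1 - cosom > 1e-6) {
    const omega = Math.acos(cosom);
    const sinom = Math.sin(omega);
    s0 = Math.sin((1 - t) * omega) / sinom;
    s1 = Math.sin(t * omega) / sinom;
  } else {
    // Nearly identical rotations: fall back to a plain lerp
    // to avoid dividing by a value close to zero.
    s0 = 1 - t;
    s1 = t;
  }

  out[0] = s0 * a[0] + s1 * bx;
  out[1] = s0 * a[1] + s1 * by;
  out[2] = s0 * a[2] + s1 * bz;
  out[3] = s0 * a[3] + s1 * bw;
  return out;
}
```

Translation and scale can be interpolated component-wise, and the matrix is then rebuilt from the interpolated parts. The same math ports to GLSL if the interpolation is moved to the vertex shader.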

I actually ran into this problem in my own project as well. I found updating the transformations to be the most time-consuming operation, despite also running a full collision system with response.

I solved the problem by reducing the number of times the transformation update needs to be called. For example, if you can conclude that the entire model is outside your view frustum, then you don't need to calculate its transformations at all. This may reduce the amount of calculation you need to do to somewhere between ~1/6 and ~1/4.
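A minimal sketch of such a visibility test, assuming each model has a bounding sphere and the frustum is given as six planes with inward-facing unit normals, packed as `[nx, ny, nz, d]` per plane. The function name and the plane layout are assumptions for illustration, not my actual code.

```javascript
// planes: flat array with 4 floats per plane (nx, ny, nz, d), where
// nx*x + ny*y + nz*z + d >= 0 for points on the inside of the plane.
// Returns false only when the sphere lies completely outside some plane.
function isSphereVisible(planes, cx, cy, cz, radius) {
  for (let i = 0; i < planes.length; i += 4) {
    const dist =
      planes[i] * cx + planes[i + 1] * cy + planes[i + 2] * cz + planes[i + 3];
    if (dist < -radius) {
      return false;
    }
  }
  return true;
}

// Usage with a stand-in "frustum": the box from -1 to 1 on every axis.
const boxPlanes = new Float32Array([
   1,  0,  0, 1,
  -1,  0,  0, 1,
   0,  1,  0, 1,
   0, -1,  0, 1,
   0,  0,  1, 1,
   0,  0, -1, 1,
]);
const visible = isSphereVisible(boxPlanes, 0, 0, 0, 0.5);
const culled = !isSphereVisible(boxPlanes, 5, 0, 0, 1);
```

The test is conservative: a sphere near a frustum corner may pass even though it is outside, which only costs an unnecessary update.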

Secondly, for distant objects, you don't need to update the transformations every frame. Just update them every few frames or so. Remember, there are games that ship at only 30 FPS, so a few skipped frames for distant objects may not be noticeable.
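One way this could look, as a sketch: pick an update interval from the distance to the camera and stagger models by their index so the skipped updates don't all land on the same frame. The thresholds and intervals below are made-up values that would need tuning, and the function names are hypothetical.

```javascript
// Hypothetical thresholds: near models update every frame,
// mid-range ones every 2nd frame, far ones every 4th frame.
function updateInterval(distance) {
  if (distance < 50) return 1;
  if (distance < 200) return 2;
  return 4;
}

// Adding the model index to the frame number spreads the updates of
// equally distant models across different frames.
function shouldUpdate(modelIndex, frameNumber, distance) {
  const interval = updateInterval(distance);
  return (frameNumber + modelIndex) % interval === 0;
}
```

In the render loop, the hierarchy update for a model would run only when `shouldUpdate` returns true, and the previous world matrices would be reused otherwise.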

Finally, and this may not work for JavaScript at all, so I haven't done it (yet): you should store and access your data in a cache-coherent manner. See these slides. It could give a 10x performance increase. But again, it may not work for JavaScript because, well, JavaScript arrays are not guaranteed to store their data contiguously.
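Typed arrays, unlike plain JavaScript arrays, are backed by one contiguous `ArrayBuffer`, so a layout along these lines is at least expressible. Below is a sketch that keeps each kind of data in its own contiguous run (all local matrices together, all world matrices together, and so on). The field names and sizes are assumptions, and whether this actually helps in a given JavaScript engine is something you would have to measure.

```javascript
// Allocates one ArrayBuffer and carves it into per-field Float32Array
// views, so that a loop over one field walks memory sequentially.
function createNodeStorage(nodeCount) {
  const floatsPerNode = 16 + 16 + 3 + 4 + 3;
  const buffer = new ArrayBuffer(nodeCount * floatsPerNode * 4);
  let offset = 0;

  // Creates a view holding floatsPerItem floats for every node.
  const view = (floatsPerItem) => {
    const v = new Float32Array(buffer, offset, nodeCount * floatsPerItem);
    offset += v.byteLength;
    return v;
  };

  return {
    buffer,
    localMatrices: view(16),
    worldMatrices: view(16),
    locations: view(3),
    rotations: view(4),
    scales: view(3),
  };
}

// Usage: storage for 10 nodes; node i's world matrix starts at i * 16.
const storage = createNodeStorage(10);
storage.locations[0] = 5;
```

Since everything lives in a single buffer, it can still be transferred to a web worker in one piece.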

WacławJasper

My simulation already works at 30FPS. The slides you linked are nice, however I really do wonder how relevant they are to JS. My nodes are in fact already in one shared typed array, although all of the members are stored in order, that is `[local_matrix1, world_matrix1, local_location1, ..., local_matrix2, ...]` (not sure if this would matter). I don't think I ever noticed this speeding anything up; I actually made this change just to be able to move the array between web workers. Culling and LOD might work, but I prefer to get it as fast as possible for the biggest N visible objects first. – user2503048 Aug 15 '15 at 10:39