
Using WebGL2, I stream a 4K by 2K stereoscopic video as a texture onto the inside of a sphere to provide 360° VR video playback. I've optimized as much of the codebase as is feasible given the returns on time, and the application runs flawlessly when using an H.264 video source.

However, when using 8-bit VP8 or VP9 (which offer superior fidelity and file size; AV1 isn't available to me), I encounter FPS drops on weaker systems due to the extra CPU cost of decoding VP8/VP9 video.

When profiling the app, I've identified that the per-frame call to texSubImage2D that updates the texture from the video consumes the large majority of each frame (texImage2D was even worse due to its allocations), but I'm unsure how to optimize its use any further. Below are the things I'm already doing to minimize its impact:

I allocate the texture's memory up front at initial load using texStorage2D to keep it as contiguous as possible.

    let glTexture = gl.createTexture();
    let pixelData = new Uint8Array(4096*2048*3);
    pixelData.fill(255);

    gl.bindTexture(GL.TEXTURE_2D, glTexture);
    //Allocate immutable storage once, then initialize it to white
    gl.texStorage2D(GL.TEXTURE_2D, 1, GL.RGB8, 4096, 2048);
    gl.texSubImage2D(GL.TEXTURE_2D, 0, 0, 0, 4096, 2048, GL.RGB, GL.UNSIGNED_BYTE, pixelData);
    gl.generateMipmap(GL.TEXTURE_2D);

Then, during my render loop, both left and right eye poses are processed for each object before moving on to the next object. This means I only need to call gl.bindTexture and gl.texSubImage2D once per object per frame. Additionally, I skip populating a material's shader uniforms if the material for this entity is the same as the one for the previous entity, and skip the texture upload if the video is paused or still loading.

/* Main Render Loop Extract */

//Called each frame after pre-sorting entities
function DrawScene(glLayer, pose, scene){
    //Entities are pre-sorted for transparency blending: opaque rendered first, transparent second.
    for (let ii = 0; ii < _opaqueEntities.length; ii++){

        //Only render if the entity and its parent chain are active
        if(_opaqueEntities[ii] && _opaqueEntities[ii].isActiveHeirachy){

            for (let i = 0; i < pose.views.length; i++) {
                _RenderEntityView(pose, i, _opaqueEntities[ii]);
            }
        }
    }

    for (let ii = 0; ii < _transparentEntities.length; ii++) {
    
        //Only render if the entity and its parent chain are active
        if(_transparentEntities[ii] && _transparentEntities[ii].isActiveHeirachy){
        
            for (let i = 0; i < pose.views.length; i++) {           
                _RenderEntityView(pose, i, _transparentEntities[ii]);
            }
        }
    }
}

let _programData;
function _RenderEntityView(pose, viewIdx, entity){
    //Calculates/manipulates the view matrix for the entity for this view. (<0.1ms)
    //...

    //Store reference to make stack overflow lines shorter :-)
    _programData = entity.material.shaderProgram;

    _BindEntityBuffers(entity, _programData);//The buffers Thomas, mind the BUFFERS!!!

    
    gl.uniformMatrix4fv(
        _programData.uniformData.uProjectionMatrix,
        false,
        _view.projectionMatrix
    );
    gl.uniformMatrix4fv(
        _programData.uniformData.uModelViewMatrix,
        false,
        _modelViewMatrix
    );

    //Render all triangles that make up the object.
    gl.drawElements(GL.TRIANGLES, entity.tris.length, GL.UNSIGNED_SHORT, 0);    
}

let _attrName;
let _attrLoc;
let _materialInUse;
let textureData;
function _BindEntityBuffers(entity, programData){
    gl.useProgram(programData.program);
    
    //Binds pre-defined shader attributes on an as-needed basis
    for(_attrName in programData.attributeData){
        _attrLoc = programData.attributeData[_attrName];

        //Bind only if exists in shader
        if(_attrLoc.key >= 0){
            _BindShaderAttributes(_attrLoc.key, entity.attrBufferData[_attrName].buffer,
                entity.attrBufferData[_attrName].compCount);
        }
    }
    
    //Bind triangle index buffer
    gl.bindBuffer(GL.ELEMENT_ARRAY_BUFFER, entity.triBuffer);
    
    //If already in use, is instanced material so skip configuration.
    if(_materialInUse == entity.material){return;}
    _materialInUse = entity.material;
        
        
    //Use the material by applying its specific uniforms
    //Apply base color
    gl.uniform4fv(programData.uniformData.uColor, entity.material.color);
    
    //If shader uses a diffuse texture
    if(programData.uniformData.uDiffuseSampler){
        //Store reference to make stack overflow lines shorter :-)
        textureData = entity.material.diffuseTexture;
        
        gl.activeTexture(gl.TEXTURE0);
        
        //Use assigned texture
        gl.bindTexture(gl.TEXTURE_2D, textureData);
        
        //If this is a video, update the texture buffer using the current video's playback frame data
        if(textureData.type == TEXTURE_TYPE.VIDEO &&
            textureData.isLoaded &&
            !textureData.paused){
            
            //This accounts for 42% of all script execution time!!!
            gl.texSubImage2D(gl.TEXTURE_2D, textureData.level, 0, 0,
                textureData.width, textureData.height, textureData.internalFormat,
                textureData.srcType, textureData.video);
        }
        
        gl.uniform1i(programData.uniformData.uDiffuseSampler, 0);
    }   
}

function _BindShaderAttributes(attrKey, buffer, compCount, type=GL.FLOAT, normalize=false, stride=0, offset=0){
    gl.bindBuffer(GL.ARRAY_BUFFER, buffer);
    gl.vertexAttribPointer(attrKey, compCount, type, normalize, stride, offset);
    gl.enableVertexAttribArray(attrKey);
}

I've contemplated using pre-defined counters for all for loops to avoid the per-iteration `let i = 0` declaration, but the gain from that seems hardly worth the effort.
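For illustration, the hoisted-counter variant would look something like this (a minimal sketch with a hypothetical function name and the activity checks omitted; I haven't measured it, since a loop-header `let` shouldn't heap-allocate in modern engines anyway):

    //Counters hoisted to module scope so the loop headers only assign, never declare
    let _i = 0;
    let _ii = 0;

    function DrawSceneHoistedCounters(pose){
        for (_ii = 0; _ii < _opaqueEntities.length; _ii++){
            for (_i = 0; _i < pose.views.length; _i++){
                _RenderEntityView(pose, _i, _opaqueEntities[_ii]);
            }
        }
    }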

Side note: the source video is actually larger than 4K, but at anything above 4K the FPS grinds down to about 10-12.

Obligatory: The key functionality above is extracted from a larger WebGL rendering framework I wrote that itself runs pretty damn fast already. The reason I'm not 'just using' Three, AFrame, or other such common libraries is that they do not have an ATO from the DOD, whereas in-house developed code is ok.

Update 9/9/21: At some point when Chrome updated from 90 to 93, the WebGL performance of texSubImage2D dropped dramatically, resulting in 100+ms of per-frame execution regardless of CPU/GPU capability. Changing to texImage2D now results in around 16ms per frame. In addition, shifting from RGB to RGB565 offers up a few more ms of performance while minimally sacrificing color.
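For reference, the per-frame upload now looks roughly like this (a sketch; `glTexture` is the texture from above, and the `UNSIGNED_SHORT_5_6_5` type is my assumed pairing for the 16-bit internal format). Note that a texture allocated with texStorage2D is immutable, so this path creates the texture with a plain texImage2D at load instead:

    //Full re-specification each frame via texImage2D (faster than texSubImage2D
    //under Chrome 93), using a 16-bit RGB565 internal format instead of RGB8
    gl.bindTexture(gl.TEXTURE_2D, glTexture);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGB565, gl.RGB, gl.UNSIGNED_SHORT_5_6_5, textureData.video);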

I'd still love to hear from GL/WebGL experts as to what else I can do to improve performance.

  • It seems you use mipmapping (which tends to be expensive). Is it better without? – Jérôme Richard Jun 11 '21 at 16:50
  • Turning it off for the video texture only via `gl.texParameteri(GL.TEXTURE_2D, GL.TEXTURE_MIN_FILTER, GL.LINEAR);` has no discernible effect on the time it takes to perform the `texSubImage2D` (7ms on dev station, more on test station) – Reahreic Jun 11 '21 at 17:24
  • Noting that the reason MIP doesn't appear to have an effect in this instance is due to the MIP being generated when the initial *empty* texture space is allocated and not when the texture is updated. – Reahreic Jun 11 '21 at 18:36
  • @Reahreic Do not generate the mipmaps. Remove `gl.generateMipmap(GL.TEXTURE_2D);` – Rabbid76 Jun 11 '21 at 21:34
