Anyone with intermediate experience writing 2D renderers knows that a sprite batcher keeps its data in graphics-API-specific buffers that need to be updated, and we always look for the fastest way to update them. I've run into a dilemma: with Metal and Swift, what is the smartest thing to update, and what is the smartest way of doing it? To be more specific, should I update the vertices before sending them to the GPU (do the vertex and tex coord transformations on the CPU), or build the transform matrix and the tex coord parameters and send them in one instanced uniforms buffer (do the vertex and tex coord transformations on the GPU)? The way I'm doing it currently uses instanced rendering and a giant uniforms buffer which is aligned to 8 bytes.
Static data
static let spritesPerBatch: Int = 1024
static var spritesData: [Float] = [Float](count: spritesPerBatch * BufferConstants.SIZE_OF_SPRITE_INSTANCE_UNIFORMS / sizeof(Float), repeatedValue: 0.0)
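To make the sizing of that array concrete, here is a small sketch of the arithmetic in current Swift syntax (MemoryLayout in place of sizeof). The submit code further down writes 11 floats per sprite; the 48-byte stride is only an assumption derived from that count and the 8-byte alignment mentioned above, since the real value of BufferConstants.SIZE_OF_SPRITE_INSTANCE_UNIFORMS is not shown. The helper name alignedSize is hypothetical.

```swift
// Hypothetical layout numbers: the post does not state the real value of
// BufferConstants.SIZE_OF_SPRITE_INSTANCE_UNIFORMS.
let floatsWrittenPerSprite = 11   // 6 matrix + 4 tex coord + 1 texture index
let alignment = 8                 // "aligned to 8 bytes", per the post
let spritesPerBatch = 1024

/// Rounds `size` up to the next multiple of `alignment`.
func alignedSize(_ size: Int, to alignment: Int) -> Int {
    return (size + alignment - 1) / alignment * alignment
}

let rawBytes = floatsWrittenPerSprite * MemoryLayout<Float>.size   // 44
let strideBytes = alignedSize(rawBytes, to: alignment)             // 48
let floatsPerSprite = strideBytes / MemoryLayout<Float>.size       // 12

// Same shape as the static array above, with the assumed stride filled in.
let spritesData = [Float](repeating: 0.0, count: spritesPerBatch * floatsPerSprite)
print(strideBytes, floatsPerSprite, spritesData.count)
```

Under these assumptions one float per sprite is padding, and the CPU-side array holds 12288 floats.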
Queueing sprite data
Method: SpriteBatch.begin()
spritesInBatch = 0
Method: SpriteBatch.submit(sprite)
let offset: Int = spritesInBatch * BufferConstants.SIZE_OF_SPRITE_INSTANCE_UNIFORMS / sizeof(Float)
// transform matrix (3x2)
spritesData[offset + 0] = wsx * cosMetaRot * xOrtho
spritesData[offset + 1] = wsx * sinMetaRot * yOrtho
spritesData[offset + 2] = -hsy * sinMetaRot * xOrtho
spritesData[offset + 3] = hsy * cosMetaRot * yOrtho
spritesData[offset + 4] = (tx * cosNegCameraRotation - ty * sinNegCameraRotation) * xOrtho
spritesData[offset + 5] = (tx * sinNegCameraRotation + ty * cosNegCameraRotation) * yOrtho
// tex coords and lengths
spritesData[offset + 6] = sprite.getU()
spritesData[offset + 7] = sprite.getV()
spritesData[offset + 8] = sprite.getUVW()
spritesData[offset + 9] = sprite.getUVH()
// which texture to use out of the 16 that could be bound
spritesData[offset + 10] = Float(targetTextureIDIndex)
spritesInBatch++
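For readers following the matrix layout, this sketch shows how the six floats at offset + 0 through offset + 5 would act on one corner of the shared quad. The vertex shader is not part of the post, so the formula below is an assumption based on the order in which the values are written (two column vectors followed by a translation). The type Affine3x2 and the sample numbers are hypothetical.

```swift
/// The six floats stored at offset + 0 ... offset + 5, in the same order.
struct Affine3x2 {
    var m: [Float]   // [a, b, c, d, tx, ty]

    /// Applies the matrix to one corner of the shared quad.
    /// Assumed shader math: x' = a*x + c*y + tx, y' = b*x + d*y + ty.
    func apply(x: Float, y: Float) -> (x: Float, y: Float) {
        return (m[0] * x + m[2] * y + m[4],
                m[1] * x + m[3] * y + m[5])
    }
}

// Hypothetical sprite: no rotation, scale folded into the two columns,
// translation already multiplied by the ortho factors.
let matrix = Affine3x2(m: [0.5, 0.0, 0.0, 0.25, 0.25, -0.5])
let corner = matrix.apply(x: 1.0, y: 1.0)
print(corner.x, corner.y)
```

With no rotation the sine terms vanish, so the corner is only scaled and translated.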
Copying sprite data into the uniforms buffer
Method: SpriteBatch.end()
instancedUniformsBuffer = device.newBufferWithLength(spritesPerBatch * BufferConstants.SIZE_OF_SPRITE_INSTANCE_UNIFORMS, options: MTLResourceOptions.CPUCacheModeWriteCombined)
instancedUniformsPointer = instancedUniformsBuffer.contents()
memcpy(instancedUniformsPointer, spritesData, instancedUniformsBuffer.length)
Renderer.renderSpriteBatch()
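The copy step in end() can be isolated as follows. This is only a sketch: a plainly allocated block of memory stands in for instancedUniformsBuffer.contents() so that it runs without a Metal device, the 12-float stride is an assumption, and the function name copyQueuedSprites is hypothetical. Unlike the posted code, which copies the whole buffer length, this variation copies only the bytes of the sprites actually queued; whether that helps has not been measured here.

```swift
import Foundation

/// Copies the first `spriteCount` sprites' floats into `destination`
/// and returns the number of bytes copied.
func copyQueuedSprites(from spritesData: [Float],
                       spriteCount: Int,
                       floatsPerSprite: Int,
                       into destination: UnsafeMutableRawPointer) -> Int {
    let usedBytes = spriteCount * floatsPerSprite * MemoryLayout<Float>.stride
    precondition(usedBytes <= spritesData.count * MemoryLayout<Float>.stride)
    spritesData.withUnsafeBytes { source in
        guard let base = source.baseAddress else { return }
        destination.copyMemory(from: base, byteCount: usedBytes)
    }
    return usedBytes
}

let floatsPerSprite = 12   // assumed stride of 48 bytes
var spritesData = [Float](repeating: 0.0, count: 1024 * floatsPerSprite)

// Pretend three sprites were submitted; fill their slots with marker values.
let spritesInBatch = 3
for i in 0..<(spritesInBatch * floatsPerSprite) {
    spritesData[i] = Float(i)
}

// Stand-in for instancedUniformsBuffer.contents(): raw memory of the same size.
let destination = UnsafeMutableRawPointer.allocate(
    byteCount: spritesData.count * MemoryLayout<Float>.stride,
    alignment: MemoryLayout<Float>.alignment)

let usedBytes = copyQueuedSprites(from: spritesData,
                                  spriteCount: spritesInBatch,
                                  floatsPerSprite: floatsPerSprite,
                                  into: destination)
let lastCopied = destination.load(fromByteOffset: usedBytes - MemoryLayout<Float>.stride,
                                  as: Float.self)
print(usedBytes, lastCopied)
destination.deallocate()
```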
Sprite batch render method
Method: Renderer.renderSpriteBatch()
Shaders.setShaderProgram(Shaders.SPRITE)
let textureIDs: [TextureID] = SpriteBatch.getTextureIDs()
for (var i: Int = 0; i < textureIDs.count; i++) {
    renderEncoder.setFragmentTexture(TextureManager.getTexture(textureIDs[i]).texture, atIndex: i)
}
let instancedUniformsBuffer: MTLBuffer = SpriteBatch.getInstancedUniformsBuffer().buffer
renderEncoder.setVertexBuffer(VertexBuffers.SPRITE.buffer, offset: 0, atIndex: 0)
renderEncoder.setVertexBuffer(instancedUniformsBuffer, offset: 0, atIndex: 1)
renderEncoder.drawIndexedPrimitives(MTLPrimitiveType.Triangle, indexCount: BufferConstants.SPRITE_INDEX_COUNT, indexType: MTLIndexType.UInt16, indexBuffer: IndexBuffers.SPRITE.buffer, indexBufferOffset: 0, instanceCount: SpriteBatch.getSpritesInBatch())
I currently get about 1400 sprites sized at 32x64, using 8 separate textures, at 60 fps on an iPhone 5s. I am mostly satisfied with this and will be able to finish my iOS game with that number. However, I want to push the boundary so that I can use better effects in the game. To reiterate the question in case I haven't made it clear yet, I'm wondering about two things, both specific to PERFORMANCE:
- Would it be better to have a larger vertex buffer (as opposed to my current method of sharing one vertex buffer and one index buffer for ALL sprites) and set the position and texture coordinates of each vertex with memory copies on the CPU side? This would also mean NOT using instanced draw calls.
- If not, is there a faster way to prepare and copy the sprite data?
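To make the first option concrete, here is a rough sketch of what expanding one sprite into four CPU-transformed vertices might look like. Everything in it is an assumption for illustration: the SpriteVertex layout, the unit-quad corners, the function name expandSprite, and the reading of getU()/getV() as the origin of the texture region with getUVW()/getUVH() as its width and height.

```swift
/// One vertex as it would sit in the larger vertex buffer (hypothetical layout).
struct SpriteVertex: Equatable {
    var x: Float, y: Float   // position after the 3x2 transform
    var u: Float, v: Float   // texture coordinate
}

/// Expands one sprite into four vertices on the CPU.
/// `m` holds the six matrix floats in the order the post writes them;
/// `uv` is (u, v, width, height) of the sprite's texture region.
func expandSprite(m: [Float], uv: (Float, Float, Float, Float)) -> [SpriteVertex] {
    // Assumed unit-quad corners; the post does not show the shared quad.
    let corners: [(Float, Float)] = [(0, 0), (1, 0), (1, 1), (0, 1)]
    return corners.map { corner in
        SpriteVertex(x: m[0] * corner.0 + m[2] * corner.1 + m[4],
                     y: m[1] * corner.0 + m[3] * corner.1 + m[5],
                     u: uv.0 + corner.0 * uv.2,
                     v: uv.1 + corner.1 * uv.3)
    }
}

// Hypothetical sprite: scale (2, 4), translation (1, 1), no rotation.
let vertices = expandSprite(m: [2, 0, 0, 4, 1, 1], uv: (0.5, 0.25, 0.25, 0.5))
let bytesPerSprite = vertices.count * MemoryLayout<SpriteVertex>.stride
print(bytesPerSprite)
```

Under these assumptions each sprite costs 4 x 16 = 64 bytes of vertex data per frame, compared with the 44 bytes written per instance in the code above, so the trade is more CPU work and more copied data in exchange for a simpler vertex shader.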
Thanks and sorry for the super long post! :)