I'll admit I'm new to multithreading. I was hoping to dabble in it with a C++ project first, but I've hit a snag in my Godot game: generating new terrain causes a small lag spike every time, so I wanted to move the terrain generation to a separate thread. The problem is that I can't find good resources on Godot multithreading, so I was going off the documentation alone. I practically copied the design from the documentation, but it ended up making my game slower, and it lags the main thread too, not just the generation thread.
I've done a lot of my own research, and I know SO is really keen on that (or they kick you out), so I want to list it here:
- Godot Docs: these only teach you about starting up a thread, mutexes, and semaphores.
- From what I understand, a mutex locks a resource so that only the locking thread can access it, and only that thread can unlock it. From tests on my machine, constant locking and unlocking doesn't seem to cause much overhead.
- Semaphores, from what I understand, are a tool for signaling one thread from another. Whereas a mutex can only be locked and unlocked from the same thread, with a semaphore one thread can post a signal while another thread waits for it. This doesn't seem to cause much overhead either.
- From some practical experiments, it seems that if I get a handle on a chunk and call its render method, the method doesn't run on the generation thread, which I assume is the culprit. But if that's the case, I don't understand why the rendering could be SLOWER than doing it all on the main thread, unless there's an overhead to calling a function on an object that was created on the main thread. That confuses me even more: isn't all memory shared between threads, so why would it need to do anything extra to call a function?
- Using "call_deferred" seems to make the separate thread slightly faster but heavily slows the main thread. To be honest, I don't fully understand call_deferred; it seems to call the function during idle time. I experimented with it because of my next research point, which is:
- Thread-safe APIs: after reading this page I understand that interacting with the active scene tree isn't thread-safe, so using call_deferred to interact with it during idle time is preferred. The docs say it is preferable to construct a scene on a separate thread and then use call_deferred to make a single add_child call. This seems to get around the thread-safety issue, so that's what I did.
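To make my mental model of call_deferred concrete, here is how I picture it, sketched in plain Python rather than GDScript so it runs outside the engine. This is only an analogy and not how Godot implements it: `DeferredQueue`, `build_chunk`, and the list standing in for the scene tree are all made-up names. The worker builds a whole chunk off to the side and queues a single attach call, and the main thread runs the queued calls later at a safe point.

```python
import queue
import threading


class DeferredQueue:
    """Hypothetical stand-in for an engine's deferred-call queue."""

    def __init__(self):
        self._calls = queue.Queue()  # queue.Queue is thread-safe

    def call_deferred(self, func, *args):
        # Safe to call from any thread: it only records the call.
        self._calls.put((func, args))

    def flush(self):
        # Meant to be called from the main thread only, e.g. once per frame.
        ran = 0
        while True:
            try:
                func, args = self._calls.get_nowait()
            except queue.Empty:
                return ran
            func(*args)
            ran += 1


scene_tree = []  # stand-in for the active tree; only the main thread touches it
deferred = DeferredQueue()


def build_chunk(chunk_pos, size=4):
    # Worker thread: build the whole chunk in isolation...
    tiles = [(chunk_pos, x, y) for x in range(size) for y in range(size)]
    # ...then queue exactly one call that attaches it.
    deferred.call_deferred(scene_tree.append, tiles)


worker = threading.Thread(target=build_chunk, args=((0, 0),))
worker.start()
worker.join()

calls_run = deferred.flush()  # main thread, "idle time"
print(calls_run, len(scene_tree), len(scene_tree[0]))  # 1 1 16
```

If this picture is right, every deferred call is paid for on the main thread when the queue is flushed, so the number of deferred calls matters as much as where the building happens.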
That's the best research I could do, and I hope it shows I really have tried what I could. I'm sure it's not the best that's possible; it's just the extent of my research skills, which is why I came here. (Y'all seem to have expertise far beyond what I can imagine having, haha.)
However, taking what I understood from all of this, I decided on the following system. Once an array of indices into the list of positions to generate has been written, the main thread posts to a semaphore, which starts the other thread's generation algorithm. That thread sits in a while loop with a semaphore.wait() at the top, waiting for the signal that the array is written and ready. It then goes through the indices and calls the render function for the chunks around each position (I should mention that each position is a Vector2 holding the chunk position to render around). Right now the only position is the player's, so the array always has one element, but that's just for now. The render function of each chunk builds a Node2D with all the tiles and then makes a single add_child call through call_deferred, to get around the thread-safety issues. One remaining issue is that there is one call_deferred per chunk; when I tried to fix that, it wouldn't work at all, which was also weird.
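Stripped of anything Godot-specific, the signaling part of that design looks roughly like the following. This is a Python sketch of the pattern, not my actual code: `threading.Lock` and `threading.Semaphore` play the roles of Godot's Mutex and Semaphore, `request_generation` and `stop_worker` are hypothetical names, and `batch_done` exists only so the demo can wait for the worker deterministically. Note that this sketch copies the work list and releases the lock before doing the work, which is a choice made for the sketch.

```python
import threading

lock = threading.Lock()              # plays the role of the Mutex
signal = threading.Semaphore(0)      # plays the role of the Semaphore
batch_done = threading.Semaphore(0)  # demo only: lets the main thread wait
indices_to_generate = []             # shared between both threads
should_exit = False
generated = []                       # written by the worker only


def generation_thread():
    while True:
        signal.acquire()  # like semaphore.wait(): sleep until posted
        with lock:
            if should_exit:
                break
            # Copy the work list so the lock is not held while working.
            work = list(indices_to_generate)
        for index in work:
            generated.append(index)  # placeholder for rendering chunks
        batch_done.release()


def request_generation(index):
    global indices_to_generate
    with lock:
        indices_to_generate = [index]
    signal.release()  # like semaphore.post(): wake the worker


def stop_worker(thread):
    global should_exit
    with lock:
        should_exit = True
    signal.release()  # wake the worker so it can see the flag
    thread.join()     # like wait_to_finish()


worker = threading.Thread(target=generation_thread)
worker.start()
request_generation(0)
batch_done.acquire()  # wait until the first batch has been processed
stop_worker(worker)
print(generated)  # [0]
```

The exit flag starts out false and is read as-is by the worker; setting it and posting once more is what lets the final join return.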
So here I am with the code:
GameMap Code (Simplified)
# Made up of MapChunks
var map = {}
var chunk_loaders = [Vector2(0,0)]
var render_distance = 7 # Must be Odd
var chunk_tick_dist = 19 # Must be Odd
var noise_list = {"map_noise" : null, "foliage_noise" : null}

var chunk_gen_thread
var chunk_gen_thread_exit = false # Set to true to ask the thread to exit
var mutex
var semaphore
var indices_to_generate = []

onready var player = get_node("../Player")

func _ready():
    mutex = Mutex.new()
    semaphore = Semaphore.new()
    chunk_gen_thread = Thread.new()
    chunk_gen_thread.start(self, "chunk_generation_thread")
    generate_noise()
    regen_chunks(get_chunk_from_world(player.position.x, player.position.y), 0)

func _exit_tree():
    mutex.lock()
    chunk_gen_thread_exit = true
    mutex.unlock()

    # Wake the thread so it can see the exit flag
    semaphore.post()
    chunk_gen_thread.wait_to_finish()

func chunk_generation_thread(userData):
    while true:
        semaphore.wait() # Wait for chunk gen signal

        # Protect run loop with mutex
        mutex.lock()
        var should_exit = chunk_gen_thread_exit
        mutex.unlock()

        if should_exit:
            break

        # Regen Chunks
        mutex.lock()
        for i in indices_to_generate:
            var lc_pos = Vector2(chunk_loaders[i].x - floor(render_distance/2), chunk_loaders[i].y - floor(render_distance/2))
            var upper_lc = lc_pos - Vector2(1, 1)

            # Render every chunk inside the render distance
            for x in render_distance:
                for y in render_distance:
                    var chunk_pos = Vector2(lc_pos.x+x, lc_pos.y+y)
                    var chunk = retrieve_chunk(chunk_pos.x, chunk_pos.y)
                    chunk.rerender_chunk()

            # Unrender the one-chunk-wide border just outside the render distance
            for x in render_distance+2:
                for y in render_distance+2:
                    if x != 0 and x != render_distance+1 and y != 0 and y != render_distance+1:
                        continue
                    var chunk = Vector2(upper_lc.x+x, upper_lc.y+y)
                    var unrender_chunk = retrieve_chunk(chunk.x, chunk.y)
                    unrender_chunk.unrender()
        mutex.unlock()

func regen_chunks(chunk_position, chunk_loader_index):
    mutex.lock()
    if chunk_loader_index >= chunk_loaders.size():
        chunk_loaders.append(Vector2(0,0))
    chunk_loaders[chunk_loader_index] = chunk_position
    indices_to_generate = [chunk_loader_index]
    mutex.unlock()
    semaphore.post()

func retrieve_chunk(x, y):
    mutex.lock()
    if !map.has(Vector2(x, y)):
        create_chunk(x, y)
    mutex.unlock()
    return map[Vector2(x, y)]

func create_chunk(x, y):
    var new_chunk = MapChunk.new()
    add_child(new_chunk)
    new_chunk.generate_chunk(x, y)

MapChunk Code (Simplified)
var thread_scene
onready var game_map = get_parent()

func _ready():
    thread_scene = Node2D.new()

func generate_chunk(x, y):
    chunk_position = Vector2(x, y)
    rerender_chunk()

func rerender_chunk():
    if !un_rendered:
        return
    un_rendered = false
    lc_position.x = chunk_position.x*(CHUNK_WIDTH)
    lc_position.y = chunk_position.y*(CHUNK_HEIGHT)
    thread_scene.queue_free()
    thread_scene = Node2D.new()
    chunk_map.resize(CHUNK_WIDTH)
    for x in CHUNK_WIDTH:
        chunk_map[x] = []
        chunk_map[x].resize(CHUNK_HEIGHT)
        for y in CHUNK_HEIGHT:
            var cell_value = game_map.get_noise_value("map_noise", lc_position.x+x, lc_position.y+y)
            assign_ground_cell(cell_value, x, y)
    self.call_deferred("add_child", thread_scene)

func unrender():
    if un_rendered:
        return
    un_rendered = true
    for x in CHUNK_WIDTH:
        for y in CHUNK_HEIGHT:
            if chunk_map[x][y].occupying_tile != null:
                chunk_map[x][y].occupying_tile.call_deferred("queue_free")
            chunk_map[x][y].call_deferred("queue_free")

func assign_ground_cell(cell_value, x, y):
    if cell_value < 0.4:
        chunk_map[x][y] = game_map.create_tile("GRASS", lc_position.x+x, lc_position.y+y)
        generate_grass_foliage(x, y)
    elif cell_value < 0.5:
        chunk_map[x][y] = game_map.create_tile("SAND", lc_position.x+x, lc_position.y+y)
    else:
        chunk_map[x][y] = game_map.create_tile("WATER", lc_position.x+x, lc_position.y+y)
    thread_scene.add_child(chunk_map[x][y])

func generate_grass_foliage(x, y):
    var cell_value = game_map.get_noise_value("foliage_noise", lc_position.x+x, lc_position.y+y)
    if cell_value >= 0.4:
        chunk_map[x][y].occupying_tile = game_map.create_tile("TREE", lc_position.x+x, lc_position.y+y)
        chunk_map[x][y].occupying_tile.parent_tile = chunk_map[x][y]
        chunk_map[x][y].occupying_tile.z_index = 3
    elif cell_value >= 0.2 and cell_value < 0.4:
        chunk_map[x][y].occupying_tile = game_map.create_tile("GRASS_BLADE", lc_position.x+x, lc_position.y+y)
        chunk_map[x][y].occupying_tile.parent_tile = chunk_map[x][y]
        chunk_map[x][y].occupying_tile.z_index = 1
    if chunk_map[x][y].occupying_tile != null:
        thread_scene.add_child(chunk_map[x][y].occupying_tile)

KEEP IN MIND
All of this code works fine if it's all on the main thread! There is nothing wrong with the chunk generation code itself; it works completely fine if I remove the thread.start call from the _ready function. The only problem there is a lag spike of about 0.5 seconds every time it's called, which is what I'm trying to get rid of. I am almost 89% sure this is purely a thread problem. (I'm sure I could improve the chunk generation algorithm further, but I also really want to understand threads.)