I assume you have a workhorse function that does the job on each thread. You could define a thread-local variable that you increment each time this function is called; for the corresponding thread, it shows how much work that thread has done.
Then the main (dispatching) thread compares those values before handing the task at hand to the selected thread, that is, the one with the smallest work counter.
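Here is a minimal Python sketch of the work-counter idea. One adjustment: a true thread-local (`threading.local`) can only be read by the thread that owns it, so each worker gets its own slot in a shared list instead. Only that worker writes the slot, and the dispatcher reads it. `CountingPool`, `submit` and `shutdown` are hypothetical names, and the per-worker queues are an assumption about how tasks reach a given thread.

```python
import queue
import threading


class CountingPool:
    """Sketch: each worker counts the tasks it has finished; the dispatcher
    sends each new task to the worker with the smallest count."""

    def __init__(self, num_workers):
        # One task queue and one counter slot per worker. A slot is written
        # only by its own worker but lives in shared memory so that the
        # dispatcher can read it.
        self.queues = [queue.Queue() for _ in range(num_workers)]
        self.counters = [0] * num_workers
        self.threads = [
            threading.Thread(target=self._worker, args=(i,))
            for i in range(num_workers)
        ]
        for t in self.threads:
            t.start()

    def _worker(self, index):
        q = self.queues[index]
        while True:
            task = q.get()
            if task is None:  # sentinel: stop this worker
                break
            task()  # the "workhorse" call
            self.counters[index] += 1  # only this thread writes this slot

    def submit(self, task):
        # Pick the worker with the smallest counter (ties go to the lowest index).
        index = min(range(len(self.counters)), key=self.counters.__getitem__)
        self.queues[index].put(task)
        return index

    def shutdown(self):
        # Sentinels are queued after any pending tasks, so those run first.
        for q in self.queues:
            q.put(None)
        for t in self.threads:
            t.join()
```

Because the counter only moves when a call finishes, a burst of submissions can all land on the same worker before its count changes. Also counting tasks that are still queued (for example with `q.qsize()`) would react faster, at the cost of a slightly less "pure" measure of completed work.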
Another way would be to measure, for each thread, the time spent working versus the time spent idle. Work time is the time between the beginning and the end of the thread's workhorse function; idle time is everything else. You can take all of these measurements at the beginning and the end of the workhorse function.
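A minimal sketch of the busy/idle measurement, again in Python. `BusyTracker`, `run`, `utilization` and `least_busy` are hypothetical names; each worker is assumed to own one tracker and to call its workhorse function through `run`. Timestamps are taken only at the start and end of each call, as described above. The score is busy / (busy + idle) rather than busy / idle: it avoids dividing by zero and orders workers the same way, since r / (1 + r) increases with r. The ongoing gap or task is added at read time so that a worker that has been idle for a while is not scored on stale numbers.

```python
import threading
import time


class BusyTracker:
    """Accumulates the time one worker spends inside its workhorse function
    (busy) and between calls (idle)."""

    def __init__(self):
        self._lock = threading.Lock()  # the dispatcher reads while the worker writes
        self.busy = 0.0
        self.idle = 0.0
        self._last_end = time.perf_counter()
        self._running_since = None  # start time of the current call, if any

    def run(self, func, *args, **kwargs):
        start = time.perf_counter()
        with self._lock:
            self.idle += start - self._last_end  # gap since the previous call ended
            self._running_since = start
        try:
            return func(*args, **kwargs)
        finally:
            end = time.perf_counter()
            with self._lock:
                self.busy += end - start
                self._last_end = end
                self._running_since = None

    def utilization(self):
        """Fraction of elapsed time spent busy, including the current gap or call."""
        now = time.perf_counter()
        with self._lock:
            busy, idle = self.busy, self.idle
            if self._running_since is None:
                idle += now - self._last_end  # worker is idle right now
            else:
                busy += now - self._running_since  # worker is mid-task
        total = busy + idle
        return busy / total if total > 0 else 0.0


def least_busy(trackers):
    """Index of the tracker (worker) with the lowest utilization."""
    return min(range(len(trackers)), key=lambda i: trackers[i].utilization())
```

The dispatcher would call `least_busy` over all workers' trackers before handing out a task, in the same way the counter version calls `min` over the counters.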
Either way, the idea is that each worker thread measures how busy it is. This is somewhat imprecise, but a more advanced solution would basically mean re-creating a threading library/framework.