1) I'd go with Boost as well.
2) Maybe. If the tasks take locks or block on I/O, a large number of threads may well be required. If the tasks are CPU-bound, it's harder to say. If they don't read, and especially don't write, much data, and so don't invalidate much cache when they run, a large number of threads actually seems to improve performance slightly: when just adding into a variable, 200 threads get slightly more work done than 8. In the more common case of CPU-bound tasks that use a lot of memory and so tend to dirty all the caches, a lot of threads (e.g. 200) typically results in a throughput drop of 20-50% because of cache flushing.
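
If you want to try the "just adding into a variable" case on your own box, something like the following rough sketch would do. It uses plain std::thread rather than a Boost pool to keep it short. The names add_in_threads and seconds_for are made up for illustration, and it assumes num_threads is greater than zero. It only covers the cache-friendly case; it says nothing about memory-heavy tasks.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: split `total` additions across `num_threads` threads
// and return the combined count. Assumes num_threads > 0.
std::uint64_t add_in_threads(unsigned num_threads, std::uint64_t total) {
    std::vector<std::uint64_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        // Spread the remainder over the first (total % num_threads) threads.
        std::uint64_t share =
            total / num_threads + (t < total % num_threads ? 1 : 0);
        workers.emplace_back([&partial, t, share] {
            // volatile so the compiler does not fold the loop into one store.
            volatile std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < share; ++i) local = local + 1;
            partial[t] = local;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t sum = 0;
    for (auto v : partial) sum += v;
    return sum;
}

// Hypothetical helper: wall-clock seconds taken by one run,
// including thread creation and joining.
double seconds_for(unsigned num_threads, std::uint64_t total) {
    auto start = std::chrono::steady_clock::now();
    add_in_threads(num_threads, total);
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Compare seconds_for(8, n) with seconds_for(200, n) for a large n, repeating each a few times. The numbers depend heavily on the hardware, the scheduler and compiler flags, so treat them as a rough indication only.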