As with just about any question like this, whether or not what you are doing is the "best practice" depends. Your queries to gather data are not the only component to consider for performance. Those other factors to consider are:
- as others have hinted - your hardware (destination, source, network inbetween, etc...)
- your data (overall size, depth, width)
- your schedule (is this the only thing running? Must it be done within a given window? etc...)
For example, if the destination or source or network hardware are used heavily by others, you might not want to thread at all to minimize the amount of concurrent use. Conversely, if the hardware is dedicated to scheduled batch activity, then you likely do want to split the data movement into threads and use as much of the hardware as possible to reduce the overall time spent doing this task.
Short of being able to know all that info and make the best decision you COULD also just try incrementally increasing the number of threads. First, assess what "fast" and "slow" mean for your environment. When you run just one thread what is the transfer rate (if you don't already know this from knowing the hardware). Then add another thread, reassess transfer rate. Then another and so on until and if you see a decrease in overall time to move the data. Increase until you know you are using the optimal number of threads... for your hardware, on your network, within your schedule for your data.
You wont get a definitive answer here (or at least, you shouldn't :) ) on what is 'best' for you. Too much to share and consider.