Even on local disk, there is some per-file overhead, which I believe is mostly due to the expense of opening a file. To open an existing file, Windows has to parse the path, find the corresponding entry at each level of the directory tree, look up the file in the MFT, and check the ACL. To create a new file, Windows has to parse the path, find the corresponding entry at each level of the directory tree, check the directory ACL, and add the file to the MFT and to the parent directory's index.
If you only have one thread, you have to open the source file, open the destination file, copy the data, and close the files, and only then can you move on to the next one. That means leaving the I/O subsystem idle part of the time. If you have multiple threads, some of them can be opening files while others are copying data; ideally, you're keeping the I/O subsystem busy the entire time.
The overhead isn't all that noticeable on a single file, but if you have a lot of small files it adds up and the time saved can be significant.
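To make the idea concrete, here is a minimal sketch in Python of copying a tree with a pool of worker threads, so that one worker can be opening files while another is copying data. It is only an illustration: the function name `copy_tree_threaded` and the default of 8 workers are my own choices, not taken from any particular tool, and whether it actually beats a single-threaded copy depends on the disk and the file sizes.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_threaded(src_dir, dst_dir, workers=8):
    """Copy every file under src_dir into dst_dir using a pool of threads.

    Hypothetical helper for illustration; the worker count is an arbitrary default.
    """
    src_root, dst_root = Path(src_dir), Path(dst_dir)
    files = [p for p in src_root.rglob("*") if p.is_file()]

    # Create the destination directories up front so workers only handle files.
    for parent in {dst_root / p.relative_to(src_root).parent for p in files}:
        parent.mkdir(parents=True, exist_ok=True)

    def copy_one(src):
        dst = dst_root / src.relative_to(src_root)
        # Opens the source, opens the destination, copies the data, closes both.
        shutil.copyfile(src, dst)
        return dst

    # While one worker is blocked opening a file, the others can be copying.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))
```

Calling `copy_tree_threaded(src, dst, workers=1)` gives the single-threaded behaviour described above, so timing it against a larger worker count on a directory full of small files is an easy way to see how much of the total is per-file overhead.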