select /*+ parallel (5) */
Seems like odd number for degree of parallelism. Well, obvious 5 is an odd number, and that is strange. The DoPs ought to be a power multiple of two (see below for more).
Anyway, do you have a reason for using parallel query? Do you have at least five spare processors? If not, there is a good chance the overhead of managing the PQ slaves is at least contributing to the poor performance.
Why should DOP = n*2? There is an established heuristic based on Queuing Theory that running more than two batch jobs simultaneously leads to degraded performance. Find out more. (I think queuing theory actually recommends a figure of 1.8, but as database jobs are often bound by I/O or disk we can usually get away with 2.)
I originally said "power of 2" but that's mainly because multi-core servers tend to have a number of CPUs which is a power of 2, but multiple of 2 is more accurate, because some boxes have 12 CPUs or some other number.
Now, if we have a 64 core box, a DOP of 5 or 37 is fine, because we have enough CPUs to run that many threads simultaneously. But if we have got a small quadcore box, only 2, 4 or 8 makes sense, because those are the only values which will ensure an even distribution of work across all four processors. Running five threads on a quadcore box means one CPU will be doing a lot more work than the other three; there is a possibility that it will take longer to finish, leaving the other three slaves waiting. So DOP=5
can actually lead to a greater elapsed time than DOP=4
.
DOP=n*2
is only a rule of thumb, and not set in stone. However, it is based on sound reasoning, and we should know why we're doing something different. Obviously, we should have conducted some experiments to confirm that we have chosen the right DOP (whatever value we settle on).