It will depend on
- Your processor (specifically hyperthreading)
- How much info has to be copied to/from the different images
- If you're implementing this over multiple boxes (LAN)
For 1) hyperthreading helps. I know my machine does it so I typically have twice as many workers are cores and my code completes in about 85% of the time compared to if I matched the number of workers with cores. It won't improve more than that.
2) If you're not forking, using sockets for instance, you're working as if you're in a distributed memory paradigm, which means creating one copy in memory for every worker. This can be a non-trivial amount of time. Also, multiple images in the same machine may take up a lot of space, depending on what you're working on. I often match the number of workers with number because doubling workers will make me run out of memory.
This is compounded by 3) network speeds over multiple workstations. Locally between machines our switch will transfer things at about 20 mbytes/second which is 10x faster than my internet download speeds at home, but is a snail's pace compared to making copies in the same box.
You might consider increasing R's nice value so that the python has priority when it needs to do something.