I pretty much develop in an exclusively multi-threaded, high performance world so here's the general practice I use.
Design- the best optimization is a better algorithm:
1) Break you functions into LOGICALLY separable pieces. This means that a call does "A" and ONLY "A"- not A then B then C...
2) NO SIDE EFFECTS: Abolish all nakedly global variables, static or not. If you cannot fully abolish side effects, isolate them to a few locations (concentrate them in the code).
3) Make as many isolated components RE-ENTRANT as possible. This means they're stateless- they take all their inputs as constants and only manipulate DECLARED, logically constant parameters to produce the output. Pass-by-value instead of reference wherever you can.
4) If you have state, make a clear separation between stateless sub-assemblies and the actual state machine. Ideally the state machine will be a single function or class manipulating stateless components.
Debugging:
Threading bugs tend to come in 2 broad flavors- races and deadlocks. As a rule, deadlocks are much more deterministic.
1) Do you see data corruption?: YES => Probably a race.
2) Does the bug arise on EVERY run or just some runs?: YES => Likely a deadlock (races are generally non-deterministic).
3) Does the process ever hang?: YES => There's a deadlock somewhere. If it only hangs sometimes, you probably have a race too.
Breakpoints often act much like synchronization primitives THEMSELVES in the code, because they're logically similar- they force execution to stall in the current context until some other context (you) sends a signal to resume. This means that you should view any breakpoints you have in code as altering its mufti-threaded behavior, and breakpoints WILL affect race conditions but (in general) not deadlocks.
As a rule, this means you should remove all breakpoints, identify the type of bug, THEN reintroduce them to try and fix it. Otherwise, they simply distort things even more.