From your description, it seems you're writing to ThreadParameter (or some other data structure) BEFORE starting any child threads, and you never write to it again: it exists to be read as needed, but is never changed after its initialization. Is that correct? If so, there's no need whatsoever to employ any thread synchronization system calls (or processor/compiler primitives) when a child thread reads the data, not even the first time.
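To make that pattern concrete, here is a minimal sketch in C++11. It assumes std::thread as a stand-in for your StartThread(), and the struct layout, the names g_param, worker and run_readers, and the values 21 and 2 are all made up for illustration. The shared data is a plain object (neither volatile nor atomic) that is written once before any thread exists and only read afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's ThreadParameter.
struct ThreadParameter {
    int value;
    int scale;
};

// Plain shared data: written once, then only read.
static ThreadParameter g_param;

// Each worker only reads g_param and writes to its own result slot.
static void worker(int* out) {
    *out = g_param.value * g_param.scale;
}

// Initializes the shared data, starts the readers, and returns the sum
// of the values they computed.
int run_readers(std::size_t thread_count) {
    assert(thread_count > 0);
    g_param = ThreadParameter{21, 2};  // every write happens before any thread exists

    std::vector<int> results(thread_count, 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < thread_count; ++i) {
        // Completion of the std::thread constructor synchronizes with the
        // start of worker(), so the write above is visible to it.
        threads.emplace_back(worker, &results[i]);
    }
    for (auto& t : threads) {
        t.join();  // join() makes each worker's result visible here
    }

    int sum = 0;
    for (int r : results) {
        sum += r;
    }
    return sum;
}
```

Calling run_readers(4) starts four readers that each compute 21 * 2 from the shared data. If your StartThread() wraps a different API, check that API's documentation for the equivalent visibility guarantee.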
The treatment of volatile is somewhat compiler-specific. I know that at least with Diab for PowerPC, there is a compiler option controlling it: either emit the PowerPC EIEIO (or MBAR) instruction after every read/write of a volatile variable, or don't. This is in addition to prohibiting compiler optimizations associated with the variable. (EIEIO/MBAR is PowerPC's instruction for prohibiting reordering of I/O by the processor itself; i.e., all I/O from before the instruction must complete before any I/O after the instruction.)
From a correctness/safety standpoint, it doesn't hurt to declare it as volatile. But from a pragmatic standpoint, if you initialize ThreadParameter before calling StartThread(), declaring it volatile shouldn't really be necessary (and not doing so would speed up all subsequent accesses of it). Pretty much any substantial function call (say, to printf() or cout, or any system call) issues orders of magnitude more instructions than the processor needs to finish handling the write to ThreadParameter before your call to StartThread(). Realistically, StartThread() itself almost certainly executes enough instructions before the thread in question actually starts. More to the point, mainstream thread-creation calls such as pthread_create() or the C++11 std::thread constructor are themselves synchronization points: writes made before the call are visible to the new thread. So I'm suggesting that you don't really need to declare it volatile, probably not even if you initialize it immediately before calling StartThread().
Now as to your question regarding what would happen if the cache line containing that variable were already loaded into the cache of both processors before the processor running the main thread performs the initialization: if you're using a commonly available general-purpose platform with like-kind CPUs, the hardware should already be in place to handle the cache coherency for you. The place you get into trouble with cache coherency on general-purpose platforms, whether or not they're multiprocessor, is when your processor has separate instruction and data caches and you write self-modifying code. The instructions written to memory are indistinguishable from data, so the CPU doesn't invalidate those locations in the instruction cache, and stale instructions may remain there unless you subsequently invalidate those locations yourself. You do that either by issuing your own processor-specific assembly instructions (which you might not be allowed to do, depending on your OS and your thread's privilege level) or by issuing the appropriate cache-invalidate system call for your OS. But what you're describing isn't self-modifying code, so you should be safe in that regard.
Your question 1 asks how to make this safe across ALL processor architectures. Well, as I discussed above, you should be safe if you're using like-kind processors whose data buses are properly bridged. General-purpose processors designed for multiprocessor interconnection have bus snoop protocols to detect writes to shared memory, as long as your threading library properly configures the shared memory region. If you're working in an embedded system, you may have to configure that yourself in your BSP. For PowerPC, you need to look at the WIMG bits in your MMU/BAT configuration; I'm not familiar enough with other architectures to give you pointers on those. BUT: if your system is homebrew or your processors are not like-kind, you may not be able to count on the two processors being able to snoop each other's writes; check with your hardware folks for advice.
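Everything above assumes the data is written before the thread starts. If the write could ever happen after the reader is already running, the language-level way to stay portable on coherent hardware in C++11 and later is a release/acquire pair on an atomic flag. The following is only a sketch of that different situation, not something your current design needs, and the names g_payload, g_ready, consumer and publish_after_start are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

static int g_payload = 0;                 // plain data (hypothetical name)
static std::atomic<bool> g_ready{false};  // publication flag (hypothetical name)

// Waits for the flag, then reads the payload.
static void consumer(int* out) {
    // The acquire load pairs with the release store in publish_after_start().
    while (!g_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    *out = g_payload;  // sees the value written before the release store
}

// Starts the reader first, writes the data afterwards, and returns what
// the reader observed.
int publish_after_start(int value) {
    g_ready.store(false, std::memory_order_relaxed);  // reset before the thread exists
    int seen = 0;
    std::thread t(consumer, &seen);  // the reader is already running...
    g_payload = value;               // ...when the data is written
    g_ready.store(true, std::memory_order_release);
    t.join();
    return seen;
}
```

The compiler emits whatever barrier the target architecture needs for the release store and acquire load, so the source stays the same across platforms. It still relies on the hardware being cache coherent, as discussed above.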