Every process has its own virtual address space and the kernel has its own too, and the kernel is a bunch of processes. Does that mean every kernel process has its own virtual address space?
The kernel may have no special kernel processes. It may not even have special kernel threads. For instance, there were none in MS-DOS, however primitive an OS it was. All it had was ISRs, lots of internal code for various things (e.g. file system drivers, memory manager, etc.) and the system call API for applications.
The OS can have some of its functionality provided by (or, in other words, delegated to) user processes. That's the idea of microkernels. These dedicated processes may have additional privileges compared to regular processes.
When the kernel has no processes or threads of its own, it is pretty much a set of subroutines callable in some way by user processes (DOS was just that, except there were no processes and there was one address space for everything). The kernel still has to be reachable from every process, and for that reason the memory where it resides is shared (for example, via page translation) across all address spaces. Every kernel has such a common/shared part.
In a 32-bit architecture system, every process has a 4GB virtual address space. What's the size of the kernel space?
It depends on the CPU and the kernel implementation. 32-bit Windows typically reserves 2GB each for the user and the kernel portions of the address space. This can be changed to 3GB for user and 1GB for kernel if desired (the /3GB boot switch).
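To make the arithmetic concrete, here is a minimal C sketch of the two splits mentioned above. The type `split_t` and the helper names are hypothetical, made up for illustration; the boundary values 0x80000000 and 0xC0000000 are simply the nominal 2GB and 3GB marks, not anything read from a real kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor: user space is [0, kernel_base),
   kernel space is [kernel_base, 4 GiB). */
typedef struct {
    uint32_t kernel_base;
} split_t;

/* Build a split; the boundary is assumed to be 4 KiB page-aligned. */
static split_t make_split(uint32_t kernel_base)
{
    assert((kernel_base & 0xFFFu) == 0);
    split_t s = { kernel_base };
    return s;
}

/* Bytes available to user mode: everything below the boundary. */
static uint64_t user_bytes(split_t s)
{
    return s.kernel_base;
}

/* Bytes reserved for the kernel: from the boundary up to 4 GiB. */
static uint64_t kernel_bytes(split_t s)
{
    return (UINT64_C(1) << 32) - s.kernel_base;
}

/* Which side of the boundary does a virtual address fall on? */
static bool is_kernel_address(split_t s, uint32_t va)
{
    return va >= s.kernel_base;
}
```

With `make_split(0x80000000u)` both halves come out to 2GB, and with `make_split(0xC0000000u)` the address 0x90000000 moves from the kernel side to the user side.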
0x00000000-0xffffffff of a user space is occupied by the kernel, but they are different spaces. How is this implemented?
It depends on the MMU of the CPU. With page tables on the x86 you can organize the entire virtual address space in such a way that only a part of it changes its mapping to physical memory (this is the user part) during a process/thread switch, while the other part remains the same (this is the shared kernel part).
Usually, there's just one virtual address space from the CPU's standpoint. But it's common to talk about its parts as separate user and kernel virtual address spaces.
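Here's a rough sketch of the "one part changes, the other stays" idea, modelled in plain C rather than real kernel code. It assumes classic non-PAE 32-bit x86 paging (a 1024-entry page directory, each entry covering 4MB) and a 3GB/1GB split. The names `page_directory_t`, `kernel_master` and `init_process_directory` are hypothetical, and the entries are just numbers in an array, not live mappings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PDE_COUNT        1024u        /* entries in one page directory    */
#define PDE_SPAN_SHIFT   22           /* each entry covers 4 MiB (2^22)   */
#define KERNEL_BASE      0xC0000000u  /* assumed 3 GiB / 1 GiB split      */
#define KERNEL_FIRST_PDE (KERNEL_BASE >> PDE_SPAN_SHIFT)  /* index 768   */

typedef struct {
    uint32_t pde[PDE_COUNT];
} page_directory_t;

/* The kernel's own directory; its upper entries are the shared part. */
static page_directory_t kernel_master;

/* Index of the directory entry that translates virtual address va. */
static uint32_t pde_index(uint32_t va)
{
    return va >> PDE_SPAN_SHIFT;
}

/* Prepare a fresh per-process directory: the user part starts out
   empty, the kernel part is copied from the master, so every process
   ends up with identical kernel mappings. */
static void init_process_directory(page_directory_t *pd)
{
    assert(pd != NULL);
    memset(pd->pde, 0, KERNEL_FIRST_PDE * sizeof pd->pde[0]);
    memcpy(&pd->pde[KERNEL_FIRST_PDE],
           &kernel_master.pde[KERNEL_FIRST_PDE],
           (PDE_COUNT - KERNEL_FIRST_PDE) * sizeof pd->pde[0]);
}
```

Switching processes then amounts to pointing the MMU at a different directory: entries 0 to 767 differ from process to process, while entries 768 to 1023 are the same everywhere.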
Why does the kernel need to copy something into its own space?
How would it take input from processes, e.g. syscall parameters? But most importantly, what if it has to perform some long processing of input asynchronously, just taking the input, letting the caller continue and then signalling the caller when the work's done? The calling process may be free to modify or deallocate the data buffer that it's just passed to the kernel. The kernel may not be very "happy" to observe the data it's working with change or disappear. If there are multiple threads in the process, this problem can occur even with synchronous calls, because another thread can alter the buffer, while the kernel is working with it.
There can be other reasons for copying data to or keeping it in the kernel portion of the address space.
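The changing-buffer hazard can be illustrated with a small user-mode simulation in C. Nothing here is a real kernel API: `kernel_request_t`, `capture_request` and `checksum` are hypothetical names, and a plain `memcpy` stands in for whatever checked copy routine a real kernel would use (Linux, for instance, has `copy_from_user`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KBUF_MAX 64  /* assumed upper bound on request size */

/* Private, kernel-side snapshot of the caller's data. */
typedef struct {
    char   data[KBUF_MAX];
    size_t len;
} kernel_request_t;

/* Take a snapshot of the caller's buffer at "syscall entry".
   Returns 0 on success, -1 if the request is too large. */
static int capture_request(kernel_request_t *req,
                           const char *user_buf, size_t len)
{
    assert(req != NULL && user_buf != NULL);
    if (len > KBUF_MAX)
        return -1;
    memcpy(req->data, user_buf, len);
    req->len = len;
    return 0;
}

/* Deferred work: reads only the snapshot, never the caller's buffer,
   so later changes made by the caller cannot affect the result. */
static unsigned checksum(const kernel_request_t *req)
{
    unsigned sum = 0;
    for (size_t i = 0; i < req->len; i++)
        sum += (unsigned char)req->data[i];
    return sum;
}
```

If the caller overwrites or frees its buffer after `capture_request` returns, `checksum` still sees the bytes as they were at the time of the call. Had the kernel kept only a pointer to the caller's buffer, the result would depend on when the caller (or another of its threads) touched that memory.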