Besides the marshaling layer, which is responsible for converting parameters for you and figuring out calling conventions, the runtime needs to do a few other things to keep internal state consistent.
The security context needs to be checked, to make sure the calling code is allowed to access native methods. The current managed stack frame needs to be saved, so that the runtime can do a stack walk back for things like debugging and exception handling (not to mention native code that calls into a managed callback). Internal bits of state need to be set to indicate that we're currently running native code.
Additionally, registers may need to be saved, depending on what needs to be tracked and which are guaranteed to be restored by the calling convention. GC roots that are in registers (locals) might need to be marked in some way so that they don't get garbage collected during the native method.
So mainly it's stack handling and type marshaling, with some security stuff thrown in. Though it's not a huge amount of stuff, it will represent a significant barrier against calling smaller native methods. For example, trying to P/Invoke into an optimized math library rarely results in a performance win, since the overhead is enough to negate any of the potential benefits. Some performance profiling results are discussed here.