I'm a little confused about who (PBX vs. phones) is doing what in our IP telephone setup.
We have our phone system hosted externally. This means that we've got physical IP telephones installed at our location, but the PBX is located elsewhere. We've had various problems with this solution, and it annoys me that I can't troubleshoot them properly because I don't understand how it all works together. I've been reading up on how SIP and RTP work, but there are a couple of things I still don't entirely get. My question relates to both SIP and RTP, but also touches a bit upon NAT.
Our scenario is simple. The PBX is reachable on a public IP somewhere on the Internet. We are behind a firewall (NAT).
First of all, it is my understanding that SIP packets do not carry audio. They are simply there to make sure that sessions (as in Session Initiation Protocol) get established between the phones and the PBX, such as when a call is about to be set up. These packets are often UDP, and often "travel" on port 5060.
Although this is not specific to SIP: when the SIP UDP packets go through NAT, the NAT "translates" the source port of outgoing packets into something different, so that it can map responses back to the right phone behind the NAT.
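To make my mental model of this concrete, here is a rough sketch of how I picture the port translation. It is a toy, not how any particular firewall is implemented: the `ToyNat` class, the starting port 40000, and all addresses (documentation ranges) are made up for illustration.

```python
class ToyNat:
    """Toy port-translating NAT (hypothetical, for illustration only)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outgoing(self, private_ip, private_port):
        """Return the (public_ip, public_port) the remote side will see."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            # First packet from this phone: allocate a fresh public port.
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_incoming(self, public_port):
        """Return the private endpoint for a reply, or None if unmapped."""
        return self.inbound.get(public_port)


nat = ToyNat("203.0.113.10")
# Two phones both send SIP from local port 5060.
phone_a = nat.translate_outgoing("192.168.1.21", 5060)
phone_b = nat.translate_outgoing("192.168.1.22", 5060)
print(phone_a)
print(phone_b)
# A reply addressed to phone_b's public port is mapped back to phone_b.
print(nat.translate_incoming(phone_b[1]))
```

The point of the sketch is that both phones use local port 5060, yet the PBX sees two different source ports on the same public IP, and that is the only thing that lets the NAT tell the replies apart.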
Now, if someone wants to reach one of our phones, they dial the number and eventually hit the PBX. It is my understanding that the PBX then sends an INVITE (a SIP UDP packet) to the phone in question. That packet contains, among other things, an IP address and a port number which the phone should connect to in order to establish an RTP session (which carries the actual audio data).
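As far as I can tell from my reading, the IP address and port for RTP sit in the SDP body of the INVITE, in the `c=` and `m=audio` lines. Here is a small sketch of how I understand that. The message is a trimmed, made-up sample (not a capture from our system, and some headers a real INVITE would carry are left out), and `rtp_target` is just a helper name I invented.

```python
# Hypothetical, trimmed INVITE with an SDP body; all values are made up.
SAMPLE_INVITE = (
    "INVITE sip:201@192.168.1.21:5060 SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 198.51.100.5:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:100@pbx.example.com>;tag=1928301774\r\n"
    "To: <sip:201@pbx.example.com>\r\n"
    "Call-ID: a84b4c76e66710@pbx.example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=pbx 2890844526 2890844526 IN IP4 198.51.100.5\r\n"
    "s=call\r\n"
    "c=IN IP4 198.51.100.5\r\n"
    "t=0 0\r\n"
    "m=audio 16384 RTP/AVP 0 8\r\n"
)


def rtp_target(sip_message):
    """Return (address, port) advertised for audio in the SDP body."""
    # SIP headers and body are separated by an empty line.
    _headers, _sep, body = sip_message.partition("\r\n\r\n")
    address = None
    port = None
    for line in body.splitlines():
        if line.startswith("c=IN IP4 "):
            # c=<nettype> <addrtype> <connection-address>
            address = line.split()[2]
        elif line.startswith("m=audio "):
            # m=<media> <port> <proto> <payload types...>
            port = int(line.split()[1])
    return address, port


print(rtp_target(SAMPLE_INVITE))
```

In this sample the PBX would be telling the phone to send audio to 198.51.100.5 on UDP port 16384, while the SIP signalling itself stays on port 5060.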
Question 1: I've been told that phones must send regular keep-alive requests to the PBX to ensure that the NAT does not expire its UDP mappings. This is important because the PBX initiates the INVITE requests for incoming calls, and the NAT must still hold the mapping in order to deliver any given INVITE to the correct phone. Is this right? Will any given INVITE request from the PBX be sent to the source port (which would have been translated by NAT) that the PBX saw in the keep-alive requests from the phones?
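This is how I picture the timing issue behind Question 1, as a toy simulation. It assumes the NAT mapping is refreshed only by outgoing packets and expires after a fixed idle timeout; the 30-second timeout and the intervals are numbers I picked for illustration, not values from our firewall or phones.

```python
class ExpiringNatMapping:
    """Toy UDP mapping that dies after timeout_s seconds without traffic."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = None  # time of the last outgoing packet

    def outgoing_packet(self, now):
        # Assumption: only outgoing traffic refreshes the mapping.
        self.last_seen = now

    def accepts_incoming(self, now):
        if self.last_seen is None:
            return False
        return now - self.last_seen <= self.timeout_s


def invite_reaches_phone(keepalive_interval_s, nat_timeout_s, invite_at_s):
    """Simulate keep-alives from t=0, then an INVITE arriving from the PBX."""
    mapping = ExpiringNatMapping(nat_timeout_s)
    t = 0
    while t <= invite_at_s:
        mapping.outgoing_packet(t)
        t += keepalive_interval_s
    return mapping.accepts_incoming(invite_at_s)


# Keep-alive every 25 s against a 30 s NAT timeout: mapping stays open.
print(invite_reaches_phone(25, 30, 600))
# Keep-alive every 120 s against a 30 s NAT timeout: mapping has expired.
print(invite_reaches_phone(120, 30, 590))
```

If this model is right, the keep-alive interval has to be shorter than the NAT's idle timeout, otherwise an incoming INVITE arrives at a public port that no longer maps to any phone.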
Question 2: Is it correct that the phones are the ones that act upon any given INVITE SIP request and connect to the PBX? If so, I should not really have to worry about NAT here, as the phone sends the first RTP packet and "punches a hole" in the NAT.
Question 3: How does it work "the other way around"? That is, if I want to call someone from one of our phones behind the NAT, does my phone send an INVITE SIP request to the PBX? That does not make sense to me, because the PBX would not be able to establish an RTP session with my phone, which is behind NAT.