Based on your comm settings, each transmitted char/byte should be:
1 start bit
8 data bits
0 parity bits
2 stop bits (the spec says 11 bits/char, so with no parity this should be 2, but I have seen implementations that use 8N1, i.e. 10 bits/char)
-------------------------------
total: 11 bits/byte
1) Request
address: 1 byte
function code: 1 byte
starting address: 2 bytes
quantity of registers: 2 bytes
crc: 2 bytes
-------------------------------
total: 8 bytes
Min time: 1s/9600bits * (8 bytes) * (11 bits/byte) = 9.2 ms
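To make the 8-byte request layout above concrete, here is a minimal Python sketch that assembles one. It assumes function code 0x03 (read holding registers), server address 1, and starting address 0; none of those values come from your post, only the 12 registers do. The helper names crc16_modbus and build_read_request are made up for illustration.

```python
import struct

def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

def build_read_request(address: int, start: int, quantity: int,
                       function: int = 0x03) -> bytes:
    """Build an RTU read request: address, function, start, quantity, CRC."""
    # Address and function are 1 byte each; start and quantity are big-endian.
    body = struct.pack(">BBHH", address, function, start, quantity)
    # The CRC goes on the wire low byte first.
    return body + struct.pack("<H", crc16_modbus(body))

# Assumed values: server address 1, starting address 0, 12 registers.
frame = build_read_request(1, 0, 12)
print(len(frame), frame.hex(" "))
```

A handy sanity check is that running the CRC over a complete frame, CRC bytes included, gives 0.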
2) Silent Interval Between Frames
Min time: 1s/9600bits * (3.5 bytes) * (11 bits/byte) = 4.0 ms
3) Response
address: 1 byte
function code: 1 byte
byte count: 1 byte
register values: 24 bytes (2*12)
crc: 2 bytes
-------------------------------
total: 29 bytes
Min time: 1s/9600bits * (29 bytes) * (11 bits/byte) = 33.2 ms
Thus, the theoretical minimum round trip time is: 9.2 + 4.0 + 33.2 = 46.4 ms
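If you want to redo this arithmetic for other baud rates or register counts, a small Python sketch like the following should do. The function names are hypothetical. It derives the response length from the register count (address + function code + 1-byte byte count + 2 bytes per register + CRC), which gives 29 bytes for 12 registers, and it assumes 11 bits per character unless told otherwise.

```python
def char_time_ms(n_chars, baud=9600, bits_per_char=11):
    """Time on the wire for n_chars characters, in milliseconds."""
    return n_chars * bits_per_char / baud * 1000.0

def read_response_len(n_registers):
    """Bytes in a read-registers response frame."""
    # address + function code + byte count (1 byte) + register data + CRC
    return 1 + 1 + 1 + 2 * n_registers + 2

request_ms = char_time_ms(8)                       # 8-byte request frame
silence_ms = char_time_ms(3.5)                     # 3.5-character silent interval
response_ms = char_time_ms(read_response_len(12))  # 12 registers
total_ms = request_ms + silence_ms + response_ms

print(f"request:  {request_ms:.1f} ms")   # 9.2 ms
print(f"silence:  {silence_ms:.1f} ms")   # 4.0 ms
print(f"response: {response_ms:.1f} ms")  # 33.2 ms
print(f"total:    {total_ms:.1f} ms")     # 46.4 ms
```

Passing bits_per_char=10 gives the numbers for an 8N1 link instead.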
So it looks like there is potentially room for improvement, but it is hard to know exactly
where your delays are.
In addition to what others have suggested, I usually put a scope on the comm lines to see if the tx and rx frames are close to the theoretical times. You should also be able to tell roughly how long the server takes to respond by looking at the gap between the end of the tx frame and the start of the rx frame. If you can somehow trigger the scope just before you start sending the request, you can also measure the time it takes for the client to generate and start transmitting the request frame.
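If you do not have a scope handy, a rough software-side measurement can at least tell you how far the whole transaction is from the theoretical figure. Below is a minimal Python sketch. measure_round_trips and fake_transaction are hypothetical names, and fake_transaction just sleeps as a stand-in for whatever call in your code sends the request and blocks until the response arrives. Keep in mind this includes OS and driver latency, so it cannot separate client delay from server delay the way a scope can.

```python
import statistics
import time

def measure_round_trips(transaction, repeats=10):
    """Call transaction() repeatedly and summarize the elapsed times in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        transaction()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }

def fake_transaction():
    # Stand-in for a real request/response exchange (assumption: about 50 ms).
    time.sleep(0.05)

stats = measure_round_trips(fake_transaction, repeats=5)
print(", ".join(f"{name}={value:.1f}" for name, value in stats.items()))
```

The minimum is the number to compare against the theoretical round trip time; a large spread between minimum and maximum usually points at scheduling or polling delays on the client side.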