I'm working on a project where two clients can send files to each other via web sockets (using Socket.IO). Each chunk is encrypted with AES.
Currently, the clients connect to the server, they each generate an RSA public/private key pair on their devices, they then announce their public keys to the server which sends them to the other client, and this gets stored by said client. Before data is sent, it is encrypted with AES using a random key and a random IV, and the AES key is then encrypted using the other client's public key. The data is sent across, the other client then decrypts the AES key using their RSA private key, and finally decrypts the content using the decrypted AES key and saves it to a file on their disk.
The issue is that the server could easily just replace one client's public key with its own, and steal the data. The only solution I can think of is for the clients to contact one another and manually verify their public keys... I'm not sure how I'd go about automating this process. Services that provide E2EE seem to generate a matching code on each device, but I'm having trouble finding any information about how this is actually implemented, like how would two devices generate matching codes without talking to a server or each other in between, and if they do, then the server knows the code anyway right?
I've considered using WebRTC to send the public key from one client to the other without having the data go through the server, but I'd appreciate alternative approaches.