Real-time Communication's Golden Keys: ICE-STUN-TURN, DTLS-SRTP
Exploring WebRTC’s fundamental principles through the voice chat on Discord’s website
This article mainly covers WebRTC and some of the networking it is built on. Click here to skip the basic knowledge part.
Since Discord’s architecture differs between platforms, "Discord" in the following refers to Discord for Web.
Introduction
Cone NAT
This part only compares the two main types of NAT.
If your device is behind a Cone NAT, it will obtain a stable external address‑port mapping that does not depend on the remote peer you contact. Once this mapping is created, external hosts can reach your device through that same mapping as long as they satisfy the NAT’s specific permission rules, such as matching the allowed IP or IP‑and‑port combinations. This predictable behavior makes Cone NATs more friendly to peer‑to‑peer protocols and NAT traversal techniques.
Symmetric NAT
If your device is behind a Symmetric NAT, each combination of an internal IP address and port with a destination IP address and port is mapped to its own unique external IP address and port. If the same internal host sends a packet from the same source address and port to a different destination, a different mapping is used. As a result, only an external host that has received a packet from the internal host can send a packet back.
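To make the difference concrete, here is a minimal sketch that models only the mapping behaviour of the two NAT types. The class names, addresses, and port numbers are made up for illustration, and the permission (filtering) rules are left out on purpose.

```python
from itertools import count


class ConeNat:
    """Reuses one external port per internal (ip, port), whatever the destination."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = count(first_port)
        self._mappings = {}

    def map_outbound(self, src, dst):
        # The destination is ignored: the mapping depends on the source only.
        if src not in self._mappings:
            self._mappings[src] = (self.public_ip, next(self._ports))
        return self._mappings[src]


class SymmetricNat(ConeNat):
    """Allocates a separate external port for every (source, destination) pair."""

    def map_outbound(self, src, dst):
        key = (src, dst)
        if key not in self._mappings:
            self._mappings[key] = (self.public_ip, next(self._ports))
        return self._mappings[key]


# Hypothetical addresses from the documentation ranges.
client = ("192.168.1.10", 5000)
stun_server = ("198.51.100.1", 3478)
media_server = ("203.0.113.7", 50001)

cone = ConeNat("192.0.2.1")
symmetric = SymmetricNat("192.0.2.1")

# Does the client look the same to both destinations?
cone_same = cone.map_outbound(client, stun_server) == cone.map_outbound(client, media_server)
symmetric_same = symmetric.map_outbound(client, stun_server) == symmetric.map_outbound(client, media_server)
print(cone_same, symmetric_same)
```

Behind the cone NAT both destinations see the same external address and port, so an address learned from one server stays valid for another peer. Behind the symmetric NAT they differ, which is why an address learned from a third party is of little use there.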
ICE(Interactive Connectivity Establishment)
ICE is a framework whose job is to make sure two peers can establish a working connection even if they’re behind NATs or firewalls (put simply, the two peers are your device and the destination device or server).
What ICE does is:
- Collects all possible addresses the peer could use, such as your local IP address, your public IP address (via STUN), and so on. These are the candidates.
- Each peer sends its candidate list to the other side.
- Runs STUN Binding requests between candidate pairs to test which ones, if any, can actually communicate.
- Selects the working pair that gives the best connection. Once the best pair is chosen, ICE uses only that path for the session. The other candidates are discarded.
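How "best" is decided can be sketched with the priority formulas recommended in RFC 8445. This is only an illustration of the arithmetic: the type preference values are the RFC's recommended ones, the function names are mine, and real ICE agents may pick other local preferences.

```python
# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(controlling, controlled):
    """pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


# Hypothetical candidate lists: the local (controlling) side has two candidates,
# the remote side only a host candidate.
local_types = ["srflx", "host"]
remote_types = ["host"]

pairs = [(loc, rem) for loc in local_types for rem in remote_types]
ranked = sorted(
    pairs,
    key=lambda p: pair_priority(candidate_priority(p[0]), candidate_priority(p[1])),
    reverse=True,
)
best = ranked[0]
print(best)
```

The ranking only decides the order in which pairs are checked and preferred. A pair still has to pass its connectivity check before it can be selected.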
Two types of ICE:
- Full-ICE: The agent gathers every type of candidate and conducts connectivity checks itself.
- Lite-ICE: Lite agents only use host candidates (which means they have a public IP). They do not generate connectivity checks or run state machines; they only respond to the checks they receive.
STUN(Session Traversal Utilities for NAT)
STUN is a protocol that lets peers discover their public IP address and port number. It is also a utility protocol that ICE (and other NAT traversal systems) rely on. Many services, such as Discord’s voice chat, run a publicly reachable server on the Internet that responds to STUN requests.
How STUN works:
- If STUN servers are available, the client sends a UDP packet to a STUN server on the Internet. Because the request travels over UDP, it can bypass a proxy that does not handle UDP properly and reveal your real public IP.
  - The STUN server responds with the source IP and port it sees. That address can then be used as an srflx candidate in ICE.
  - STUN can also be used between peers to test whether an IP is viable.
- If there is no STUN server, the peers confirm the address directly by sending STUN Binding requests (ICE checks) to each other. A prflx candidate is generated when the ICE check succeeds.
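The exchange is small enough to build by hand. The sketch below constructs a Binding request and decodes the XOR-MAPPED-ADDRESS attribute of a Binding success response, following the wire format in RFC 5389 (IPv4 only). It does not touch the network: the "response" is generated locally with a made-up address so the round trip can be checked offline, and the helper names are mine.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id):
    # 20-byte header: type, attribute length (0 here), magic cookie, transaction ID.
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, transaction_id)


def build_binding_success(transaction_id, ip, port):
    # Stand-in for what a server would send back: the address it saw, XOR-ed with the cookie.
    x_port = port ^ (MAGIC_COOKIE >> 16)
    x_addr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, x_port, x_addr)  # reserved, IPv4 family
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI12s", BINDING_SUCCESS, len(attr), MAGIC_COOKIE, transaction_id)
    return header + attr


def parse_mapped_address(message):
    msg_type, length, cookie, _tid = struct.unpack("!HHI12s", message[:20])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    offset = 20
    while offset < 20 + length:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _reserved, family, x_port, x_addr = struct.unpack("!BBHI", value[:8])
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            port = x_port ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", x_addr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    raise ValueError("no XOR-MAPPED-ADDRESS attribute")


tid = os.urandom(12)
request = build_binding_request(tid)
response = build_binding_success(tid, "203.0.113.5", 54321)  # hypothetical public address
print(len(request), parse_mapped_address(response))
```

In a real client the request would be sent over a UDP socket and the decoded address would become the srflx candidate (or a prflx candidate when it is learned from a peer during an ICE check).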
Notes:
- If both peers are behind NATs, a STUN server is required; otherwise they have to fall back to TURN.
- If the app relies solely on a STUN server, a peer behind a symmetric NAT cannot be reliably reached by the other side, because the mapping seen by the STUN server differs from the one used during actual peer communication.
- If one peer is behind a symmetric NAT but the other side uses ICE Lite, a prflx candidate can still be generated.
TURN(Traversal Using Relays around NAT)
TURN is a protocol that lets peers communicate through a relay server to get around NATs. It is server-based: the server is responsible for relaying data between the peers.
Unlike STUN, a server is a must for TURN. Here is how it works:
- The client sends an Allocate request to the TURN server, which allocates a relayed address and returns it in an Allocate success response. This address becomes the relay candidate.
- The client sends a CreatePermission request to tell the server which peers are allowed to send traffic to the relayed address.
- The client chooses either the Send mechanism (Send and Data indications) or ChannelBind to transmit data.
- The TURN server relays all traffic between client and peer using UDP datagrams, validating permissions before forwarding peer responses to the client.
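The allocation and permission bookkeeping described above can be modelled in a few lines. This is a toy in-memory model, not a TURN implementation: there is no networking, authentication, or lifetime handling, and the class and method names are hypothetical.

```python
from itertools import count


class ToyTurnServer:
    """In-memory model of TURN allocations and permissions."""

    def __init__(self, relay_ip, first_port=49152):
        self.relay_ip = relay_ip
        self._ports = count(first_port)
        self.allocations = {}  # client address -> relayed address
        self.permissions = {}  # client address -> set of permitted peer IPs

    def allocate(self, client):
        relayed = (self.relay_ip, next(self._ports))
        self.allocations[client] = relayed
        self.permissions[client] = set()
        return relayed  # the client advertises this as its relay candidate

    def create_permission(self, client, peer_ip):
        # TURN permissions are keyed on the peer's IP address, not its port.
        self.permissions[client].add(peer_ip)

    def relay_from_peer(self, client, peer, payload):
        if peer[0] not in self.permissions.get(client, set()):
            return None  # no permission installed: the datagram is dropped
        return payload   # would be forwarded to the client


# Hypothetical addresses from the documentation ranges.
server = ToyTurnServer("198.51.100.20")
client = ("192.0.2.10", 50000)
peer = ("203.0.113.7", 40000)

relayed = server.allocate(client)
before = server.relay_from_peer(client, peer, b"hi")  # dropped, no permission yet
server.create_permission(client, peer[0])
after = server.relay_from_peer(client, peer, b"hi")   # forwarded
print(relayed, before, after)
```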
WebRTC(Web Real-time communication)
WebRTC is a set of open standards and APIs that allow browsers and apps to establish direct, peer-to-peer communication for audio, video, and data, without needing plugins or external software. Basically it is a collection of protocols (ICE, STUN, TURN and so on) and other useful tools wrapped in browser APIs.
DTLS(Datagram Transport Layer Security)
It is just like TLS, but for datagrams (UDP). Video and audio are transmitted over UDP, while TLS depends on TCP’s reliability and cannot cope with packet loss or reordering. That is why DTLS exists.
How DTLS works (handshake):
- Client sends ClientHello with the use_srtp extension and no cookie.
- Server receives it and sends HelloVerifyRequest with an opaque cookie.
- Client receives it and sends ClientHello with the use_srtp extension one more time, now with this cookie.
- Server sends ServerHello + Certificate + ServerKeyExchange + ServerHelloDone.
- Client sends ClientKeyExchange + ChangeCipherSpec + Finished.
- Server sends ChangeCipherSpec + Finished.
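The cookie in the first three steps lets the server stay stateless until the client proves it can receive packets at its claimed address. One common way to produce it, suggested in the DTLS specification, is an HMAC over the client's address and ClientHello parameters. The sketch below assumes SHA-256 and a made-up message layout; the secret, the function names, and the byte encoding are illustrative, not what any particular DTLS stack does.

```python
import hashlib
import hmac
import os

# Hypothetical per-server secret; a real server would rotate it periodically.
SERVER_SECRET = os.urandom(32)


def make_cookie(client_addr, client_hello_params):
    """Cookie = HMAC(secret, client address || ClientHello parameters)."""
    ip, port = client_addr
    message = f"{ip}:{port}".encode() + b"|" + client_hello_params
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).digest()


def verify_cookie(client_addr, client_hello_params, cookie):
    # Recompute instead of storing anything per client; compare in constant time.
    return hmac.compare_digest(make_cookie(client_addr, client_hello_params), cookie)


addr = ("192.0.2.10", 50000)           # hypothetical client address
params = b"version+random+ciphers"     # placeholder for the real ClientHello fields

cookie = make_cookie(addr, params)     # sent in HelloVerifyRequest
ok = verify_cookie(addr, params, cookie)                         # second ClientHello, same address
spoofed = verify_cookie(("203.0.113.9", 50000), params, cookie)  # cookie replayed from elsewhere
print(ok, spoofed)
```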
SRTP(Secure Real-time Transport Protocol)
SRTP is a profile for Real-time Transport Protocol (RTP) intended to provide encryption, message authentication and integrity, and replay attack protection to the RTP data in both unicast and multicast applications.
After DTLS handshake completion:
- Both sides use TLS Exporter (with label “EXTRACTOR-dtls_srtp”) to derive SRTP master keys and salt from the Master Secret.
- Initialize SRTP / SRTCP session with the exported keys.
- Start sending encrypted SRTP packets (for audio and video media) and SRTCP packets (for control).
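The exporter output is one block of bytes that both sides cut up in the same fixed order: client key, server key, client salt, server salt (RFC 5764). The sketch below assumes the SRTP_AES128_CM_HMAC_SHA1_80 profile (16-byte key, 14-byte salt); other profiles use different lengths. The input here is dummy data standing in for the real exporter output.

```python
KEY_LEN = 16   # AES-128 master key, in bytes
SALT_LEN = 14  # 112-bit master salt, in bytes


def split_keying_material(material):
    """Split exported keying material into per-direction SRTP master keys and salts."""
    expected = 2 * (KEY_LEN + SALT_LEN)
    if len(material) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(material)}")
    client_key = material[:KEY_LEN]
    server_key = material[KEY_LEN:2 * KEY_LEN]
    client_salt = material[2 * KEY_LEN:2 * KEY_LEN + SALT_LEN]
    server_salt = material[2 * KEY_LEN + SALT_LEN:]
    return {
        "client": (client_key, client_salt),  # protects packets sent by the DTLS client
        "server": (server_key, server_salt),  # protects packets sent by the DTLS server
    }


# Dummy bytes for illustration only; the real input comes from the TLS exporter.
material = bytes(range(60))
keys = split_keying_material(material)
print(len(keys["client"][0]), len(keys["client"][1]))
```

Each side encrypts with its own write key and decrypts with the other side's, so "client" and "server" here refer to the DTLS roles, not to who started the call.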
Behaviour of Discord’s Voice Chat
Depending on your network conditions and browser, there are different routes to establishing a voice connection. If you open the web developer console, you can see the connection logs like this:

The Common Route
Awaiting Endpoint[0] -> Connecting[1] -> Authenticating[2] -> RTC Connecting[3] -> Checking Route[4] -> Voice Connected[5]
Step Connecting
An RTC server has been allocated for you and Discord is attempting to connect to it. Get ready to roll out.
This step attempts to connect to wss://xxxxx.discord.media:xxxx/, an RTC server, in other words a signaling server (WebSocket). This connection is used in the next step.
Step Authenticating
Discord has connected to your real-time communication server and is securing the connection.
This step sends identification information to the server from step [1] to secure the connection.
Step RTC Connecting
Locked and loaded! Discord has established a secure connection to your real-time communication server and is attempting to send data.
From this step on, things get a little complicated. The following steps run almost simultaneously:
- Create the RTC peer connection.
- Update the video quality.
- Generate and exchange the SDP (Session Description Protocol) offer.
- Get the media server (used in the next step).
- And so on.
From the moment you click Join Voice up to this point, all the major packets use TCP. The remaining steps are where the important part actually starts.
Step Checking Route
Shields up! Discord has established a secure connection to your real-time communication server and is attempting to send data. If your browser connection is stuck in this step, check out this swanky article to help resolve the problem.
In this step, everything from the introduction comes into play. Discord uses its own SFU (Selective Forwarding Unit), so the server side runs ICE-Lite, which means the remote candidate is the host candidate of the media server mentioned in the previous step.
On the client side, WebRTC first runs ICE gathering to generate candidates. Then your device sends a lot of STUN Binding requests to the media server. If the media server receives one and sends back a Binding success response, the voice connection is established. Start talking!
If no packet is received from the media server, the connection fails. Firefox users will see this in the console:
WebRTC: ICE failed, add a STUN server and see about:webrtc for more details
Check the about:webrtc page and you will find that there is no relay candidate: Discord does not use a TURN server, only its SFU. This means that if the STUN Binding checks fail, ICE fails. One more thing: Discord does not use a STUN server either, so srflx candidates are also unavailable here.
Discord has also posted a guide to help you resolve the problem. Here is what is actually happening:
btw this tool is awesome!
- If no STUN packet is sent from your device to the media server, it might be a firewall issue.
- If STUN packets are sent but none are received, it is either an issue on Discord’s backend (Rare! Or the whole community will explode!) or you are using a VPN that has run into problems, covered later.
Notes:
- No matter what kind of NAT you are behind, you will see the prflx candidate and be able to establish a voice connection, as long as you have not hit any of the issues above.
- If you are using a VPN and you see STUN packets sent but none received, there are two possible reasons:
  - Your VPN provider blocks UDP packets outright.
  - Your VPN provider blocks the port Discord uses. Unlike most STUN servers (btw Discord does not use any), Discord uses a higher port instead of 3478. Your VPN provider might block that port to protect itself from DDoS attacks. Check with some tools to be sure!
- If you are using a system proxy, it does not matter because WebRTC will bypass it.
The Extra Route
One thing from the introduction has not come up yet: DTLS-SRTP. After digging deep, there is one mystery I still could not figure out: Chromium-based browsers go through this route while Gecko-based ones (Firefox) do not. Both the console and the UI show the difference.
Checking Route[4] -> DTLS Connecting[4.5] -> Voice Connected[5]
The console shows:
# Chrome
[UnifiedConnection(default)] Connection state change: ICE_CHECKING => DTLS_CONNECTING
...
[UnifiedConnection(default)] Connection state change: DTLS_CONNECTING => CONNECTED
# Firefox
[UnifiedConnection(default)] Connection state change: ICE_CHECKING => CONNECTED
But the two cases are actually the same: I captured DTLS and SRTP packets in both (btw it is part of the browser’s design guidelines).
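Since STUN, DTLS, and SRTP all share the same UDP port here, a capture can be sorted by looking at the first byte of each payload. The sketch below uses the ranges from RFC 7983; the function name and the sample bytes are mine, and it is only a rough classifier, not how any capture tool actually dissects packets.

```python
def classify_packet(payload):
    """Classify a UDP payload by its first byte, following the RFC 7983 ranges."""
    if not payload:
        return "empty"
    first = payload[0]
    if first <= 3:
        return "STUN"
    if 20 <= first <= 63:
        return "DTLS"
    if 128 <= first <= 191:
        return "RTP/RTCP"  # SRTP and SRTCP leave the RTP/RTCP header unencrypted
    return "other"


# Illustrative first bytes only; the rest of each payload is padding.
samples = [
    bytes([0x00, 0x01]) + bytes(18),  # STUN Binding request starts with 0x0001
    bytes([0x16]) + bytes(12),        # DTLS handshake record (content type 22)
    bytes([0x80]) + bytes(11),        # RTP version 2, no padding or extension
]
labels = [classify_packet(p) for p in samples]
print(labels)
```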
Step DTLS Connecting
Shields up! Discord has established a secure connection to your real-time communication server and is attempting to send data. If your browser connection is stuck in this step, check out this swanky article to help resolve the problem.
The detailed steps are in the introduction; this step mainly makes sure the media transmission is secure.
Off Topic
All of this research was done on GNU/Linux. Here is a list of the tools I used; they are all helpful:
- termshark: An awesome tool for capturing and analyzing packets. (btw it often gets stuck when I try to close it, looking forward to a fix XD)
- termjail: A tool that isolates network traffic to only the current terminal shell and its children and drops everything else. (I wrote this for my own use)
- stunserver: A simple STUN testing tool.
Personal experience and knowledge - Issues are welcome