Posts

WebRTC Architecture that keeps working when us-east-1 goes down

This week AWS had an outage in its northern Virginia region, probably the most commonly used one, and many of the most popular services in the world were affected. It was at least the third time in five years that AWS’s northern Virginia cluster, known as US-EAST-1, contributed to a major internet meltdown. Every service is different, and nowadays, with so many dependencies and so much coupling, it is hard not to be affected by an incident like this. However, specifically for WebRTC platforms, it should be relatively straightforward to handle cases where one specific AWS region goes down. Basic WebRTC Architecture The basic architecture of most WebRTC platforms comprises four components: Load balancing service: The client application’s initial point of contact, which selects the appropriate media server for a session. Database: Stores server information (status, load, location) and session-to-server mappings, often using a fast database like Redis. Signaling service: Manages...
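
As a rough illustration of the load balancing component described above, here is a minimal sketch of region-aware server selection that skips unhealthy servers. The `MediaServer` fields, the `pick_server` function and the server names are hypothetical and not taken from the post; in a real deployment the server list would come from the database (for example Redis).

```python
from dataclasses import dataclass


@dataclass
class MediaServer:
    server_id: str
    region: str
    load: float    # 0.0 (idle) to 1.0 (full); hypothetical metric
    healthy: bool  # assumed to be updated by some health check


def pick_server(servers, preferred_regions):
    """Return the least-loaded healthy server, trying regions in preference order."""
    for region in preferred_regions:
        candidates = [s for s in servers if s.region == region and s.healthy]
        if candidates:
            return min(candidates, key=lambda s: s.load)
    return None  # no healthy server in any of the listed regions


# Hypothetical inventory: the only us-east-1 server is marked unhealthy.
servers = [
    MediaServer("ms-1", "us-east-1", 0.2, healthy=False),
    MediaServer("ms-2", "us-west-2", 0.6, healthy=True),
    MediaServer("ms-3", "us-west-2", 0.3, healthy=True),
]

chosen = pick_server(servers, ["us-east-1", "us-west-2"])
print(chosen.server_id)  # falls back to the least-loaded us-west-2 server
```

The point of the sketch is only that the fallback region is tried automatically once no healthy server remains in the preferred one.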

Reflection+Amplification attacks abusing TURN servers

Last week there were many messages in the coturn issue tracker about TURN instances being blocked by some cloud providers because those servers had been detected attacking other hosts. This is not new, as Wire had already reported it and even suggested a mitigation some months ago. The attacks being carried out in this case are based on two properties of TURN servers: Reflection: A TURN server, by design, sends the response to a TURN message to the source IP of the request. If an attacker is able to change their source IP address (spoofing), they can direct the response to their TURN request to any other host they want. Amplification: TURN responses are usually bigger than the corresponding requests. This is especially problematic for authentication error responses, which must include a NONCE value that the client then uses to authenticate properly, following the standard authentication mechanism defined in the protocol. We can see a real example of the second property in a commer...
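
To make the amplification property concrete, the ratio an attacker gains can be expressed as response size divided by request size. The sketch below is only that arithmetic; the function name and the byte counts are hypothetical placeholders, not measurements from the post or from any real TURN server.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bytes delivered to the spoofed victim per byte sent by the attacker."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes


# Hypothetical sizes, for illustration only (not measured values).
factor = amplification_factor(request_bytes=50, response_bytes=150)
print(factor)  # 3.0
```

Any factor above 1.0 means the server sends more traffic to the victim than the attacker had to generate.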

Another sneaky WebRTC optimisation only known by Google Meet (RemoteEstimate RTCP packets)

Google Meet has consistently delivered superior quality among WebRTC applications (at least for web applications). This is especially true compared with typical open-source solutions, but it holds even against most commercial solutions. The reason is that Google has a team that understands very well the media stack they have built and can make it behave in ways that solve their problems in the browser, using knobs hidden in WebRTC that only they are aware of. Some of us still remember how simulcast support was added to WebRTC approximately 12 years ago with SDP munging behind the x-google-conference flag, without any documentation or note, or how “Audio Network Adaptation” was added with a secret string encoding an undocumented protobuf schema to tune settings and improve audio quality. Today I was trying to debug why a WebRTC application had a less stable bandwidth estimation than Google Meet. The scenario was quite simple: in a perfect network with plenty of bandwidth, add 50ms of ex...

OpenAI WebRTC API Review

There is a new interface added to OpenAI RealTime models. Now it supports WebRTC! Given the people working on it, I'm sure it has to be great, so as usual let’s take a look and see what is under the hood in terms of audio transmission. Signalling or Establishment of the connection There are two options for establishing a RealTime session with the OpenAI servers: WebSocket signalling: a much nicer API without ugly SDPs involved, but less suited for public networks. HTTP/WebRTC signalling: has an uglier API including SDP offer/answer negotiations, but can work well in real networks, which is critical for most use cases. In the rest of the post we will focus only on the latter (HTTP/WebRTC), which is the most interesting one. Authentication The first step to use these RealTime APIs to send audio data directly from clients to OpenAI servers is to obtain an ephemeral key using your OpenAI API Secret. This is a simple HTTP request that, for testing, you can do from the command line: `...

Target Bitrates vs Max Bitrates

Not all the simulcast layers have the same encoding quality When using simulcast video encoding with WebRTC, the encoder generates different versions or layers of the video input with varying resolutions. Using this technique, a multiparty video server (SFU) can adapt the video that each participant in a room receives based on factors such as available bandwidth, CPU/battery level, or the rendering size of those videos in each receiver. How simulcast works with an SFU forwarding layers selectively These different versions of the video have varying resolutions, but what about their encoding quality? For example, if a user is receiving a video and rendering it in a window of 640x360, would they get the same quality from the 640x360 layer as from the highest layer of 1280x720? To answer this question about the quality of each resolution, we can first examine the bitrates used by each. But the interesting thing is that the bitrate of each resolution is not always th...
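
For readers unfamiliar with selective forwarding, here is a minimal sketch of how an SFU might pick a simulcast layer for one receiver from the available bandwidth and the rendering size. The `Layer` type, the `select_layer` function and the bitrates are hypothetical, and this naive policy deliberately ignores the per-layer encoding quality differences that the post is about.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    width: int
    height: int
    bitrate_kbps: int  # hypothetical bitrate for the layer


def select_layer(layers, available_kbps, render_height):
    """Naive policy: smallest layer that covers the render size and fits the bandwidth."""
    affordable = [l for l in layers if l.bitrate_kbps <= available_kbps]
    if not affordable:
        # Nothing fits: fall back to the cheapest layer.
        return min(layers, key=lambda l: l.bitrate_kbps)
    sufficient = [l for l in affordable if l.height >= render_height]
    if sufficient:
        return min(sufficient, key=lambda l: l.height)
    # No affordable layer covers the render size: send the largest affordable one.
    return max(affordable, key=lambda l: l.height)


# Hypothetical three-layer simulcast configuration.
layers = [
    Layer(320, 180, 150),
    Layer(640, 360, 500),
    Layer(1280, 720, 1500),
]

choice = select_layer(layers, available_kbps=2000, render_height=360)
print(choice.width, choice.height)  # 640 360
```

With this policy a 640x360 window always gets the 640x360 layer when bandwidth allows, which is exactly the choice the question in the post puts in doubt.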