I have had the opportunity to work on many excellent projects, and every project comes with its own unique challenges. One of those projects was a video calling application. It was a great learning experience, a lot of fun, and a great achievement.

I had a few challenges with the project. The first was cost: the product I worked on had a very slim margin, so I had to make sure whatever I made was cost effective. It also had to work with minimal internet connectivity, and the conversations had to be recorded. Today's story is about the recording part.

If you want to record WebRTC calls, there are many options. But it generally comes down to two types of solutions:

  1. Record the stream on the server side
  2. Record on the client side

Server side recording is widely used, and it's easy to set up if you have a media server. It's often very reliable, and the recording can be formatted in any desired way. However, it's expensive. Apart from the storage space needed, it requires an additional media server, the calls are no longer peer to peer, and all the calls use up server bandwidth. All of this also adds a small latency to the "real time" video streams, which is noticeable on a bad connection.

Client side recording is a gray area. You can record using a third party application or use the new MediaStream Recording API (MediaRecorder). But it's often complex to set up, depends a lot on the client's capabilities, and can crash the browser if not done properly.

I chose client side recording after many trials and calculations. I knew for a fact that the application I was building would be a one-to-one video calling system, and that one of the parties would be sitting in a call center with good internet connectivity and hardware that I could control. I had to design my solution around those facts.

I had to take care of two streams from one side. The party sitting in the call center would record both streams and upload them to the server, because I had no control over the hardware or internet connection of the other party.

The very first attempt was a horrible failure. I used two media stream recorders to record the local and remote streams, and the whole recording would be uploaded once the call was over. Five minutes into a call, the browser crashed. I had totally forgotten that the recordings were being held in memory, and they quickly ate up all the allocated memory with recorded video data.

I modified the code to record the videos in 10-second chunks. That mostly worked out great, but soon I realized that two 10-second chunks uploading at once every 10 seconds put pressure on the client's bandwidth and decreased the video quality of the live call. I had to reduce the uploads to one at a time.

So I cooked up a small queue mechanism. Recorded chunks are gathered in a queue and then uploaded one by one. This solved the bandwidth pressure problem, but eventually the queue started to build up and fill the memory.
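To make the idea concrete, here is a minimal sketch of a one-at-a-time upload queue in TypeScript. It is not the code from the project: `UploadQueue`, `enqueue`, `idle`, `size`, and the `upload` callback are hypothetical names, and the callback stands in for whatever actually sends a chunk to the server.

```typescript
// Hypothetical sketch: uploads queued items strictly one at a time.
type Uploader<T> = (item: T) => Promise<void>;

class UploadQueue<T> {
  private pending: T[] = [];
  private draining = false;
  private waiters: Array<() => void> = [];

  constructor(private readonly upload: Uploader<T>) {}

  /** Number of chunks still waiting in memory (the backlog). */
  get size(): number {
    return this.pending.length;
  }

  /** Add a finished chunk; starts uploading if nothing is in flight. */
  enqueue(item: T): void {
    this.pending.push(item);
    if (!this.draining) {
      void this.drain();
    }
  }

  /** Resolves once the queue is empty and the last upload has settled. */
  idle(): Promise<void> {
    if (!this.draining) {
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  private async drain(): Promise<void> {
    this.draining = true;
    while (this.pending.length > 0) {
      const next = this.pending.shift() as T;
      try {
        // Only one upload is awaited at a time, so chunks never compete
        // with each other for bandwidth.
        await this.upload(next);
      } catch (err) {
        // Assumption for this sketch: a failed chunk is logged and dropped.
        // A real recorder would more likely retry it.
        console.error("upload failed, chunk dropped", err);
      }
    }
    this.draining = false;
    for (const wake of this.waiters.splice(0)) {
      wake();
    }
  }
}
```

Watching `size` over time is one simple way to see the backlog problem: if chunks are produced faster than they can be uploaded, the number keeps growing, and so does memory use.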

So I had to find a clever mechanism to randomize the chunk length so that the chunks of the two streams overlapped as little as possible. Combined with the queue mechanism, this solved my upload problem. But a recording is only good if it can be played back, so I had to figure out how to play it back.
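The exact randomization is not something I will reproduce here, but the general idea can be sketched in a few lines of TypeScript. Everything in this snippet is an assumption made for illustration: the helper name `nextChunkMs`, the uniform distribution, and the 8 to 12 second range.

```typescript
// Hypothetical helper: picks how long the next recording chunk should run.
// `random` is injectable so the behaviour can be checked deterministically.
function nextChunkMs(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  if (!(minMs > 0) || !(maxMs >= minMs)) {
    throw new RangeError("expected 0 < minMs <= maxMs");
  }
  // Uniform pick between minMs and maxMs, rounded to whole milliseconds.
  return Math.round(minMs + random() * (maxMs - minMs));
}

// Each recorder draws its own lengths, so the two streams rarely finish
// a chunk at the same instant (the 8000 and 12000 bounds are assumed values).
const localChunkMs = nextChunkMs(8000, 12000);
const remoteChunkMs = nextChunkMs(8000, 12000);
console.log({ localChunkMs, remoteChunkMs });
```

With lengths drawn independently like this, the local and remote chunks tend to land in the queue at different moments instead of both arriving every 10 seconds.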

The clips are saved in Amazon S3 using one-time signed URLs. I thought I could use Amazon Elastic Transcoder to stitch the clips together and make a full recording. I created an AWS Lambda function to invoke Elastic Transcoder on the right files and output to another S3 bucket. Once a call finished, the system would invoke the Lambda function and the transcoding would begin.

I was almost ready to celebrate the success when I noticed that many of the transcoding jobs were failing because the clips of the same call had different video resolutions and bitrates, especially on the remote stream. This makes sense, because WebRTC adjusts these parameters in real time based on internet connectivity. The transcoder was not able to stitch those video clips and failed.

So... I created a player. From scratch. The videos are accessible only through one-time signed URLs. I had to make a player that plays the local and remote streams side by side, keeps them in sync, and downloads and prepares the next chunk in the background for an uninterrupted playback experience. With a bit of hard work and a lot of math, the player was functional.
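A small part of that math can be sketched as follows. This is an illustrative TypeScript sketch, not the player's actual code. It assumes that the duration of every chunk is known (for example, stored when the chunk was uploaded) and that both streams started recording at the same moment. The names `ChunkPosition` and `locateChunk` are hypothetical.

```typescript
// Where a point on the call's timeline falls inside one stream's chunks.
interface ChunkPosition {
  index: number;    // which chunk to play
  offsetMs: number; // how far into that chunk to seek
}

// Hypothetical helper: maps a time on the shared call timeline to a chunk
// and an offset. Returns null if the time is outside the recording.
function locateChunk(
  durationsMs: readonly number[],
  timeMs: number,
): ChunkPosition | null {
  if (timeMs < 0) {
    return null;
  }
  let startMs = 0;
  for (let index = 0; index < durationsMs.length; index++) {
    const endMs = startMs + durationsMs[index];
    if (timeMs < endMs) {
      return { index, offsetMs: timeMs - startMs };
    }
    startMs = endMs;
  }
  return null; // past the end of the last chunk
}

// Example with made-up chunk lengths for the two streams.
const localDurations = [10000, 8000, 12000];
const remoteDurations = [9000, 11000, 10000];
const seekMs = 15000;
console.log(locateChunk(localDurations, seekMs));  // { index: 1, offsetMs: 5000 }
console.log(locateChunk(remoteDurations, seekMs)); // { index: 1, offsetMs: 6000 }
```

Because the two streams are cut at different points, the same moment in the call usually lands at different offsets in each stream, and the chunk at `index + 1` is the natural candidate to download in the background.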

And that way I probably made the most cost effective WebRTC recording solution I could imagine.