
Video Chat and Rooms Setup

Arvind Kasiliya edited this page Dec 17, 2023 · 19 revisions

Introduction

In this section, we go over how we set up the video chat functionality as well as how we added rooms.

Basic Video Call with WebSockets

Because it is free and fairly easy to set up for a basic call, we initially decided to go with plain WebRTC, using WebSockets and a Node.js server for signaling. This meant that we had to set up a separate signaling server to establish a connection between the host and the recipient of the video call. Both parties would connect to this server when they joined the call, and the server would relay the messages each side needed to start receiving the other party's camera feed.

Writing the code for this was fairly straightforward. We first created a Node.js server using Express and Socket.IO. Express is a JavaScript framework that makes it easy to create an API. Socket.IO, on the other hand, is a library that simplifies real-time communication over WebSockets by providing an API that is far more elegant and easier to use. We installed these dependencies by running npm install express socket.io in the server folder.

In the server's main JavaScript file, we started with some initial ceremony to set up Express and Socket.IO. We then added a handler for signaling as shown below:

io.on('connection', (socket) => {
  socket.on('offer', (offer, targetSocketId) => {
    socket.to(targetSocketId).emit('offer', offer, socket.id);
  });
  
  socket.on('answer', (answer, targetSocketId) => {
    socket.to(targetSocketId).emit('answer', answer);
  });
  
  socket.on('ice-candidate', (candidate, targetSocketId) => {
    socket.to(targetSocketId).emit('ice-candidate', candidate);
  });
});

This code listens for an event named "connection", which fires when a new client connects to the server. When a client connects, the callback is invoked with a socket object representing the connection to that client. The socket then listens for three events: offer, answer, and ice-candidate.

In the context of WebRTC, an "offer" is a message that initiates the connection process. It typically contains Session Description Protocol (SDP) data describing the sender's connection constraints, media formats, and so on. When the server receives an offer from a client, it forwards the offer to another client, identified by targetSocketId, along with the socket ID of the sender.

An "answer" is the response to an offer and also contains SDP data. It is sent by the client that received the offer in order to accept the connection. The server forwards the answer to the original sender of the offer, using the target socket ID.

Finally, ice-candidate messages are part of the ICE (Interactive Connectivity Establishment) framework that WebRTC uses to find the best path for streaming media between peers. Each ICE candidate represents a potential network route for the peer-to-peer connection. When a client finds an ICE candidate, it sends it through the server to the other client (the potential peer), which tests connectivity along that route.
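To make the order of messages concrete, here is a minimal in-memory sketch of the relay, with no network and no Socket.IO involved. The names createRelay and connect, the peer IDs, and the placeholder payloads are all assumptions made for illustration. Unlike the server listing, which passes the sender's ID along only with offers, this sketch passes it with every message so that the log is easier to read.

```javascript
// In-memory stand-in for the signaling server: no network, no Socket.IO.
// createRelay, connect, and the peer IDs are hypothetical names for illustration.
function createRelay() {
  const clients = new Map(); // socket ID -> handler called with (event, payload, fromId)

  return {
    // Register a client under an ID and return its emit function.
    connect(id, onMessage) {
      clients.set(id, onMessage);
      // Plays the role of socket.to(targetSocketId).emit(event, payload, socket.id).
      return function emit(event, payload, targetId) {
        const target = clients.get(targetId);
        if (target) target(event, payload, id);
      };
    },
  };
}

const relay = createRelay();
const log = [];

// The callee answers any offer it receives and then shares one ICE candidate.
const calleeEmit = relay.connect('callee', (event, payload, fromId) => {
  log.push(`callee got ${event} from ${fromId}`);
  if (event === 'offer') {
    calleeEmit('answer', { type: 'answer', sdp: '(placeholder)' }, fromId);
    calleeEmit('ice-candidate', { candidate: '(placeholder)' }, fromId);
  }
});

// The caller only records what it receives.
const callerEmit = relay.connect('caller', (event, payload, fromId) => {
  log.push(`caller got ${event} from ${fromId}`);
});

// The caller starts the exchange by sending an offer to the callee.
callerEmit('offer', { type: 'offer', sdp: '(placeholder)' }, 'callee');

console.log(log.join('\n'));
```

The log shows the offer reaching the callee first, followed by the answer and the ICE candidate travelling back to the caller, which is the same round trip the real server relays between two sockets.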

Writing this was the most complicated part of setting up a simple video chat between two clients. On the client, we emitted and handled each of these events (offer, answer, and ice-candidate) using Socket.IO's client library as shown below:

<!-- VideoChat.vue -->
<template>
  <div>
    <!-- ref binds each element to the ref of the same name returned from setup() -->
    <video id="localVideo" ref="localVideo" autoplay playsinline muted></video>
    <video id="remoteVideo" ref="remoteVideo" autoplay playsinline></video>
    <button @click="startCall">Start Call</button>
  </div>
</template>

<script lang="ts">
import { defineComponent, ref, onMounted } from 'vue';
import io from 'socket.io-client';

export default defineComponent({
  name: 'VideoChat',
  setup() {
    const socket = io('http://localhost:3000');
    const localConnection = new RTCPeerConnection();
    const localVideo = ref<HTMLVideoElement | null>(null);
    const remoteVideo = ref<HTMLVideoElement | null>(null);

    // Note: the server handlers expect a targetSocketId as the second argument
    // of 'offer' and 'ice-candidate'. This simplified listing omits it.

    onMounted(async () => {
      // Show the local camera preview and send its tracks over the peer connection.
      const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
      if (localVideo.value) {
        localVideo.value.srcObject = stream;
      }
      stream.getTracks().forEach(track => localConnection.addTrack(track, stream));
    });

    // Forward each locally discovered ICE candidate to the other party.
    localConnection.onicecandidate = event => {
      if (event.candidate) {
        socket.emit('ice-candidate', event.candidate);
      }
    };

    // Display the remote stream once its tracks arrive.
    localConnection.ontrack = event => {
      if (remoteVideo.value) {
        remoteVideo.value.srcObject = event.streams[0];
      }
    };

    // Recipient: accept the incoming offer and reply to its sender with an answer.
    socket.on('offer', async (offer, fromSocketId) => {
      const remoteOffer = new RTCSessionDescription(offer);
      await localConnection.setRemoteDescription(remoteOffer);
      const answer = await localConnection.createAnswer();
      await localConnection.setLocalDescription(answer);
      socket.emit('answer', answer, fromSocketId);
    });

    // Caller: apply the recipient's answer.
    socket.on('answer', async answer => {
      const remoteAnswer = new RTCSessionDescription(answer);
      await localConnection.setRemoteDescription(remoteAnswer);
    });

    // Both sides: add candidates received from the other party.
    socket.on('ice-candidate', candidate => {
      localConnection.addIceCandidate(new RTCIceCandidate(candidate));
    });

    // Caller: create an offer and send it through the signaling server.
    const startCall = async () => {
      const offer = await localConnection.createOffer();
      await localConnection.setLocalDescription(offer);
      socket.emit('offer', offer);
    };

    return {
      localVideo,
      remoteVideo,
      startCall
    };
  }
});
</script>

In this Vue component, the template holds the local and remote video elements, and the setup block handles the connection to the server. The recipient responds to an "offer" by sending back an "answer", the caller applies that answer, and both sides add the candidates that arrive on "ice-candidate". Once the connection is established, the ontrack handler attaches the remote stream to the second video element, creating a basic video chat.
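One detail the listing does not handle is timing: an ICE candidate can arrive before the remote description has been set, in which case addIceCandidate generally fails. A common workaround is to buffer early candidates and apply them afterwards. The sketch below is one possible way to do that and is not part of our original code. createCandidateBuffer is a hypothetical helper, and the fake connection stands in for an RTCPeerConnection so that the example runs outside a browser.

```javascript
// Buffers ICE candidates until the remote description has been applied.
// createCandidateBuffer is a hypothetical helper; `pc` is anything with an
// addIceCandidate(candidate) method, such as an RTCPeerConnection.
function createCandidateBuffer(pc) {
  const pending = [];
  let ready = false;

  return {
    // Call from the 'ice-candidate' handler.
    async add(candidate) {
      if (ready) {
        await pc.addIceCandidate(candidate);
      } else {
        pending.push(candidate);
      }
    },
    // Call right after setRemoteDescription() resolves.
    async flush() {
      ready = true;
      while (pending.length > 0) {
        await pc.addIceCandidate(pending.shift());
      }
    },
  };
}

// Usage with a fake connection that records the candidates it was given.
const applied = [];
const fakeConnection = {
  async addIceCandidate(candidate) {
    applied.push(candidate);
  },
};

const buffer = createCandidateBuffer(fakeConnection);

async function demo() {
  await buffer.add('candidate-1'); // arrives early, so it is only buffered
  const appliedBeforeFlush = applied.length;
  await buffer.flush();            // the remote description is now set
  await buffer.add('candidate-2'); // applied immediately
  return { appliedBeforeFlush, applied };
}
```

In the component, add would be called from the "ice-candidate" handler and flush after each successful setRemoteDescription call.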

Adding Rooms with Agora

The WebSocket server was fine when we only wanted to support a maximum of two parties (i.e., one call) at a given time. The next natural step was to add support for rooms, so that multiple doctors and patients could use the service at once and doctors could send a code to the patient ahead of time. After looking at how much code we would need to add to the server for this functionality, we decided to scrap our custom server and use Agora instead. After creating an account and getting an App ID and token, we were able to start using the Agora SDK. Its event-based API was quite similar to socket.io's, so there was not much we had to learn, as shown below:

client.on("user-published", onPublished);
client.on("user-unpublished", onUnPublished);

async function onPublished(
  user: IAgoraRTCRemoteUser,
  mediaType: "video" | "audio"
) {
  await client.subscribe(user, mediaType);

  if (mediaType === "video") {
    const remoteVideoTrack = user.videoTrack;

    if (remoteVideoTrack) {
      remoteVideoTrack.play("remote-video");
      isVideoSubed.value = true;
    }
  }

  if (mediaType === "audio") {
    const remoteAudioTrack = user.audioTrack;

    if (remoteAudioTrack) {
      remoteAudioTrack.play();
      isAudioSubed.value = true;
    }
  }
}

async function onUnPublished(
  user: IAgoraRTCRemoteUser,
  mediaType: "video" | "audio"
) {
  await client.unsubscribe(user, mediaType);

  if (mediaType === "video") {
    isVideoSubed.value = false;
  }
  if (mediaType === "audio") {
    isAudioSubed.value = false;
  }
}

We have two event handlers in this code: one for when a remote user publishes an audio or video track, to which we then subscribe, and one for when that user stops publishing it, at which point we unsubscribe. When a user decides to host a meeting, a room code is generated (a 6-character string of letters and numbers) and they join that "channel" on Agora. When another party joins that room and publishes their tracks, the onPublished handler is called, and when either of them leaves, the onUnPublished function is called.
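As an illustration of the room code step, here is a minimal sketch of how a 6-character code could be generated. The helper name generateRoomCode, the uppercase-only alphabet, and the use of Math.random are assumptions, not necessarily what our implementation does. Math.random is not cryptographically secure, so a stronger random source would be a better fit if codes must be hard to guess.

```javascript
// Characters allowed in a room code (assumed here: uppercase letters and digits).
const ROOM_CODE_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// generateRoomCode is a hypothetical helper. `random` must return a number in
// [0, 1); it defaults to Math.random, which is not cryptographically secure.
function generateRoomCode(length = 6, random = Math.random) {
  let code = '';
  for (let i = 0; i < length; i++) {
    // Pick one character of the alphabet uniformly at random.
    const index = Math.floor(random() * ROOM_CODE_ALPHABET.length);
    code += ROOM_CODE_ALPHABET[index];
  }
  return code;
}

// Prints a different 6-character code on each run.
console.log(generateRoomCode());
```

The resulting string would then be used as the Agora channel name that both the host and the other party join.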

Conclusion

To conclude, we started our journey into video chat by creating a WebSocket signaling server, but ended up pivoting to Agora because it was significantly easier to use.

References

https://socket.io/docs/v4/

https://api-ref.agora.io/en/voice-sdk/web/4.x/index.html

Arvind Kasiliya