Live streaming scenarios are more and more common these days, and if you are a fan of any game streamer, there is one way of broadcasting you must be familiar with: the screen sharing we are going to talk about today.
Screen sharing in a live broadcast scenario not only has to share the picture currently shown on the display with the remote end, it also has to transmit the sound, including both the sound of the application and the voice of the streamer. Given these two requirements, the media streams needed for a screen-sharing broadcast break down as follows:
- A video stream of the screen
- An audio stream of the application's sound
- An audio stream of the streamer's voice
ReplayKit is an Apple-provided framework for screen recording on iOS systems.
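In a typical screen-sharing setup, the recording runs inside a Broadcast Upload Extension, where ReplayKit delivers its data to a subclass of RPBroadcastSampleHandler. The skeleton below is only a sketch (the class name and the comments about what to do in each method are illustrative, not code from the original project):
import ReplayKit

// Minimal skeleton of a Broadcast Upload Extension handler (sketch only).
class SampleHandler: RPBroadcastSampleHandler {

    override func broadcastStarted(withSetupInfo setupInfo: [String: NSObject]?) {
        // Typically: initialize the RTC engine and join the channel here.
    }

    override func broadcastFinished() {
        // Typically: leave the channel and release resources here.
    }
}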
First, let’s take a look at the data callback interface provided by Apple for screen recording:
override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType: RPSampleBufferType) {
    switch sampleBufferType {
    case .video:
        // screen video frame
        handleVideoFrame(sampleBuffer)
    case .audioApp:
        // audio played by the app
        handleAudioApp(sampleBuffer)
    case .audioMic:
        // audio recorded from the microphone
        handleAudioMic(sampleBuffer)
    @unknown default:
        break
    }
}
// handleVideoFrame/handleAudioApp/handleAudioMic are placeholders for your own handlers.
From the sampleBufferType enumeration, it is easy to see that ReplayKit delivers exactly the three media streams we listed above.
Video format
guard let videoFrame = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
let type = CVPixelBufferGetPixelFormatType(videoFrame)
// type == kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
Through CVPixelBufferGetPixelFormatType, we can see that the video format of each frame is YUV420.
Frame rate
By counting how many times the callback fires per second (for example with a simple print), we can see that about 30 video frames are delivered per second, i.e. the frame rate is 30 fps.
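As a quick way to verify this yourself, you can count the video callbacks per second. The helper below is only a debugging sketch (FrameRateProbe is a made-up name); call tick() once in the .video branch of processSampleBuffer:
import QuartzCore

// Debug-only sketch: counts how many video sample buffers arrive per second.
final class FrameRateProbe {
    private var frameCount = 0
    private var windowStart = CACurrentMediaTime()

    // Call once per .video callback.
    func tick() {
        frameCount += 1
        let now = CACurrentMediaTime()
        if now - windowStart >= 1.0 {
            print("video fps: \(frameCount)")
            frameCount = 0
            windowStart = now
        }
    }
}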
Both the format and the frame rate fall within the range that Agora RTC can accept, so we can share the video to the remote end through pushExternalVideoFrame.
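Here is a sketch of what that push might look like, using the external video source API of the Agora iOS SDK. The names follow the 3.x SDK and may differ in other versions, and it assumes setExternalVideoSource(true, useTexture: true, pushMode: true) was called before joining the channel:
import AgoraRtcKit   // called AgoraRtcEngineKit in older SDK versions
import ReplayKit

// Wrap the ReplayKit pixel buffer in an AgoraVideoFrame and push it (sketch).
func pushVideo(_ sampleBuffer: CMSampleBuffer, with agoraKit: AgoraRtcEngineKit) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    let frame = AgoraVideoFrame()
    frame.format = 12                          // 12 = CVPixelBuffer (texture) input
    frame.textureBuf = pixelBuffer             // the YUV420 frame from ReplayKit
    frame.time = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    agoraKit.pushExternalVideoFrame(frame)
}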
A bit of background knowledge
The frames shown by a display come from a frame buffer, which is usually double- or triple-buffered. After the screen has displayed one frame, a vertical sync signal (V-Sync) is sent, telling the frame buffer to switch to the buffer holding the next frame, and the display then reads the new frame data and shows it.
This frame buffer is system-level and cannot be read or written by ordinary developers. However, ReplayKit, as Apple's own recording framework, can directly read the frames that have already been rendered and are about to be displayed, and it does so without disturbing the rendering pipeline or causing dropped frames, so no extra rendering pass is needed to produce the data for the ReplayKit callback.
Audio
ReplayKit can provide two kinds of audio: the audio recorded from the microphone and the audio played by the app currently being recorded. (The former is called AudioMic and the latter AudioApp.)
You can obtain the audio format through the following two lines of code:
CMAudioFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
const AudioStreamBasicDescription *description = CMAudioFormatDescriptionGetStreamBasicDescription(format);
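The same information can be read in Swift, for example with a small helper like this (a sketch; logAudioFormat is a made-up name, and the fields come from AudioStreamBasicDescription):
import CoreMedia

// Swift equivalent of the two lines above: log the format of an audio buffer.
func logAudioFormat(of sampleBuffer: CMSampleBuffer) {
    guard let format = CMSampleBufferGetFormatDescription(sampleBuffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(format)?.pointee else { return }
    print("sampleRate: \(asbd.mSampleRate), channels: \(asbd.mChannelsPerFrame)")
}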
AudioApp
AudioApp has a different channel count on different models. For example, on iPads and on models below iPhone 7 there is no stereo playback, so the AudioApp data on those devices is mono; on devices that do support stereo playback, it is two-channel.
On the models we tried, the sample rate was 44,100 Hz, but other sample rates on untested models cannot be ruled out.
AudioMic
On the tested models, AudioMic has a sample rate of 32,000 Hz and a single (mono) channel.
Audio preprocessing
If we sent AudioApp and AudioMic to the remote end as two separate audio streams, the traffic would obviously be higher than for a single stream. To save the bandwidth of one audio stream, we need to mix (merge) the two streams into one.
However, as shown above, the two audio streams have different formats, and we cannot guarantee that other formats will not appear on other models. During testing we also found that the length of the audio data delivered in each callback varies across OS versions. So before mixing the two streams, we need to unify their formats to cope with whatever ReplayKit gives us. We therefore take the following key steps:
if (channels == 1) {
    int16_t* intData = (int16_t*)dataPointer;
    int16_t newBuffer[totalSamples * 2];
    // Duplicate every sample into both channels to turn mono into stereo.
    for (int i = 0; i < totalSamples; i++) {
        newBuffer[2 * i] = intData[i];
        newBuffer[2 * i + 1] = intData[i];
    }
    totalSamples *= 2;
    memcpy(dataPointer, newBuffer, sizeof(int16_t) * totalSamples);
    totalBytes *= 2;
    channels = 2;
}
Whether it is AudioMic or AudioApp, as long as the incoming stream is mono, we convert it into two channels;
if (sampleRate != resampleRate) {
    // Resample from the incoming sample rate to the target rate, processing in 10 ms chunks.
    int inDataSamplesPer10ms = sampleRate / 100;
    int outDataSamplesPer10ms = (int)resampleRate / 100;
    int16_t* intData = (int16_t*)dataPointer;
    switch (type) {
        case AudioTypeApp:
            totalSamples = resampleApp(intData, dataPointerSize, totalSamples,
                                       inDataSamplesPer10ms, outDataSamplesPer10ms,
                                       channels, sampleRate, (int)resampleRate);
            break;
        case AudioTypeMic:
            totalSamples = resampleMic(intData, dataPointerSize, totalSamples,
                                       inDataSamplesPer10ms, outDataSamplesPer10ms,
                                       channels, sampleRate, (int)resampleRate);
            break;
    }
    totalBytes = totalSamples * sizeof(int16_t);
}
Whether it is AudioMic or AudioApp, as long as the incoming stream's sample rate is not 48,000 Hz, we resample it to 48,000 Hz;
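The resampleApp and resampleMic helpers are C functions from the project whose internals are not shown here. Purely for intuition, the following is a minimal linear-interpolation resampler sketched in Swift (resampleLinear is a made-up name, and the real helpers may use a different algorithm):
// A minimal linear-interpolation resampler for interleaved 16-bit PCM (sketch only).
func resampleLinear(_ input: [Int16], channels: Int, from inRate: Int, to outRate: Int) -> [Int16] {
    let inFrames = input.count / channels
    guard inFrames > 0 else { return [] }
    let outFrames = inFrames * outRate / inRate
    var output = [Int16](repeating: 0, count: outFrames * channels)
    for frame in 0..<outFrames {
        // Position of this output frame inside the input, in fractional frames.
        let pos = Double(frame) * Double(inRate) / Double(outRate)
        let i0 = min(Int(pos), inFrames - 1)
        let i1 = min(i0 + 1, inFrames - 1)
        let t = pos - Double(i0)
        for ch in 0..<channels {
            let a = Double(input[i0 * channels + ch])
            let b = Double(input[i1 * channels + ch])
            output[frame * channels + ch] = Int16((a + (b - a) * t).rounded())
        }
    }
    return output
}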
// Cache the processed AudioApp data in its own buffer.
memcpy(appAudio + appAudioIndex, dataPointer, totalBytes);
appAudioIndex += totalSamples;
// Cache the processed AudioMic data in its own buffer.
memcpy(micAudio + micAudioIndex, dataPointer, totalBytes);
micAudioIndex += totalSamples;
The first and second steps ensure that both audio streams end up in the same format. However, because ReplayKit delivers only one type of data per callback, we have to use two buffers to cache the two streams before mixing;
// Mix over the length of the shorter buffer, averaging the two streams.
int64_t mixIndex = appAudioIndex > micAudioIndex ? micAudioIndex : appAudioIndex;
int16_t pushBuffer[appAudioIndex];
// Start from the app audio so any unmixed tail still carries the app's sound.
memcpy(pushBuffer, appAudio, appAudioIndex * sizeof(int16_t));
for (int i = 0; i < mixIndex; i++) {
    pushBuffer[i] = (appAudio[i] + micAudio[i]) / 2;
}
ReplayKit lets the user choose whether to enable microphone recording, so when the microphone is turned off we only have the AudioApp stream. We therefore treat AudioApp as the main stream: when its data arrives we read how much data is cached for AudioMic, compare the lengths of the two buffers, and take the smaller one as the mixing length. Over that length we merge the data from the two buffers to obtain the mixed data, and write it into a new mixing buffer (or directly into the AudioApp buffer);
[AgoraAudioProcessing pushAudioFrame:(unsigned char *)pushBuffer withFrameSize:appAudioIndex * sizeof(int16_t)];
Finally, we copy the mixed data into the C++ audio recording callback of Agora RTC. At this point, both the sound recorded by the microphone and the sound played by the application can be transmitted to the remote end.
By processing the audio and video streams in this way and combining them with the Agora RTC SDK, we have completed the implementation of a screen-sharing live broadcast scenario.
That's all for this detailed look at how to use ReplayKit and RTC on iOS. For more on using ReplayKit and RTC on iOS, please see my other related articles!