
Example of implementing facial recognition in C# with ffmpeg and Hongruan (ArcSoft)

About Face Recognition

Facial recognition is fairly mature these days, with a variety of commercial and open-source solutions available. OpenCV, for one, has supported facial recognition for a long time. When choosing a recognition library, I compared three options side by side: Baidu's online recognition service, the open-source OpenCV, and Hongruan's commercial library (free at small and medium scale).

Baidu's facial recognition had only been online for a short while and the documentation was incomplete. I contacted Baidu, and they did send me an Android-based example, but it did not meet my needs: first, the photos have to be uploaded to Baidu's servers (the biggest problem), and second, face localization has to be implemented yourself (you capture the face, then upload it for recognition).

I had used OpenCV long ago. For face plus license plate recognition it was the first thing I considered, but its recognition rate was not very high at the time, and I later switched to a recognition library developed by a teacher at the University of Electronic Science and Technology (fairly easy to use, with a good recognition rate). So this time around I did not choose OpenCV.

Hongruan I discovered by accident. While looking for a development library and testing a Python solution, I came across news that Hongruan's recognition library had been fully opened up, could be used for free, and performed recognition offline. I downloaded it, tried it, and found the recognition rate to be quite good, so I tentatively settled on Hongruan's solution. Below I will mainly share the pitfalls and lessons from the development process, along with the open-source C# Wrapper for the recognition library.

The SDK's C# Wrapper

Since Hongruan's library is written in C++ and my application is in C#, the library has to be wrapped so that it can be called from C# quickly and conveniently, without worrying about memory, pointers and the like, and with some degree of fault tolerance. The Wrapper library is open source and can be downloaded from GitHub. There is not much to say about the Wrapper itself: it is essentially a packaging of PInvoke, but it takes care of the details, hides the raw call mechanics, and exposes relatively high-level functions. If you are interested, have a look at the source code.
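For a feel of what such a PInvoke layer looks like, here is a minimal sketch; the DLL name, function names and signatures below are hypothetical stand-ins rather than the SDK's actual exports:

using System;
using System.Runtime.InteropServices;

internal static class NativeMethods
{
  //Hypothetical imports: the real SDK exposes C-style engine
  //initialization/release functions of a broadly similar shape
  [DllImport("face_detection.dll", CallingConvention = CallingConvention.Cdecl)]
  internal static extern int InitialFaceEngine(string appId, string sdkKey, out IntPtr engine);

  [DllImport("face_detection.dll", CallingConvention = CallingConvention.Cdecl)]
  internal static extern int UninitialFaceEngine(IntPtr engine);
}

The Wrapper's job is to hide declarations like these behind disposable, object-oriented classes.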

Examples of the use of the Wrapper library

Basic use

Face detection (static images):

//Identifiers below follow the Wrapper library's API as published on GitHub
using (var detection = LocatorFactory.GetDetectionLocator("appId", "sdkKey"))
{
  var image = Image.FromFile("");
  var bitmap = new Bitmap(image);

  var result = detection.Detect(bitmap, out var locateResult);
  //After the located face information has been used, release it to avoid memory leaks
  using (locateResult)
  {
    if (result == ErrorCode.Ok && locateResult.FaceCount > 0)
    {
      using (var g = Graphics.FromImage(bitmap))
      {
        var face = locateResult.Faces[0].ToRectangle();
        g.DrawRectangle(new Pen(Color.Red), face.X, face.Y, face.Width, face.Height);
      }

      bitmap.Save("", ImageFormat.Jpeg);
    }
  }
}

Face tracking (tracking is generally used for recognizing consecutive video frames and is more efficient than detection; a static picture is used as the example here, and actual use is no different from detection):

//Identical to detection, except that a tracking locator is created
using (var detection = LocatorFactory.GetTrackingLocator("appId", "sdkKey"))
{
  var image = Image.FromFile("");
  var bitmap = new Bitmap(image);

  var result = detection.Detect(bitmap, out var locateResult);
  using (locateResult)
  {
    if (result == ErrorCode.Ok && locateResult.FaceCount > 0)
    {
      using (var g = Graphics.FromImage(bitmap))
      {
        var face = locateResult.Faces[0].ToRectangle();
        g.DrawRectangle(new Pen(Color.Red), face.X, face.Y, face.Width, face.Height);
      }

      bitmap.Save("", ImageFormat.Jpeg);
    }
  }
}

Face comparison:

using (var processor = new FaceProcessor("appid",
        "locatorKey", "recognizeKey", true))
{
  var image1 = Image.FromFile("");
  var image2 = Image.FromFile("");

  var result1 = processor.LocateExtract(new Bitmap(image1));
  var result2 = processor.LocateExtract(new Bitmap(image2));

  //FaceProcessor is a packaging class integrating detection and recognition;
  //to use recognition on its own, use the FaceRecognize class.
  //For this demonstration, assume each picture contains exactly one face.
  //FeatureData is the face feature data; it can be persisted for later face
  //matching and converts to a byte array automatically.
  if (result1 != null && result2 != null)
    Console.WriteLine(processor.Match(result1[0].FeatureData, result2[0].FeatureData, true));
}

Notes on usage

Both LocateResult and Feature hold native memory that must be released; forgetting to do so causes memory leaks. The Match functions of FaceProcessor and FaceRecognize can release the features automatically once the comparison completes if you pass true for the last two parameters. For face matching (1:N), use the default parameters instead: the feature passed as the first parameter is then not released automatically, so it can be compared against the feature library in a loop.
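To make the 1:N case concrete, here is a minimal sketch under the API described above; ExtractProbeFeature, the library collection, the score threshold and the exact Match signature are assumptions for illustration:

//1:N matching sketch: with Match's default parameters the probe feature
//is not auto-released, so it can be reused across the whole loop
Feature probe = ExtractProbeFeature();  //assumed helper returning a Feature
try
{
  foreach (var candidate in library)    //library: stored Feature instances
  {
    var score = recognize.Match(probe, candidate);
    if (score > 0.8f)                   //threshold is application-specific
      Console.WriteLine($"Matched, score: {score}");
  }
}
finally
{
  probe.Dispose();  //release the native feature memory once, at the end
}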

Complete example of integration

On GitHub there is a complete example, FaceDemo. It mainly captures RTSP video (from a Hikvision camera) via ffmpeg and then performs face matching. I hit quite a few pitfalls during its development.

The primary task in facial recognition is capturing the camera's video frames, and this is where I spent the longest time. I initially used an OpenCV wrapper library: capturing from a USB camera worked without problems, but when capturing RTSP video streams an AccessViolationException would occur at irregular intervals, anywhere from tens of minutes to several hours. Someone had raised the same problem in an issue on the project's official GitHub page, and the suggested answer was to strip out the business logic and capture only the video stream; I tried that and the problem remained. So I gave up on that route and, after repeated experiments, finally settled on ffmpeg.

ffmpeg is driven mainly through ProcessStartInfo. I use a wrapper package for invoking ffmpeg (installable via a NuGet search). Although ffmpeg solved the stability problem, I still ran into plenty of pitfalls during development, the biggest being the lack of documentation and examples (strictly speaking there are examples, but they cost $75). So I spent a long time working out how to capture the video stream and convert it into a Bitmap object myself; once that step works, handing the result to the Wrapper is all that remains.
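To show what such a wrapper does under the hood, here is a hand-rolled sketch using only ProcessStartInfo and stdout redirection; the ffmpeg path, URL and frame-assembly details are placeholders:

using System.Diagnostics;

var psi = new ProcessStartInfo
{
  FileName = "ffmpeg.exe",  //path to the ffmpeg binary
  Arguments = "-i rtsp://<camera> -an -r 15 -pix_fmt bgr24 -f rawvideo -",
  UseShellExecute = false,
  RedirectStandardOutput = true  //raw frames arrive on stdout
};

using (var process = Process.Start(psi))
{
  var stdout = process.StandardOutput.BaseStream;
  var buffer = new byte[32768];  //reads arrive in chunks, never whole frames
  int read;
  while ((read = stdout.Read(buffer, 0, buffer.Length)) > 0)
  {
    //append to a frame buffer until a complete frame has accumulated
  }
}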

Detailed explanation of FaceDemo

As mentioned above, capturing the video stream through ffmpeg and converting it to a Bitmap is the key part, so that is what is covered here.

First, the calling parameters of ffmpeg:

var setting =
new ConvertSettings
{
  CustomOutputArgs = "-an -r 15 -pix_fmt bgr24 -updatefirst 1"
}; //-s 1920x1080 -q:v 2 -b:v 64k

//"ffmpeg" is the converter instance from the NuGet wrapper package;
//method and event names follow that package's API
task = ffmpeg.ConvertLiveMedia("rtsp://admin:[email protected]:554/h264/ch1/main/av_stream", null,
outputStream, Format.raw_video, setting);
task.OutputDataReceived += DataReceived;
task.Start();

-an means do not capture the audio stream; -r is the frame rate, to be adjusted to your requirements and hardware; -pix_fmt matters more: specifying bgr24 is generally safe (check your particular device). I used rgb24 at first and the captured images came out looking like Avatar characters, with the colors inverted. The last parameter was such a pitfall that I almost gave up on this approach. Normally, when ffmpeg is invoked it expects an output file name template and generates files from the captured output according to that template; to send the data to the console instead, you pass - as the output. With no -updatefirst specified, ffmpeg threw an exception after capturing the first frame. After a long dig through ffmpeg's documentation (the full option list is enormous; dumped to text it comes to 1319KB), I found this parameter, which tells ffmpeg to continuously update the first file. Finally, when starting the live capture you must specify the output format, and it must be Format.raw_video. The name is a bit misleading; it should really be called raw_image, since what is produced is the raw bitmap data of each frame.
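Put together, the command the wrapper ends up running is roughly equivalent to the following (URL elided; on newer ffmpeg builds the option is spelled -update):

ffmpeg -i "rtsp://<camera>" -an -r 15 -pix_fmt bgr24 -updatefirst 1 -f rawvideo -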

Even so, capturing the video stream data was still not fully solved, because there was one more pitfall: the console buffer used by ProcessStartInfo is only 32768 bytes, so each chunk of output received is not a complete bitmap.

//Full code is in the GitHub source
//Code snippet 1
private Bitmap _image;
private IntPtr _pImage;
private readonly object _imageLock = new object();

//the buffer is allocated in the form's constructor
//(the member signature was lost here, so the name below is assumed)
public FaceDemoForm()
{
  _pImage = Marshal.AllocHGlobal(1920 * 1080 * 3);
  _image = new Bitmap(1920, 1080, 1920 * 3, PixelFormat.Format24bppRgb, _pImage);
}

//Code snippet 2
private MemoryStream outputStream;

private void DataReceived(object sender, EventArgs e)
{
  //1920 * 3 * 1080 = 6220800 bytes: one complete 24-bit 1080p frame
  if (outputStream.Position == 6220800)
    lock (_imageLock)
    {
      var data = outputStream.ToArray();

      //copy the raw frame into the unmanaged memory the Bitmap was constructed over
      Marshal.Copy(data, 0, _pImage, data.Length);

      outputStream.Seek(0, SeekOrigin.Begin);
    }
}

I spent a lot of time getting to the code above (it may only be a few lines, but it nearly broke me). The image data I capture is 24-bit at 1080p, so one frame of raw bitmap data is stride * height, i.e. width * 3 * height = 6220800 bytes. Once the accumulated data reaches that size, the Bitmap conversion is performed and the MemoryStream position is moved back to the start. Note that because the captured data is raw (there is no BMP header), the Bitmap must be constructed from a pointer to the raw data; updating the image then only requires updating the memory that pointer refers to, with no need to create a new Bitmap instance.

Once the bitmap data is available, it can be fed to recognition. I happily added the recognition logic, but reality is always full of surprises: another pit. Without the recognition logic, the captured image displayed in the PictureBox perfectly, clear and smooth. With it, the picture tore and ghosted, the display lagged by at least 10 to 20 seconds, and the program stuttered; in short, every problem at once. Initially my recognition logic lived inside the DataReceived method, which runs on a thread other than the main thread, so with capture, recognition and display all on one thread there should in principle have been no issue. My guess (unverified; I did not dig into it deeply, and if anyone knows the actual reason, please leave me a message) is that it comes down to ffmpeg: ffmpeg runs as a separate process and captures data continuously, while the recognition module takes longer per frame than the capture does. Buffered data is therefore not consumed in time, image data beyond the 32768-byte buffer gets discarded, and all the problems follow from that. Fixing it was another time-consuming journey of exploration.

private void Render()
{
  while (_renderRunning)
  {
    if (_image == null)
      continue;

    Bitmap image;

    lock (_imageLock)
    {
      image = (Bitmap) _image.Clone();
    }

    if (_shouldShot)
    {
      WriteFeature(image);
      _shouldShot = false;
    }

    Verify(image);

    //"pictureBox" is the WinForms control displaying the video (name assumed)
    if (pictureBox.InvokeRequired)
      pictureBox.Invoke(new Action(() => { pictureBox.Image = image; }));
    else
      pictureBox.Image = image;
  }
}

As the code above shows, I moved recognition and display to a separate thread, cloning a new Bitmap instance from the captured image each time it runs. The drawback is possible frame drops, since, as mentioned, recognition takes longer than one frame interval (when a new face is detected, adding a match costs roughly 130ms). But this affects neither the recognition quality nor the requirements, and the dropped frames are negligible: in the end it runs stably, with no perceptible frame loss.

I ran the demo program for about four days, with no exceptions and no recognition errors in that time.

Written at the end

Although Hongruan officially states that the free recognition library is suited to face libraries of under 1,000 faces, with a certain amount of work (not a small amount, admittedly) it can handle large-scale face search as well. For example, matching can be done with multiple threads: if the library holds more than 1,000 faces, each thread can process its own partition, with the face feature data cached in memory (one face's feature data is 22KB, so memory demands are high) to improve recognition and search efficiency. For particularly large libraries, distributed processing can be used: load the face features into a Redis database, have multiple processes each read and match with multiple threads, have each thread report its own results, and let the main process merge and judge them. The main challenges are dividing the work consistently across threads and tolerating single-point failures.
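As a rough sketch of the multi-threaded idea (a PLINQ version; cache, PersonId, Match and threshold are hypothetical names for illustration, and a single SDK engine handle may not be thread-safe, so per-thread instances are safer):

//requires System.Linq; cache is an in-memory list of pre-loaded features.
//At ~22KB per face, a 100,000-face cache needs roughly 2.2GB of memory.
var best = cache
  .AsParallel()
  .Select(c => (Id: c.PersonId, Score: Match(probe, c.Feature)))
  .OrderByDescending(r => r.Score)
  .First();

if (best.Score > threshold)
  Console.WriteLine($"Hit: {best.Id} ({best.Score})");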

That is all for this article. I hope it is helpful to your study, and I hope you will continue to support us.