Updated on 2025-03-03

Implementing high-accuracy speech recognition in Qt

1. Choose a speech recognition engine

Among open-source speech recognition projects, the following two support both Chinese and English and can be used with Qt:

Vosk: an open-source speech recognition toolkit that supports Chinese, English, and many other languages. It recognizes speech offline and does not need an Internet connection.

PaddleSpeech: an open-source speech toolkit from Baidu. It is highly accurate but requires more configuration.

This example uses Vosk because it runs on multiple platforms, is easy to integrate into C++ projects, and meets the requirements of offline use, open-source licensing, and accuracy above 90% (actual accuracy depends on the model and the recording conditions).

2. Download resources

First, download the Vosk library and the Chinese and English model files: /gh_mirrors/vo/vosk-api/overview

Vosk library: Vosk GitHub repository

Chinese and English models: Vosk model download

Download the library and models that match your platform, and make sure that CMake and a Qt development environment are already set up.

3. Sample code

The following is a complete Qt example (using the Qt 5 QAudioInput API) that shows how to use the Vosk API for Chinese and English recognition in C++. It assumes you have already downloaded and extracted a model.

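#include <QCoreApplication>
#include <QAudioDeviceInfo>
#include <QAudioFormat>
#include <QAudioInput>
#include <QBuffer>
#include <QTimer>
#include <vosk_api.h>
#include <iostream>

// Records from the default microphone with Qt 5 Multimedia, then passes the
// captured 16 kHz mono 16-bit PCM to Vosk and prints the result.
class SpeechRecognizer : public QObject {
    Q_OBJECT
public:
    explicit SpeechRecognizer(const QString &modelPath, QObject *parent = nullptr)
        : QObject(parent) {
        // vosk_model_new returns NULL if the model cannot be loaded.
        model = vosk_model_new(modelPath.toStdString().c_str());
        if (model)
            recognizer = vosk_recognizer_new(model, 16000.0); // must match the capture rate
    }

    ~SpeechRecognizer() override {
        if (recognizer) vosk_recognizer_free(recognizer);
        if (model) vosk_model_free(model);
    }

    bool isReady() const { return recognizer != nullptr; }

    void startRecognition(int durationMs = 5000) {
        QAudioFormat format;
        format.setSampleRate(16000);
        format.setChannelCount(1);
        format.setSampleSize(16);
        format.setCodec("audio/pcm");
        format.setByteOrder(QAudioFormat::LittleEndian);
        format.setSampleType(QAudioFormat::SignedInt);

        if (!QAudioDeviceInfo::defaultInputDevice().isFormatSupported(format))
            std::cerr << "Warning: input device does not support 16 kHz mono 16-bit PCM" << std::endl;

        audioInput = new QAudioInput(format, this);
        // Connect before start() so no state change is missed.
        connect(audioInput, &QAudioInput::stateChanged, this, &SpeechRecognizer::onStateChanged);

        audioBuffer.open(QIODevice::WriteOnly | QIODevice::Truncate);
        audioInput->start(&audioBuffer);

        // Stop after a fixed duration (5 s by default, an arbitrary choice);
        // stop() moves the input to QAudio::StoppedState.
        QTimer::singleShot(durationMs, audioInput, &QAudioInput::stop);
    }

private slots:
    void onStateChanged(QAudio::State state) {
        if (state == QAudio::StoppedState) {
            audioBuffer.close();
            processAudio();
            QCoreApplication::quit();
        }
    }

    void processAudio() {
        const QByteArray &audioData = audioBuffer.data();
        // Feed the whole recording, then request the final result (a JSON string).
        vosk_recognizer_accept_waveform(recognizer, audioData.constData(), audioData.size());
        std::cout << vosk_recognizer_final_result(recognizer) << std::endl;
    }

private:
    VoskModel *model = nullptr;
    VoskRecognizer *recognizer = nullptr;
    QAudioInput *audioInput = nullptr;
    QBuffer audioBuffer;
};

int main(int argc, char *argv[]) {
    QCoreApplication app(argc, argv);

    QString modelPath = "/path/to/vosk-model"; // Replace this path with the actual model path
    SpeechRecognizer recognizer(modelPath);
    if (!recognizer.isReady()) {
        std::cerr << "Failed to load Vosk model from " << modelPath.toStdString() << std::endl;
        return 1;
    }
    recognizer.startRecognition();

    return app.exec();
}

// Required for Q_OBJECT when the class is defined in main.cpp (file name assumed).
#include "main.moc"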

4. Compile and run

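Add vosk_api.h and the Vosk library to your project, and configure the Vosk include and library paths in your build files (for example, the qmake .pro file or CMakeLists.txt). The example also needs the Qt Multimedia module (QT += multimedia with qmake). It targets Qt 5; in Qt 6, recording moved from QAudioInput to QAudioSource. After building, run the program: it records from the default microphone, passes the audio to Vosk, and prints the recognition result for Chinese or English speech, depending on the model loaded.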

5. Tips

Make sure audio is captured at 16 kHz so that it matches the sample rate passed to vosk_recognizer_new and the rate the model expects.

At run time, make sure the model path is correct and that the required Qt and Vosk libraries are installed.

References

Vosk official documentation and API: /vosk
