SoFunction
Updated on 2025-04-04

Implementing text-to-speech in C/C++ with Windows SAPI

This article presents a modern C++ interface for text-to-speech that wraps Windows SAPI (Speech Application Programming Interface). It supports synchronous and asynchronous speech synthesis, an adjustable speaking rate (-10 to 10), volume control (0 to 100%), saving synthesized speech as WAV files, and automatic escaping of special characters; the design also aims for thread safety. The interface requires Windows with .NET Framework support, PowerShell 5.1 or later, and C++11 or later. The complete code is provided at the end of the article.

Quick start

Basic usage examples

#include ""  // path of the header listed at the end of this article
int main() {
    TTS::TextToSpeech tts;
    // Set voice parameters
    tts.set_rate(5);    // speak faster
    tts.set_volume(80); // 80% volume
    // Speak synchronously
    tts.speak_sync("Hello, welcome to the text-to-speech system.");
    // Speak asynchronously
    auto future = tts.speak_async("This is an async operation.");
    future.wait(); // wait for completion
    // Save to file
    std::string filename = tts.save_to_wav("Audio saved to file.");
    return 0;
}

Detailed explanation of core functions

Voice parameter settings

Rate control (set_rate())

void set_rate(int rate);  // range: -10 to 10
  • Positive values speed up speech
  • Negative values slow down speech
  • Values outside the valid range are clamped automatically
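
The clamping behaviour described above can be sketched as a small standalone function. `clamp_to_range` is a hypothetical name used only for illustration; the class itself uses a private `clamp` template.

```cpp
// Hypothetical helper: limit value to the closed range [lo, hi],
// which is what set_rate() and set_volume() are described as doing.
int clamp_to_range(int value, int lo, int hi) {
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}
```

With the documented rate range, `clamp_to_range(15, -10, 10)` yields 10 and `clamp_to_range(-42, -10, 10)` yields -10.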

Volume control (set_volume())

void set_volume(int volume);  // range: 0 to 100
  • 0 means mute
  • 100 means maximum volume
  • Any whole percentage in between is accepted

Synchronous speech (speak_sync())

bool speak_sync(const std::string& text);
  • Blocks the current thread until speech has finished
  • Returns the execution status (true means success)
  • Suitable for scenarios that require sequential execution

Example:

if (!tts.speak_sync("Critical system alert!")) {
    // Error handling
}

Asynchronous speech (speak_async())

std::future<bool> speak_async(const std::string& text);
  • Returns a std::future object immediately
  • Supports several ways of waiting:
auto future = tts.speak_async("Processing completed");
// Method 1: block and wait
future.wait();
// Method 2: poll and check
while (future.wait_for(std::chrono::milliseconds(100)) != std::future_status::ready) {
    // Perform other tasks
}
// Get the result
bool success = future.get();
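
The polling pattern can be tried without a speech engine. The sketch below assumes a hypothetical stand-in, `fake_speak_async`, that only sleeps and then reports success; `poll_until_ready` is likewise an illustrative name, not part of the article's class.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for speak_async(): a task that takes a while and then succeeds.
std::future<bool> fake_speak_async(std::chrono::milliseconds duration) {
    return std::async(std::launch::async, [duration] {
        std::this_thread::sleep_for(duration);
        return true;
    });
}

// Poll the future at a fixed interval; returns how many polls found it still pending.
int poll_until_ready(std::future<bool>& fut, std::chrono::milliseconds interval) {
    int pending_polls = 0;
    while (fut.wait_for(interval) != std::future_status::ready) {
        ++pending_polls;  // other work could be done here
    }
    return pending_polls;
}
```

After `poll_until_ready` returns, `fut.get()` no longer blocks and delivers the task's result. The durations use `std::chrono::milliseconds` rather than the `100ms` literal so that the code also builds as C++11.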

Save audio files (save_to_wav())

std::string save_to_wav(const std::string& text, 
                       const std::string& filename = "");
  • A temporary file is generated automatically when filename is empty
  • Returns the final file path (an empty string on failure)
  • File location rules:
    • filename specified: it is used as the full path
    • filename not specified: a random file name is generated in the system temporary directory

Example:

// Automatically generated temporary file
auto auto_file = tts.save_to_wav("Automatic filename");
// Custom path
std::string custom_path = R"(C:\audio\)";
auto custom_file = tts.save_to_wav("Custom path", custom_path);
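
A minimal sketch of the path rules listed above. The helpers `random_file_name` and `resolve_wav_path` are hypothetical (neither is part of the article's class), and the Windows-style separator and the `tts_` prefix are assumptions borrowed from the full listing.

```cpp
#include <random>
#include <string>

// Hypothetical helper: build "<prefix><8 random hex digits><extension>".
std::string random_file_name(const std::string& prefix, const std::string& extension) {
    static const char digits[] = "0123456789abcdef";
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> pick(0, 15);
    std::string name = prefix;
    for (int i = 0; i < 8; ++i) name += digits[pick(gen)];
    return name + extension;
}

// Hypothetical helper mirroring the documented rule: an explicit file name is used
// as given, otherwise a random name inside temp_dir is chosen.
std::string resolve_wav_path(const std::string& filename, const std::string& temp_dir) {
    if (!filename.empty()) return filename;
    return temp_dir + "\\" + random_file_name("tts_", ".wav");
}
```

An explicit name comes back unchanged, while an empty name produces a path of the form `<temp_dir>\tts_xxxxxxxx.wav`.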

Advanced Usage

Batch voice generation

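// The TextToSpeech object is passed in by reference and must outlive the returned
// futures, because each asynchronous task keeps a pointer to it.
std::vector<std::future<bool>> batch_process(TTS::TextToSpeech& tts) {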
    std::vector<std::future<bool>> results;
    for (int i = 0; i < 10; ++i) {
        std::string text = "Message " + std::to_string(i);
        results.push_back(tts.speak_async(text));
    }
    return results;
}
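
Once a batch has been started, the results still have to be collected. The following sketch assumes a hypothetical helper, `count_successes`, that waits for every future and counts the tasks that reported success.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: wait for every task and count how many returned true.
std::size_t count_successes(std::vector<std::future<bool>>& results) {
    std::size_t ok = 0;
    for (auto& f : results) {
        if (f.get()) ++ok;  // get() blocks until that task has finished
    }
    return ok;
}
```

Because `get()` may be called only once per future, the vector should not be reused after this call.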

Real-time progress tracking

void monitor_async(TTS::TextToSpeech& tts) {
    auto future = tts.speak_async("Long running operation");
    std::thread monitor([&future]{
        while (future.wait_for(std::chrono::seconds(1)) != std::future_status::ready) {
            std::cout << "Synthesizing..." << std::endl;
        }
        std::cout << "Completed with status: " << future.get() << std::endl;
    });
    monitor.join(); // join before 'future' goes out of scope
}

Notes and best practices

Character processing

  • XML special characters are escaped automatically: &, <, >, ", '
  • Multilingual text is supported (the matching system voice pack must be installed)
  • Preprocessing user input is recommended:
std::string sanitize_input(const std::string& raw) {
    // Remove control characters; in the default C locale the bytes of
    // multi-byte UTF-8 text are kept
    std::string filtered;
    std::copy_if(raw.begin(), raw.end(), std::back_inserter(filtered),
        [](char c){ return !std::iscntrl(static_cast<unsigned char>(c)); });
    return filtered;
}
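
As a worked example of the automatic escaping in the first bullet, here is a standalone sketch. `xml_escape` is an illustrative name; the class's own routine appears in the full listing.

```cpp
#include <string>

// Replace the five XML special characters with their entities.
std::string xml_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

For instance, the input `a < b & "c"` becomes `a &lt; b &amp; &quot;c&quot;`. The ampersand case does not need to come first, because each input character is examined exactly once.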

Performance optimization

  • Reuse TextToSpeech instances (avoid repeated initialization)
  • Manage object lifetime carefully during asynchronous operations:
// Wrong (the temporary object is destroyed while the task may still be running):
auto future = TTS::TextToSpeech().speak_async("text");
// Correct (the object stays alive for as long as tts is held):
auto tts = std::make_shared<TTS::TextToSpeech>();
auto future = tts->speak_async("text");
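
One way to make the lifetime rule hard to break is to let the task itself own the object. This sketch assumes a hypothetical wrapper, `speak_async_owned`, and a stand-in type `FakeSpeaker` so that it runs without a speech engine; neither exists in the article's class.

```cpp
#include <future>
#include <memory>
#include <string>

// Hypothetical stand-in for TTS::TextToSpeech.
struct FakeSpeaker {
    bool speak_sync(const std::string& text) { return !text.empty(); }
};

// The lambda captures the shared_ptr by value, so the task keeps the speaker
// alive even if the caller drops its own reference before the task finishes.
template <typename Speaker>
std::future<bool> speak_async_owned(std::shared_ptr<Speaker> speaker, std::string text) {
    return std::async(std::launch::async, [speaker, text] {
        return speaker->speak_sync(text);
    });
}
```

The same wrapper should work with any type that offers a `speak_sync(const std::string&)` member returning `bool`.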

Error handling

Check the return value:

if (!tts.speak_sync("text")) {
    std::cerr << "Speech synthesis failed" << std::endl;
}

Common causes of errors:

  • Insufficient PowerShell access permissions
  • Invalid file path
  • System voice engine failure

FAQ

Q: What audio formats are supported?
A: Only WAV is supported at present; the format is determined by the system API.

Q: How are Chinese characters handled?
A: Make sure that:

  • A Chinese voice pack is installed on the system
  • Source files are encoded in UTF-8
  • The console supports Unicode (running chcp 65001 is recommended)

Q: Why is a batch file generated?
A: It works around:

  • Encoding problems when PowerShell is invoked directly
  • Command-line length limits
  • The need to capture the exit code

Q: What is the maximum supported text length?
A: It depends on system limits; text longer than 1 MB should be processed in segments.
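
Segmenting can be sketched as follows. `split_text` is a hypothetical helper, not part of the article's class: it prefers to break at a space and, where no space is available (as in Chinese text), backs up so that a multi-byte UTF-8 character is not cut in half.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split text into pieces of at most max_bytes bytes.
std::vector<std::string> split_text(const std::string& text, std::size_t max_bytes) {
    std::vector<std::string> parts;
    if (max_bytes == 0) {           // nothing sensible to do: return the text unsplit
        parts.push_back(text);
        return parts;
    }
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t len = std::min(max_bytes, text.size() - pos);
        if (pos + len < text.size()) {
            // Not the last piece: prefer the last space inside the window.
            std::size_t space = text.rfind(' ', pos + len - 1);
            if (space != std::string::npos && space > pos) {
                len = space - pos + 1;  // the space stays with the left piece
            } else {
                // Hard split: step back while the cut would land on a UTF-8 continuation byte.
                while (len > 1 &&
                       (static_cast<unsigned char>(text[pos + len]) & 0xC0) == 0x80) {
                    --len;
                }
            }
        }
        parts.push_back(text.substr(pos, len));
        pos += len;
    }
    return parts;
}
```

Concatenating the pieces reproduces the original text, so each piece can be passed to `speak_sync()` in turn. A `max_bytes` smaller than one encoded character can still split that character.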

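Q: How can speech be interrupted?
A: The current version does not implement interruption. The class declares a cancel_flag_ member, but nothing reads it, and destroying the object does not stop a running speak_async() task: the task still holds a pointer to the object, so the object has to outlive the returned future.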
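
One possible approach, shown here only as a sketch and not as the article's implementation, is to speak long text segment by segment and check a cancellation flag between segments. `speak_until_cancelled` is a hypothetical function, and `speak` stands in for a call such as `tts.speak_sync(segment)`.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: speak segments in order and stop early once cancel is set.
// Returns the number of segments that were spoken successfully.
std::size_t speak_until_cancelled(const std::vector<std::string>& segments,
                                  const std::atomic<bool>& cancel,
                                  const std::function<bool(const std::string&)>& speak) {
    std::size_t spoken = 0;
    for (const auto& segment : segments) {
        if (cancel.load()) break;    // the flag is checked only between segments
        if (!speak(segment)) break;  // stop on the first failure as well
        ++spoken;
    }
    return spoken;
}
```

A segment that has already started still plays to the end, so shorter segments give a faster reaction to cancellation.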

Source code

#pragma once
#include <string>
#include <sstream>
#include <cstdlib>
#include <random>
#include <atomic>
#include <thread>
#include <memory>
#include <system_error>
#include <future>
#include <fstream>
#include <cstdio>
#include <tuple>
#ifdef _WIN32
#include <io.h>      // _access
#else
#include <unistd.h>  // access
#endif
namespace TTS {
class TextToSpeech {
public:
    static constexpr int MIN_RATE = -10;
    static constexpr int MAX_RATE = 10;
    static constexpr int MIN_VOLUME = 0;
    static constexpr int MAX_VOLUME = 100;
    explicit TextToSpeech() = default;
    // Set the speech rate (-10 to 10)
    void set_rate(int rate) {
        rate_ = clamp(rate, MIN_RATE, MAX_RATE);
    }
    // Set the volume (0 to 100)
    void set_volume(int volume) {
        volume_ = clamp(volume, MIN_VOLUME, MAX_VOLUME);
    }
    // Speak synchronously (blocks until finished)
    bool speak_sync(const std::string& text) {
        return execute_command(generate_ps_command(text));
    }
    // Speak asynchronously (returns immediately); the object must outlive the future
    std::future<bool> speak_async(const std::string& text) {
        return std::async(std::launch::async, [this, text] { return this->speak_sync(text); });
    }
    // Save speech to a WAV file and return its path (empty string on failure)
    std::string save_to_wav(const std::string& text, const std::string& filename = "") {
        std::string full_path = filename;
        bool clean_up = false;
        if (full_path.empty()) {
            // No file name given: use a unique .wav path in the temporary directory
            full_path = std::get<0>(generate_temp_path("tts_", ".wav"));
            clean_up = true;
        }
        std::string command = generate_ps_command(text, full_path);
        if (!execute_command(command)) {
            if (clean_up) std::remove(full_path.c_str());
            return "";
        }
        return full_path;
    }
private:
    int rate_ = 0;      // default speech rate
    int volume_ = 100;  // default volume
    std::atomic<bool> cancel_flag_{false};
    // Build the PowerShell command; it uses the .NET System.Speech synthesizer
    // (type and member names reconstructed from context)
    std::string generate_ps_command(const std::string& text, const std::string& output_file = "") const {
        std::ostringstream oss;
        oss << "powershell -Command \"";
        oss << "Add-Type -AssemblyName System.Speech; ";
        oss << "$speech = New-Object System.Speech.Synthesis.SpeechSynthesizer; ";
        oss << "$speech.Rate = " << rate_ << "; ";
        oss << "$speech.Volume = " << volume_ << "; ";
        if (!output_file.empty()) {
            oss << "$speech.SetOutputToWaveFile('" << escape_ps_string(output_file) << "'); ";
        } else {
            oss << "$speech.SetOutputToDefaultAudioDevice(); ";
        }
        oss << "$speech.Speak([System.Xml.XmlConvert]::VerifyXmlChars('"
            << escape_ps_string(escape_xml(text)) << "'));\"";
        return oss.str();
    }
    // Escape a string for a single-quoted PowerShell literal (' becomes '')
    std::string escape_ps_string(const std::string& text) const {
        std::string result;
        result.reserve(text.size() * 2);
        for (char c : text) {
            result += (c == '\'') ? "''" : std::string(1, c);
        }
        return result;
    }
    // Execute the command and report whether it succeeded
    bool execute_command(const std::string& command) const {
        // Create and write the batch file
        std::string bat_path;
        bool dummy;
        std::tie(bat_path, dummy) = generate_temp_path("tts_", ".bat");
        std::ofstream bat_file(bat_path);
        if (!bat_file) return false;
        bat_file << "@echo off\n"
                 << "chcp 65001 > nul\n"
                 << command << "\n"
                 << "exit /b %ERRORLEVEL%";
        bat_file.close();
        // Run the batch file
        std::string cmd = "cmd /c \"" + bat_path + "\"";
        int result = std::system(cmd.c_str());
        // Remove the temporary file
        std::remove(bat_path.c_str());
        return (result == 0);
    }
    // Generate a temporary file path; the bool says whether the caller should delete the file
    std::tuple<std::string, bool> generate_temp_path(const std::string& prefix = "tts_", const std::string& extension = "") const {
        static std::random_device rd;
        static std::mt19937 gen(rd());
        std::uniform_int_distribution<> dis(0, 15);
        std::string full_path;
        bool need_cleanup = false;
        if (prefix.empty()) {
            char tmp_name[L_tmpnam];
            if (std::tmpnam(tmp_name)) {
                full_path = tmp_name;
                need_cleanup = true;
            }
        } else {
            const std::string temp_dir = get_temp_directory();
            do {
                std::string unique_part;
                for (int i = 0; i < 8; ++i) {
                    unique_part += "0123456789abcdef"[dis(gen) % 16];
                }
                full_path = temp_dir + "\\" + prefix + unique_part + extension;
            } while (file_exists(full_path));
        }
        // make_tuple also compiles where std::tuple's constructor is explicit (C++11/14)
        return std::make_tuple(full_path, need_cleanup);
    }
    // Escape XML special characters
    static std::string escape_xml(std::string data) {
        std::string buffer;
        buffer.reserve(data.size());
        for (char c : data) {
            switch (c) {
                case '&':  buffer += "&amp;";  break;
                case '\"': buffer += "&quot;"; break;
                case '\'': buffer += "&apos;"; break;
                case '<':  buffer += "&lt;";   break;
                case '>':  buffer += "&gt;";   break;
                default:   buffer += c;        break;
            }
        }
        return buffer;
    }
    // Range-limiting helper
    template <typename T>
    static T clamp(T value, T min, T max) {
        return (value < min) ? min : (value > max) ? max : value;
    }
    // Get the temporary directory
    static std::string get_temp_directory() {
        const char* tmp = std::getenv("TEMP");
        if (!tmp) tmp = std::getenv("TMP");
        return tmp ? tmp : ".";
    }
    // Check whether a file exists
    static bool file_exists(const std::string& path) {
#ifdef _WIN32
        return ::_access(path.c_str(), 0) == 0;
#else
        return ::access(path.c_str(), F_OK) == 0;
#endif
    }
};
} // namespace TTS
