This article presents a modern C++ interface for text-to-speech on Windows, built by wrapping Windows SAPI (Speech Application Programming Interface), which it invokes through PowerShell. Main features: synchronous and asynchronous speech synthesis, adjustable speech rate (-10 to 10) and volume (0 to 100%), saving synthesized speech to WAV files, and automatic escaping of special characters. The interface is also designed with thread safety in mind. It requires Windows (with .NET Framework), PowerShell 5.1 or later, and C++11 or later. The complete code is provided at the end of the article.
Quick start
Basic usage examples
#include ""  // header containing the source code at the end of the article (file name not given)

int main() {
    TTS::TextToSpeech tts;

    // Set voice parameters
    tts.set_rate(5);     // Speak faster
    tts.set_volume(80);  // 80% volume

    // Speak synchronously
    tts.speak_sync("Hello, welcome to the text-to-speech system.");

    // Speak asynchronously
    auto future = tts.speak_async("This is an async operation.");
    future.wait();  // Wait for completion

    // Save to file
    std::string filename = tts.save_to_wav("Audio saved to file.");
    return 0;
}
Detailed explanation of core functions
Voice parameter settings
Speed control (set_rate())
void set_rate(int rate); // range: -10 to 10
- Positive values speed up speech
- Negative values slow down speech
- Out-of-range values are clamped automatically
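The clamping rule can be sketched in a few lines. The helper name `clamp_to_range` is only illustrative; the class at the end of the article carries its own private `clamp` template for the same job, since `std::clamp` is only available from C++17.

```cpp
// Illustrative helper (the name is an assumption, not part of the library):
// limit value to the closed range [lo, hi].
template <typename T>
T clamp_to_range(T value, T lo, T hi) {
    return (value < lo) ? lo : (value > hi) ? hi : value;
}
```

With this rule, a call such as set_rate(15) would behave like set_rate(10), and set_rate(-42) like set_rate(-10).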
Volume control (set_volume())
void set_volume(int volume); // range: 0 to 100
- 0 means mute
- 100 means maximum volume
- Values in between set the volume as a percentage; out-of-range values are clamped
Synchronous reading (speak_sync())
bool speak_sync(const std::string& text);
- Blocks the current thread until speech is complete
- Returns the execution status (true means success)
- Suitable for scenarios that require sequential execution
Example:
if (!tts.speak_sync("Critical system alert!")) {
    // Error handling
}
Asynchronous reading (speak_async())
std::future<bool> speak_async(const std::string& text);
- Returns a std::future object immediately
- Supports multiple ways of waiting:
auto future = tts.speak_async("Processing completed");

// Method 1: block until finished
future.wait();

// Method 2: poll (std::chrono::milliseconds works in C++11; the 100ms literal needs C++14)
while (future.wait_for(std::chrono::milliseconds(100)) != std::future_status::ready) {
    // Perform other tasks
}

// Get the result
bool success = future.get();
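Polling usually needs an upper bound so that a stuck synthesis does not block the caller forever. The following is a minimal sketch of that idea; `wait_until_ready` is a hypothetical helper, not part of the interface, and it works on any std::future.

```cpp
#include <chrono>
#include <future>

// Hypothetical helper: poll `f` every `step` until it is ready or `limit`
// has elapsed. Returns true if the future became ready in time.
template <typename T>
bool wait_until_ready(std::future<T>& f,
                      std::chrono::milliseconds limit,
                      std::chrono::milliseconds step = std::chrono::milliseconds(100)) {
    const auto deadline = std::chrono::steady_clock::now() + limit;
    while (f.wait_for(step) != std::future_status::ready) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // gave up; the task itself keeps running
        }
        // Other work (UI updates, logging) could run here between polls.
    }
    return true;
}
```

Giving up on the wait does not cancel the task, so the object that started it still has to stay alive until the future completes.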
Save audio files (save_to_wav())
std::string save_to_wav(const std::string& text, const std::string& filename = "");
- Generates a temporary file automatically when filename is empty
- Returns the final file path (an empty string on failure)
- File location rules:
  - filename specified: the given full path is used
  - filename omitted: a random file name is generated in the system temporary directory
Example:
// Automatically generate a temporary file
auto auto_file = tts.save_to_wav("Automatic filename");

// Custom path
std::string custom_path = R"(C:\audio\)";
auto custom_file = tts.save_to_wav("Custom path", custom_path);
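The file location rules listed above can be sketched as a small standalone function. `resolve_wav_path`, the `tts_` prefix, and the eight-digit hex suffix are assumptions chosen for illustration; the sketch does not check whether the generated name already exists.

```cpp
#include <random>
#include <string>

// Hypothetical helper mirroring the rules above: an explicit filename is used
// unchanged, otherwise a random 8-digit hex name is built under temp_dir.
std::string resolve_wav_path(const std::string& filename, const std::string& temp_dir) {
    if (!filename.empty()) {
        return filename;
    }
    static const char hex[] = "0123456789abcdef";
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> digit(0, 15);
    std::string name = "tts_";
    for (int i = 0; i < 8; ++i) {
        name += hex[digit(gen)];
    }
    return temp_dir + "\\" + name + ".wav";
}
```

A caller that receives a generated path is responsible for deleting the file once it is no longer needed.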
Advanced usage
Batch voice generation
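// The TextToSpeech object is passed in by reference because it must outlive
// the returned futures (each task uses the object while it runs).
std::vector<std::future<bool>> batch_process(TTS::TextToSpeech& tts) {
    std::vector<std::future<bool>> results;
    for (int i = 0; i < 10; ++i) {
        std::string text = "Message " + std::to_string(i);
        results.push_back(tts.speak_async(text));
    }
    return results;
}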
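Once a batch has been started, the results still have to be collected. A minimal sketch, assuming the caller only needs to know how many tasks succeeded (`count_successes` is a hypothetical helper):

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: block on every future in turn and count how many
// tasks reported success.
std::size_t count_successes(std::vector<std::future<bool>>& results) {
    std::size_t ok = 0;
    for (auto& f : results) {
        if (f.get()) {  // get() waits for that task to finish
            ++ok;
        }
    }
    return ok;
}
```

The object that started the tasks needs to stay alive until every future has been consumed this way.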
Real-time progress tracking
void monitor_async(TTS::TextToSpeech& tts) {
    auto future = tts.speak_async("Long running operation");
    std::thread monitor([&future] {
        // std::chrono::seconds(1) works in C++11; the 1s literal needs C++14
        while (future.wait_for(std::chrono::seconds(1)) != std::future_status::ready) {
            std::cout << "Synthesizing..." << std::endl;
        }
        std::cout << "Completed with status: " << future.get() << std::endl;
    });
    monitor.join();
}
Notes and best practices
Character processing
- XML special characters are escaped automatically: &, <, >, ", '
- Multilingual text is supported (requires the corresponding system voice pack)
- Preprocessing user input is recommended:
std::string sanitize_input(const std::string& raw) {
    // Remove control characters. In the default "C" locale std::iscntrl is
    // false for bytes >= 0x80, so UTF-8 multibyte text is kept.
    std::string filtered;
    std::copy_if(raw.begin(), raw.end(), std::back_inserter(filtered),
                 [](char c) { return !std::iscntrl(static_cast<unsigned char>(c)); });
    return filtered;
}
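To make the escaping concrete, here is a small sketch of the two layers involved: XML entities for the five special characters, and doubled single quotes so the text can sit inside a PowerShell single-quoted literal. The function names are illustrative and need not match the private helpers in the full source.

```cpp
#include <string>

// Replace the five XML special characters with their entities.
std::string escape_xml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Double every single quote so the text is safe inside a PowerShell '...' literal.
std::string escape_ps_single_quoted(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (char c : in) {
        if (c == '\'') {
            out += "''";
        } else {
            out += c;
        }
    }
    return out;
}
```

Applying the XML step first removes every quote character from the text, so the PowerShell step only matters for strings that skip XML escaping, such as file paths.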
Performance optimization
- Reuse TextToSpeech instances (avoid repeated initialization)
- Manage object lifetime carefully during asynchronous operations:
// Wrong (the temporary is destroyed while the task may still be running):
auto future = TTS::TextToSpeech().speak_async("text");

// Correct (keep the object alive until the future has completed):
auto tts = std::make_shared<TTS::TextToSpeech>();
auto future = tts->speak_async("text");
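One way to make the lifetime rule hard to break is to let the asynchronous task itself share ownership of the object. The sketch below assumes a free function `speak_async_owned` (hypothetical) and uses a stand-in `FakeSpeaker` type so that it compiles on its own; with the real class, `TTS::TextToSpeech` would take its place.

```cpp
#include <future>
#include <memory>
#include <string>

// Stand-in for TTS::TextToSpeech so the sketch is self-contained (hypothetical).
struct FakeSpeaker {
    bool speak_sync(const std::string& text) { return !text.empty(); }
};

// Hypothetical helper: the lambda copies the shared_ptr, so the speaker stays
// alive until the task has finished, even if the caller drops its own copy.
template <typename Speaker>
std::future<bool> speak_async_owned(std::shared_ptr<Speaker> speaker, std::string text) {
    return std::async(std::launch::async, [speaker, text] {
        return speaker->speak_sync(text);
    });
}
```

For example, `speak_async_owned(std::make_shared<FakeSpeaker>(), "hello")` is safe even though no named variable holds the speaker.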
Error handling
Check the return value:
if (!tts.speak_sync("text")) {
    std::cerr << "Speech synthesis failed" << std::endl;
}
Common causes of errors:
- Insufficient PowerShell access permissions
- Invalid file path
- System voice engine failure
FAQ
Q: What audio formats are supported?
A: Only WAV is currently supported; the format is determined by the system API.
Q: How are Chinese characters handled?
A: Make sure that:
- A Chinese voice pack is installed on the system
- Source files are saved as UTF-8
- The console supports Unicode (running chcp 65001 is recommended)
Q: Why does the implementation generate batch files?
A: To work around:
- Encoding problems when PowerShell is executed directly
- The command-line length limit
- The need to capture the exit code
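The batch-file approach can be sketched as a helper that wraps a single command line in a small script. `write_batch_file` is a hypothetical name; the script layout (UTF-8 code page, then the command, then the exit code passed through) follows what the full source does.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: wrap one command line in a batch script.
// Returns false if the file could not be written.
bool write_batch_file(const std::string& path, const std::string& command) {
    std::ofstream bat(path, std::ios::binary);
    if (!bat) {
        return false;
    }
    bat << "@echo off\r\n"
        << "chcp 65001 > nul\r\n"       // switch the console code page to UTF-8
        << command << "\r\n"
        << "exit /b %ERRORLEVEL%\r\n";  // hand the command's exit code to the caller
    bat.flush();
    return bat.good();
}
```

The caller would then run the script with cmd /c, compare the result with 0, and delete the file afterwards.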
Q: What is the maximum supported text length?
A: It is determined by system limits. Text longer than 1 MB should be processed in segments.
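Segmenting can be done in many ways. The following is a minimal sketch, assuming UTF-8 input and a byte limit per segment: it prefers to break at a space and otherwise backs up so that a multi-byte character is not cut in half. `split_into_segments` is a hypothetical helper, not part of the interface.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split text into pieces of at most max_bytes bytes,
// preferring to break at a space. Limits below 4 are raised to 4 so that a
// whole UTF-8 code point always fits into one piece.
std::vector<std::string> split_into_segments(const std::string& text, std::size_t max_bytes) {
    if (max_bytes < 4) {
        max_bytes = 4;
    }
    std::vector<std::string> segments;
    std::size_t pos = 0;
    while (pos < text.size()) {
        if (text.size() - pos <= max_bytes) {  // the rest fits in one piece
            segments.push_back(text.substr(pos));
            break;
        }
        std::size_t end = pos + max_bytes;     // one past the last byte allowed
        const std::size_t space = text.rfind(' ', end);
        if (space != std::string::npos && space > pos) {
            segments.push_back(text.substr(pos, space - pos));
            pos = space + 1;                   // skip the space itself
            continue;
        }
        // No usable space: back up while `end` points at a UTF-8 continuation byte.
        while (end > pos && (static_cast<unsigned char>(text[end]) & 0xC0) == 0x80) {
            --end;
        }
        if (end == pos) {
            end = pos + max_bytes;             // malformed input: split anyway
        }
        segments.push_back(text.substr(pos, end - pos));
        pos = end;
    }
    return segments;
}
```

Each segment can then be passed to speak_sync in order. Breaking at sentence punctuation instead of spaces would probably sound more natural, at the cost of a slightly longer function.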
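Q: How can speech be interrupted?
A: Interruption is not implemented in the current version (the cancel_flag_ member is declared but never used). Destroying the object does not stop speech that is already in progress, and the object must stay alive until any pending future completes.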
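Since interruption is not built in, one possible workaround is cooperative cancellation: split long text into segments and check a flag between calls. The sketch below assumes a hypothetical helper `speak_until_cancelled` that accepts any callable taking a std::string and returning bool (for example a lambda that forwards to speak_sync). It can only stop between segments; a segment that is already being spoken plays to the end.

```cpp
#include <atomic>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: speak the segments one by one and stop early once
// `cancel` is set or a segment fails. Returns the number of segments spoken.
template <typename SpeakFn>
std::size_t speak_until_cancelled(const std::vector<std::string>& segments,
                                  SpeakFn speak,
                                  const std::atomic<bool>& cancel) {
    std::size_t spoken = 0;
    for (const auto& segment : segments) {
        if (cancel.load()) {
            break;  // another thread asked us to stop
        }
        if (!speak(segment)) {
            break;  // synthesis failed; do not continue with the rest
        }
        ++spoken;
    }
    return spoken;
}
```

Another thread, such as a UI handler, would set the flag with cancel.store(true) to request the stop.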
Source code
#pragma once

#include <string>
#include <sstream>
#include <cstdlib>
#include <random>
#include <atomic>
#include <thread>
#include <memory>
#include <system_error>
#include <future>
#include <fstream>
#include <cstdio>
#include <tuple>

#ifdef _WIN32
#include <io.h>      // _access
#else
#include <unistd.h>  // access
#endif

namespace TTS {

class TextToSpeech {
public:
    static constexpr int MIN_RATE = -10;
    static constexpr int MAX_RATE = 10;
    static constexpr int MIN_VOLUME = 0;
    static constexpr int MAX_VOLUME = 100;

    explicit TextToSpeech() = default;

    // Set the speech rate (-10 to 10)
    void set_rate(int rate) {
        rate_ = clamp(rate, MIN_RATE, MAX_RATE);
    }

    // Set the volume (0 to 100)
    void set_volume(int volume) {
        volume_ = clamp(volume, MIN_VOLUME, MAX_VOLUME);
    }

    // Speak synchronously (blocks until completed)
    bool speak_sync(const std::string& text) {
        return execute_command(generate_ps_command(text));
    }

    // Speak asynchronously (returns immediately).
    // The lambda captures this, so the object must outlive the returned future.
    std::future<bool> speak_async(const std::string& text) {
        return std::async(std::launch::async, [this, text] {
            return this->speak_sync(text);
        });
    }

    // Save speech to a WAV file and return the file path ("" on failure)
    std::string save_to_wav(const std::string& text, const std::string& filename = "") {
        std::string full_path;
        bool clean_up = false;
        if (filename.empty()) {
            // No name given: random file name in the temporary directory
            full_path = std::get<0>(generate_temp_path("tts_", ".wav"));
            clean_up = true;  // remove our own temporary file if synthesis fails
        } else {
            // Name given: use the caller's path unchanged, as documented above
            full_path = filename;
        }

        std::string command = generate_ps_command(text, full_path);
        if (!execute_command(command)) {
            if (clean_up) std::remove(full_path.c_str());
            return "";
        }
        return full_path;
    }

private:
    int rate_ = 0;      // Default speech rate
    int volume_ = 100;  // Default volume
    std::atomic<bool> cancel_flag_{false};

    // Generate the PowerShell command.
    // Note: the System.Speech and System.Xml names below were missing from the
    // source text and have been restored from context.
    std::string generate_ps_command(const std::string& text,
                                    const std::string& output_file = "") const {
        std::ostringstream oss;
        oss << "powershell -Command \"";
        oss << "Add-Type -AssemblyName System.Speech; ";
        oss << "$speech = New-Object System.Speech.Synthesis.SpeechSynthesizer; ";
        oss << "$speech.Rate = " << rate_ << "; ";
        oss << "$speech.Volume = " << volume_ << "; ";
        if (!output_file.empty()) {
            oss << "$speech.SetOutputToWaveFile('" << escape_ps_string(output_file) << "'); ";
        } else {
            oss << "$speech.SetOutputToDefaultAudioDevice(); ";
        }
        oss << "$speech.Speak([System.Xml.XmlConvert]::VerifyXmlChars('"
            << escape_ps_string(escape_xml(text)) << "'));\"";
        return oss.str();
    }

    // Escape a string for use inside a PowerShell single-quoted literal
    std::string escape_ps_string(const std::string& text) const {
        std::string result;
        result.reserve(text.size() * 2);
        for (char c : text) {
            result += (c == '\'') ? "''" : std::string(1, c);
        }
        return result;
    }

    // Execute the command and return whether it succeeded
    bool execute_command(const std::string& command) const {
        // Create and write the batch file
        std::string bat_path;
        bool dummy;
        std::tie(bat_path, dummy) = generate_temp_path("tts_", ".bat");

        std::ofstream bat_file(bat_path);
        if (!bat_file) return false;
        bat_file << "@echo off\n"
                 << "chcp 65001 > nul\n"
                 << command << "\n"
                 << "exit /b %ERRORLEVEL%";
        bat_file.close();

        // Execute the batch file
        std::string cmd = "cmd /c \"" + bat_path + "\"";
        int result = std::system(cmd.c_str());

        // Clean up the temporary file
        std::remove(bat_path.c_str());
        return (result == 0);
    }

    // Generate a temporary file path
    std::tuple<std::string, bool> generate_temp_path(const std::string& prefix = "tts_",
                                                     const std::string& extension = "") const {
        static std::random_device rd;
        static std::mt19937 gen(rd());
        std::uniform_int_distribution<> dis(0, 15);

        std::string full_path;
        bool need_cleanup = false;
        if (prefix.empty()) {
            char tmp_name[L_tmpnam];
            if (std::tmpnam(tmp_name)) {
                full_path = tmp_name;
                need_cleanup = true;
            }
        } else {
            const std::string temp_dir = get_temp_directory();
            do {
                std::string unique_part;
                for (int i = 0; i < 8; ++i) {
                    unique_part += "0123456789abcdef"[dis(gen) % 16];
                }
                full_path = temp_dir + "\\" + prefix + unique_part + extension;
            } while (file_exists(full_path));
        }
        // make_tuple also compiles in C++11, where a braced return of a tuple may not
        return std::make_tuple(full_path, need_cleanup);
    }

    // XML escape
    static std::string escape_xml(std::string data) {
        std::string buffer;
        buffer.reserve(data.size());
        for (char c : data) {
            switch (c) {
                case '&':  buffer += "&amp;";  break;
                case '\"': buffer += "&quot;"; break;
                case '\'': buffer += "&apos;"; break;
                case '<':  buffer += "&lt;";   break;
                case '>':  buffer += "&gt;";   break;
                default:   buffer += c;        break;
            }
        }
        return buffer;
    }

    // Range limit function
    template <typename T>
    static T clamp(T value, T min, T max) {
        return (value < min) ? min : (value > max) ? max : value;
    }

    // Get the temporary directory
    static std::string get_temp_directory() {
        const char* tmp = std::getenv("TEMP");
        if (!tmp) tmp = std::getenv("TMP");
        return tmp ? tmp : ".";
    }

    // Check whether the file exists
    static bool file_exists(const std::string& path) {
#ifdef _WIN32
        return ::_access(path.c_str(), 0) == 0;
#else
        return ::access(path.c_str(), F_OK) == 0;
#endif
    }
};

} // namespace TTS