SoFunction
Updated on 2025-04-13

Use Python to implement text to speech (TTS) and play audio

Text-to-Speech (TTS) technology is a very practical tool when developing applications involving voice interactions or requiring voice prompts. This article will explain how to convert text to voice and play audio files using Python's gTTS and playsound libraries.

What are gTTS and playsound

gTTS is a Python library that interfaces with Google Translate's text-to-speech API. It converts text to speech and saves the result as an MP3 file. It is simple to use and well suited to quickly adding a text-to-speech function to a program.

playsound is a lightweight Python library for playing audio files. It supports common audio formats (such as MP3, WAV, etc.) and is cross-platform compatible.

Install the dependency library

Before you start, make sure that the gTTS and playsound libraries are installed. If they are not installed yet, you can use the following command:

pip install gTTS playsound
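
Note that the package is installed under the name gTTS but imported as gtts (lowercase). A quick way to confirm that both libraries are importable is sketched below. The helper name missing_packages is hypothetical and is not part of either library.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The import names are lowercase, even though the pip package is "gTTS".
    missing = missing_packages(["gtts", "playsound"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("gtts and playsound are both importable.")
```

If a name is reported as missing, rerun the pip command above in the same environment that runs the script.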

Implementation steps

Here are the complete steps to implement text-to-speech and play audio using gTTS and playsound:

1. Import the library

from gtts import gTTS
from playsound import playsound

gTTS is used to convert text to speech.

playsound is used to play the generated audio file.

2. Define text and language

text = "Hello, this is a text-to-speech conversion example."
lang = 'en'  # Language code, 'en' means English, 'zh-cn' means Chinese

text is the text content to be converted into speech.

lang is the language code. gTTS supports multiple languages (such as English, Chinese, and French).

3. Generate speech and save it as an MP3 file

tts = gTTS(text=text, lang=lang, slow=False)
tts.save("example.mp3")

gTTS initialization parameters:

  • text: The text to be converted.
  • lang: Language code.
  • slow: Whether to use slower speech speed (True is slow, False is normal).

The save method is not an initialization parameter: it is called on the resulting object and writes the generated speech to an MP3 file.

4. Play audio files

playsound("example.mp3")

playsound method: Play the audio file with the specified path.

Complete code example

Here is a complete code example:

from gtts import gTTS
from playsound import playsound

# 1. Define text and language
text = "Hello, this is a text-to-speech conversion example."
lang = 'en'

# 2. Generate speech and save it as an MP3 file
tts = gTTS(text=text, lang=lang, slow=False)
tts.save("example.mp3")

# 3. Play the audio
playsound("example.mp3")

After running the above code, the program will:

  • Convert the text to speech and save it as example.mp3.
  • Play the generated MP3 file.

Things to note

1. Network connection: gTTS requires access to Google's online services, so the device needs to be connected to the Internet. If the network is unstable, it may cause the conversion to fail.

2. File path: Make sure the file path provided is correct. If you run the code on different operating systems, pay attention to the differences in path separators (Windows uses \, while macOS and Linux use /).

3. Cross-platform compatibility: playsound may behave slightly differently on different operating systems. If you encounter problems, you can try other audio playback libraries such as pydub or pygame.

4. Error handling: In order to improve the robustness of the code, it is recommended to add exception handling to catch network errors or file operation errors. For example:

try:
    tts = gTTS(text=text, lang=lang, slow=False)
    tts.save("example.mp3")
    playsound("example.mp3")
except Exception as e:
    print(f"An error occurred: {e}")
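
Because the conversion depends on a network request, a transient failure can sometimes be worked around by simply trying again. The following is a minimal sketch of that idea and is not part of gTTS; the helper name call_with_retry and its attempts and delay parameters are assumptions made for illustration.

```python
import time


def call_with_retry(action, attempts=3, delay=1.0):
    """Call action() until it succeeds or the attempts run out.

    Returns whatever action() returns; re-raises the last exception
    if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # narrow this to the errors you expect
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({error}); retrying in {delay} s")
            time.sleep(delay)


# Hypothetical usage with gTTS (needs network access):
#
#     from gtts import gTTS
#     tts = gTTS(text="Hello", lang="en")
#     call_with_retry(lambda: tts.save("example.mp3"))
```

Retrying only helps with temporary problems; an invalid language code or an unwritable file path will fail on every attempt and should be fixed at the source.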

Extended features

1. Multilingual support:

Other languages can be used simply by changing the lang parameter. For example:

  • English: 'en'
  • Chinese: 'zh-cn'
  • French: 'fr'
  • Spanish: 'es'
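
A mistyped language code can be caught before any speech is requested. A minimal sketch of such a check follows. resolve_language is a hypothetical helper, the table holds only the four codes listed above, and falling back to 'en' is an assumption; gTTS itself can report its full set of codes through gtts.lang.tts_langs().

```python
# Codes taken from the list above; this is not the full set gTTS supports.
KNOWN_LANGUAGES = {
    "en": "English",
    "zh-cn": "Chinese",
    "fr": "French",
    "es": "Spanish",
}


def resolve_language(code, known=KNOWN_LANGUAGES, default="en"):
    """Match code against the known codes, ignoring case and outer spaces.

    Returns the code as it is spelled in the table, or default if
    nothing matches.
    """
    wanted = code.strip().lower()
    for key in known:
        if key.lower() == wanted:
            return key
    return default
```

For example, resolve_language(" FR ") yields 'fr', while an unknown code such as 'xx' falls back to 'en'.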

2. Clean up temporary files:

If the generated MP3 file is only needed temporarily, it can be deleted after playback:

import os

try:
    tts = gTTS(text=text, lang=lang, slow=False)
    tts.save("example.mp3")
    playsound("example.mp3")
finally:
    # Delete the file whether or not playback succeeded
    if os.path.exists("example.mp3"):
        os.remove("example.mp3")
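
A fixed name such as example.mp3 can collide with a file that already exists in the working directory. One alternative, sketched here with the standard tempfile module, is to write to a uniquely named temporary file and remove it afterwards. The function name speak_once and its synthesize and play callables are assumptions for illustration; in practice they would be tts.save and playsound.

```python
import os
import tempfile


def speak_once(synthesize, play):
    """Write audio to a unique temporary MP3, play it, then delete the file.

    synthesize(path) must write the audio to path; play(path) must play it.
    Returns the path that was used (the file no longer exists afterwards).
    """
    # Create the file, then close the handle so other code can open the path
    # (an open handle can block access on Windows).
    handle, path = tempfile.mkstemp(suffix=".mp3")
    os.close(handle)
    try:
        synthesize(path)
        play(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return path


# Hypothetical usage (needs gTTS, playsound and network access):
#
#     from gtts import gTTS
#     from playsound import playsound
#     tts = gTTS(text="Hello", lang="en")
#     speak_once(tts.save, playsound)
```

Passing the two steps in as callables keeps the cleanup logic independent of any particular speech or playback library.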

3. User interaction:

You can let users enter the text or choose the language, which makes the program more flexible:

text = input("Enter the text to convert to speech: ")
lang = input("Enter the language code (e.g., 'en' for English, 'zh-cn' for Chinese): ")
tts = gTTS(text=text, lang=lang, slow=False)
tts.save("output.mp3")
playsound("output.mp3")
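
Raw input may be empty or padded with whitespace. A small sketch of cleaning it up before handing it to gTTS is shown below; clean_request is a hypothetical helper, and defaulting to 'en' when no language is entered is an assumption.

```python
def clean_request(raw_text, raw_lang, default_lang="en"):
    """Validate user input and return a (text, lang) pair.

    Raises ValueError if the text is empty after stripping whitespace.
    """
    text = raw_text.strip()
    if not text:
        raise ValueError("No text to convert")
    # Fall back to the default language when the user just presses Enter.
    lang = raw_lang.strip() or default_lang
    return text, lang
```

The two input() results can be passed through this function first, and the returned pair used as the text and lang arguments.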

Summary

With gTTS and playsound, we can quickly implement the text to voice function and play the generated audio files. These two libraries are simple and easy to use and are suitable for rapid development of prototypes or small projects. If more complex audio processing features are needed, consider using pydub, pygame, or other professional audio libraries.
