Updated on 2024-10-29

Writing a speech synthesis system in Python

Background

I've always been interested in speech synthesis, and I've long wanted to synthesize some content for my own use, such as turning novels into audio or having my downloaded eBooks read aloud to me.

Speech synthesis system

Strictly speaking, this is a tool built on top of speech synthesis. Because many vendors now offer speech synthesis as an API, the development effort is greatly reduced: a few API calls are enough to build your own tool. It is small but complete, so with a little generosity you could call it a small speech synthesis system.

Preparation

First, we need the following:

  • Anaconda
  • Python 3.7
  • Visual Studio Code

Steps

Here we use the WebAPI of the Xunfei (iFlytek) Open Platform.

First, go to the console and create an application.

Once it is created, click the application to open its details page.

Click Speech Synthesis on the left, then go one level down to Online Speech Synthesis (streaming version).

In the upper right, note the following three values:

  • APPID
  • APISecret
  • APIKey
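As an optional variation on pasting these three values into the source file, they could be read from environment variables. This is only a sketch: the variable names `XFYUN_APPID`, `XFYUN_API_SECRET`, and `XFYUN_API_KEY` and the helper `load_credentials` are hypothetical and are not part of the Xunfei platform.

```python
import os

# Hypothetical variable names; use whatever naming suits your setup.
REQUIRED = ("XFYUN_APPID", "XFYUN_API_SECRET", "XFYUN_API_KEY")


def load_credentials(env=os.environ):
    """Return (appid, api_secret, api_key), or raise if any value is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED)
```

Passing a plain dict as `env` makes the helper easy to try without touching the real environment.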

Code implementation

Now let's move on to the code. First, install the two libraries we need.

pip install websocket-client
pip install playsound

Next, we define a class named play that contains four methods:

import tkinter as tk
from tkinter import ttk

class play:
    def __init__(self):           # build the window and its widgets
        ...
    def play_sound(self):         # play the generated audio
        ...
    def select_vcn(self, *arg):   # drop-down handler: set the speaker
        ...
    def xfyun_tts(self):          # run speech synthesis
        ...

In __init__, fill in the APPID, APIKey, and APISecret that you just obtained from the Xunfei Open Platform console.

def __init__(self):
        self.APP_ID = 'xxx'      # fill in your APPID
        self.API_KEY = 'xxx'     # fill in your APIKey
        self.SECRET_KEY = 'xxx'  # fill in your APISecret
        self.vcn = "xiaoyan"     # default speaker, matching the first drop-down entry

        # Assumes `import tkinter as tk` and `from tkinter import ttk`.
        # The names root, lb, tt and cb are reconstructed; the source lost them.
        self.root = tk.Tk()                         # create the main window
        self.root.title("Speech Synthesis System")  # window title
        self.root.geometry("600x550")               # window size
        self.root.resizable(0, 0)                   # fixed size
        # self.root.resizable(width=True, height=True)  # make width and height resizable (the default)
        self.lb = tk.Label(self.root, text='Please select a speaker')  # label
        self.tt = tk.Text(self.root, width=77, height=30)              # multi-line text box
        self.cb = ttk.Combobox(self.root, width=12)                    # drop-down list
        # Entries of the drop-down list
        self.cb['values'] = ("Sweet Female Voice - Xiaoyan", "Friendly Male Voice - Xu Jiu",
                             "Intellectual Female Voice - Xiaoping", "Cute Child Voice - Xu Xiaobao",
                             "Friendly Female Voice - Xiaojing")
        self.cb.current(0)  # select the first entry by default
        self.cb.bind("<<ComboboxSelected>>", self.select_vcn)
        self.tk_tts_file = tk.Label(self.root, text='Generated file name')
        self.b1 = tk.Button(self.root, text='Perform speech synthesis', width=10, height=1, command=self.xfyun_tts)  # button
        self.tk_play = tk.Button(self.root, text='Play', width=10, height=1, command=self.play_sound)  # button
        # Position the widgets
        self.tk_tts_file.place(x=30, y=500)
        self.b1.place(x=300, y=500)
        self.tk_play.place(x=400, y=500)
        self.lb.place(x=30, y=30)
        self.cb.place(x=154, y=30)

        self.tt.place(x=30, y=60)
        self.root.mainloop()

When an entry is selected in the drop-down list, set the corresponding speaker (vcn):

def select_vcn(self, *arg):
        # The strings compared here must match the drop-down entries exactly
        if self.cb.get() == 'Sweet Female Voice - Xiaoyan':
            self.vcn = "xiaoyan"
        elif self.cb.get() == 'Friendly Male Voice - Xu Jiu':
            self.vcn = "aisjiuxu"
        elif self.cb.get() == 'Intellectual Female Voice - Xiaoping':
            self.vcn = "aisxping"
        elif self.cb.get() == 'Cute Child Voice - Xu Xiaobao':
            self.vcn = "aisbabyxu"
        elif self.cb.get() == 'Friendly Female Voice - Xiaojing':
            self.vcn = "aisjinger"

        print(self.vcn)
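The chain of comparisons in select_vcn can also be written as a table lookup, which keeps each label and its vcn code in one place. The sketch below is an alternative formulation, not the article's code: the `VOICES` table and the `vcn_for` helper are hypothetical names, and the English labels are illustrative. Only the five vcn codes come from the handler above.

```python
# Display label -> vcn code. The codes are the ones used in select_vcn;
# the labels are illustrative and must match whatever the drop-down shows.
VOICES = {
    "Sweet Female Voice - Xiaoyan": "xiaoyan",
    "Friendly Male Voice - Xu Jiu": "aisjiuxu",
    "Intellectual Female Voice - Xiaoping": "aisxping",
    "Cute Child Voice - Xu Xiaobao": "aisbabyxu",
    "Friendly Female Voice - Xiaojing": "aisjinger",
}


def vcn_for(label, default="xiaoyan"):
    """Return the vcn code for a drop-down label, or the default if unknown."""
    return VOICES.get(label, default)
```

With such a table, the drop-down entries could be set from `tuple(VOICES)`, so a label can no longer drift out of sync with the string it is compared against.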

Next, we modify the Python demo provided by Xunfei to make it easier to call from our own code:

# -*- coding:utf-8 -*-
#
#   author: iflytek
#
# This demo was tested on Windows + Python 3.7.
# The third-party libraries and their versions that were installed when this demo ran successfully are as follows:
#   cffi==1.12.3
#   gevent==1.4.0
#   greenlet==0.4.15
#   pycparser==2.19
#   six==1.12.0
#   websocket==0.2.1
#   websocket-client==0.56.0
# To synthesize minority languages, send text in that language, use a minority-language speaker (vcn), set tte=unicode, and change the text encoding accordingly
# Error code link: /document/error-code (must see when code returns an error code)
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
import websocket
import hashlib
import base64
import hmac
import json
from urllib.parse import urlencode
import time
import ssl
from wsgiref.handlers import format_date_time
from datetime import datetime
from time import mktime
import _thread as thread
import os
import wave


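STATUS_FIRST_FRAME = 0  # marks the first frame
STATUS_CONTINUE_FRAME = 1  # marks an intermediate frame
STATUS_LAST_FRAME = 2  # marks the last frame

# Temporary file for the raw PCM audio. The source shows only "./" here,
# which cannot be opened as a file; the file name is an assumption.
PCM_PATH = "./demo.pcm"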

class Ws_Param(object):
    # Initialization
    def __init__(self):
        pass
    def set_tts_params(self, text, vcn):
        if text != "":
            self.Text = text
        if vcn != "":
            self.vcn = vcn
            # Business parameters (business); see the official documentation for more options
            self.BusinessArgs = {"bgs": 1, "aue": "raw", "auf": "audio/L16;rate=16000", "vcn": self.vcn, "tte": "utf8"}
        # For minority languages use the line below instead; "unicode" there means little-endian UTF-16, i.e. "UTF-16LE"
        # self.Data = {"status": 2, "text": str(base64.b64encode(self.Text.encode('utf-16')), "UTF8")}
        self.Data = {"status": 2, "text": str(base64.b64encode(self.Text.encode('utf-8')), "UTF8")}

    def set_params(self, appid, apiSecret, apiKey):
        if appid != "":
            self.APPID = appid
            # Public parameters (common)
            self.CommonArgs = {"app_id": self.APPID}

        if apiKey != "":
            self.APIKey = apiKey

        if apiSecret != "":
            self.APISecret = apiSecret

    # generate url
    def create_url(self):
        url = 'wss:///v2/tts'
        # Generate timestamps in RFC1123 format
        now = datetime.now()
        date = format_date_time(mktime(now.timetuple()))

        # Build the string to sign
        signature_origin = "host: " + "" + "\n"
        signature_origin += "date: " + date + "\n"
        signature_origin += "GET " + "/v2/tts " + "HTTP/1.1"
        # Sign it with HMAC-SHA256 (a keyed hash, not encryption)
        signature_sha = hmac.new(self.APISecret.encode('utf-8'), signature_origin.encode('utf-8'),
                                 digestmod=hashlib.sha256).digest()
        signature_sha = base64.b64encode(signature_sha).decode(encoding='utf-8')

        authorization_origin = "api_key=\"%s\", algorithm=\"%s\", headers=\"%s\", signature=\"%s\"" % (
            self.APIKey, "hmac-sha256", "host date request-line", signature_sha)
        authorization = base64.b64encode(authorization_origin.encode('utf-8')).decode(encoding='utf-8')
        # Combine requested authentication parameters into a dictionary
        v = {
            "authorization": authorization,
            "date": date,
            "host": ""
        }
 
        url = url + '?' + urlencode(v)
 
        return url

def on_message(ws, message):
    try:
        message = json.loads(message)
        code = message["code"]
        sid = message["sid"]
        # Check the return code first: an error frame may carry no audio data
        if code != 0:
            errMsg = message["message"]
            print("sid:%s call error:%s code is:%s" % (sid, errMsg, code))
            return

        audio = base64.b64decode(message["data"]["audio"])
        status = message["data"]["status"]
        print(code, sid, status)
        # Append this chunk of raw PCM audio to the temporary file
        with open(PCM_PATH, 'ab') as f:
            f.write(audio)
        if status == 2:  # last frame: synthesis is complete
            print("ws is closed")
            ws.close()

    except Exception as e:
        print("receive msg,but parse exception:", e)

# Handling of websocket errors received
def on_error(ws, error):
    print("### error:", error)


# Handle the websocket being closed
# (*args absorbs the close status code and message that newer websocket-client versions pass)
def on_close(ws, *args):
    print("### closed ###")


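# Handle the websocket connection being established
def on_open(ws):
    def run(*args):
        d = {"common": wsParam.CommonArgs,
             "business": wsParam.BusinessArgs,
             "data": wsParam.Data,
             }
        d = json.dumps(d)
        print("------> start sending text data")
        ws.send(d)
        # Remove any PCM file left over from a previous run
        if os.path.exists(PCM_PATH):
            os.remove(PCM_PATH)

    thread.start_new_thread(run, ())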


def text2pcm(appid, apiSecret, apiKey, text, vcn, fname):
    wsParam.set_params(appid, apiSecret, apiKey)
    wsParam.set_tts_params(text, vcn)
    websocket.enableTrace(False)
    wsUrl = wsParam.create_url()
    ws = websocket.WebSocketApp(wsUrl, on_message=on_message, on_error=on_error, on_close=on_close)
    ws.on_open = on_open
    # Blocks until the connection closes, i.e. until the last audio frame has arrived
    ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

    pcm2wav(PCM_PATH, fname)

def pcm2wav(fname, dstname):
    with open(fname, 'rb') as pcmfile:
        pcmdata = pcmfile.read()
        print(len(pcmdata))
    with wave.open(dstname, "wb") as wavfile:
        # 1 channel, 2 bytes per sample, 16000 Hz, uncompressed
        wavfile.setparams((1, 2, 16000, 0, 'NONE', 'NONE'))
        wavfile.writeframes(pcmdata)

wsParam = Ws_Param()
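The signing steps in create_url can be tried in isolation. The following is a minimal standalone sketch of the same scheme (sign the host, date, and request line with HMAC-SHA256, then base64-encode the authorization string), not the demo's own code. The function name `build_signed_url` and the host `tts.example.invalid` are placeholders, since the real host is not shown in the listing above.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlparse
from wsgiref.handlers import format_date_time


def build_signed_url(host, path, api_key, api_secret, now=None):
    """Build a wss:// URL whose query string carries the signed handshake."""
    now = now or datetime.now(timezone.utc)
    date = format_date_time(now.timestamp())  # RFC 1123 date in GMT

    # The string to sign: host, date, and request line, one per line.
    signature_origin = f"host: {host}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(api_secret.encode("utf-8"),
                      signature_origin.encode("utf-8"),
                      digestmod=hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("utf-8")

    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode("utf-8")

    query = urlencode({"authorization": authorization, "date": date, "host": host})
    return f"wss://{host}{path}?{query}"


# Placeholder host and dummy credentials, for illustration only.
example_url = build_signed_url("tts.example.invalid", "/v2/tts", "key", "secret")
example_query = parse_qs(urlparse(example_url).query)
print(sorted(example_query))  # the three query parameter names
```

Accepting `now` as a parameter makes the output reproducible, which helps when comparing a generated signature against a known-good one.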

In the end, we have a working speech synthesis system.
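The PCM-to-WAV step can be checked without calling the API at all. The sketch below is an assumption-laden test harness rather than part of the article's code: it generates one second of a 440 Hz tone as 16-bit mono PCM at 16000 Hz (the format requested by the business parameters), wraps it in a WAV header in memory, and reads the header back. The helper name `pcm_to_wav_bytes` is hypothetical.

```python
import io
import math
import struct
import wave


def pcm_to_wav_bytes(pcm, rate=16000):
    """Wrap raw 16-bit mono PCM in a WAV container and return the bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# One second of a 440 Hz sine wave as little-endian 16-bit samples.
pcm = b"".join(
    struct.pack("<h", int(12000 * math.sin(2 * math.pi * 440 * n / 16000)))
    for n in range(16000)
)
wav_bytes = pcm_to_wav_bytes(pcm)

# Read the header back to confirm the parameters survived the round trip.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    info = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes())
print(info)
```

If the sample rate or sample width written to the header does not match the PCM data, the file still opens but plays at the wrong speed or as noise, so this is a useful first thing to check.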

Cloud computing and cloud services are developing rapidly, and the major vendors provide a wealth of resources that greatly lower the barrier to AI development. Even without understanding the principles of speech synthesis, you can put together a speech synthesis tool surprisingly quickly.
