* Since it must work on both Android and iPhone, the REST approach is considered.
* When playing audio through WWW, a "Cleartext HTTP traffic to localhost not permitted" URL error occurs,
  so add the following to AndroidManifest.xml:
<application
    android:label="@string/app_name"
    ...
    android:usesCleartextTraffic="true">
1. Google TTS
- Free for 1 year; apply for access, receive a key, and communicate with the service
- https://kumgo1d.tistory.com/28 (how to use Google TTS via REST in Unity)
- To use the Google API you must obtain an API key and an OAuth access token, and every API call has to be made with that token.
  => Implementing this is fairly involved.
  => There are several OAuth approaches, but options such as Google Sign-In and Firebase integration do not work on PC.
- Obtaining an API key: https://console.cloud.google.com/apis
- Can be tested at https://developers.google.com/oauthplayground/
- TTS web test: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize?apix=true&apix_params=%7B%22resource%22%3A%7B%22input%22%3A%7B%22text%22%3A%22Hello.%20Nice%20to%20meet%20you.%22%7D%2C%22voice%22%3A%7B%22languageCode%22%3A%22en-US%22%2C%22name%22%3A%22en-US-Wavenet-D%22%7D%2C%22audioConfig%22%3A%7B%22audioEncoding%22%3A%22LINEAR16%22%2C%22speakingRate%22%3A1%2C%22pitch%22%3A0%7D%7D%7D
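- For reference, a minimal sketch of the Unity-side REST call, assuming an OAuth access token has already been obtained (e.g. via the OAuth playground above). The class name, the placeholder token, and the exact voice/audio parameters are illustrative assumptions, not part of the original setup.

using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

// Sketch only: calls the Google Cloud TTS REST endpoint with a bearer token.
public class GoogleTtsSketch : MonoBehaviour
{
    public AudioSource audioSource;

    // Placeholder: obtain a real OAuth access token before calling.
    private string accessToken = "PASTE_ACCESS_TOKEN_HERE";

    public IEnumerator CoSynthesize(string text)
    {
        string url = "https://texttospeech.googleapis.com/v1/text:synthesize";

        // Request body mirrors the parameters used in the web test link above.
        string json =
            "{\"input\":{\"text\":\"" + text + "\"}," +
            "\"voice\":{\"languageCode\":\"en-US\",\"name\":\"en-US-Wavenet-D\"}," +
            "\"audioConfig\":{\"audioEncoding\":\"LINEAR16\",\"speakingRate\":1,\"pitch\":0}}";

        UnityWebRequest request = new UnityWebRequest(url, "POST");
        request.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(json));
        request.downloadHandler = new DownloadHandlerBuffer();
        request.SetRequestHeader("Content-Type", "application/json");
        request.SetRequestHeader("Authorization", "Bearer " + accessToken);
        yield return request.SendWebRequest();

        if (string.IsNullOrEmpty(request.error))
        {
            // The response JSON carries base64-encoded audio in "audioContent";
            // decode it and feed the LINEAR16 (wav) bytes into an AudioClip or a file.
            string body = request.downloadHandler.text;
            Debug.Log(body.Substring(0, Mathf.Min(200, body.Length)));
        }
        else
        {
            Debug.Log(request.error);
        }
    }
}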
2. Naver CSS (Clova Speech Synthesis)
- 4 KRW per 1 to 1,000 characters; apply for access, receive a key, and communicate with the service
- https://www.ncloud.com/product/aiService/css
- Usage guide: https://apidocs.ncloud.com/ko/ai-naver/clova_speech_synthesis/tts/
- The response must be saved as an mp3 file and then loaded for playback. Converting the mp3 received as byte[] into float[] and
  creating an AudioClip from it directly does not work: the clip expects PCM data, so the audio comes out corrupted.
- Example of calling the Naver CSS REST API from C#
  * On PC the mp3 cannot be played directly, so it must be converted to wav first (using NAudio.dll).
  - See https://mintpot.synology.me:30000/boards/2/topics/349
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;
using NAudio.Wave;

// Accepts every certificate so the HTTPS request is not rejected by certificate validation.
public class CertiClass : CertificateHandler
{
    protected override bool ValidateCertificate(byte[] certificateData)
    {
        return true;
    }
}

public class NewBehaviourScript : MonoBehaviour
{
    public AudioSource audioSource;

    private string apiURL = "https://naveropenapi.apigw.ntruss.com/voice/v1/tts";

    public void ButtonClick()
    {
        StartCoroutine(CoConnectServer());
    }

    // (Reference) Direct byte[] -> float[] conversion for an AudioClip; the result is garbled
    // because the downloaded bytes are mp3, not PCM.
    //private float[] ConvertByteToFloat(byte[] data)
    //{
    //    float[] floatArr = new float[data.Length / 2];
    //    for (int i = 0; i < floatArr.Length; i++)
    //    {
    //        if (BitConverter.IsLittleEndian)
    //        {
    //            Array.Reverse(data, i * 2, 2);
    //        }
    //        floatArr[i] = (float)(BitConverter.ToInt16(data, i * 2) / 32767f);
    //    }
    //    return floatArr;
    //}

    protected IEnumerator CoConnectServer(string method = "POST", int timeout = 10)
    {
        string url = apiURL;
        string text = "Texas millionaire and hunter Corey Knowlton recently paid $350,000 to kill an endangered black rhino in Namibia. There are only 4,000 to 5,000 black rhinos left in the world, and now there is one less. While it’s easy to paint Corey as a monster, he says that hunters like himself are endangered animals’ best hope for survival.";

        // Created as a PUT so the raw form body can be attached; the HTTP method is switched to POST below.
        byte[] byteDataParams = Encoding.UTF8.GetBytes("speaker=clara&speed=1&text=" + text);
        UnityWebRequest request = UnityWebRequest.Put(url, byteDataParams);
        request.SetRequestHeader("Content-Type", "application/x-www-form-urlencoded");
        request.SetRequestHeader("X-NCP-APIGW-API-KEY-ID", "lvtmb6ihb4");
        request.SetRequestHeader("X-NCP-APIGW-API-KEY", "XZvpJkJPFYL9kE5HmXuF42baPWc6DN2CrOCSQLKw");
        request.timeout = timeout;
        request.method = method;
        request.certificateHandler = new CertiClass();
        yield return request.SendWebRequest();

        string filePath = Path.Combine(Application.persistentDataPath, "aa");
        string androidPrefix = "";
        AudioType audiotype = AudioType.WAV;
        string playExtension = "";
#if UNITY_ANDROID
        // Android can play the downloaded mp3 directly.
        androidPrefix = "file://";
        audiotype = AudioType.MPEG;
        playExtension = ".mp3";
#else
        // PC cannot play the mp3, so the wav produced below is used instead.
        playExtension = ".wav";
#endif
        if (string.IsNullOrEmpty(request.error))
        {
            if (File.Exists(filePath + ".mp3"))
                File.Delete(filePath + ".mp3");
            if (File.Exists(filePath + ".wav"))
                File.Delete(filePath + ".wav");

            // Save the downloaded mp3 to disk.
            using (var stream = new MemoryStream(request.downloadHandler.data))
            {
                Stream output = File.OpenWrite(filePath + ".mp3");
                stream.WriteTo(output);
                output.Close();
                stream.Close();
            }
#if UNITY_STANDALONE || UNITY_EDITOR
            // On standalone/editor builds, convert the mp3 to wav with NAudio so it can be played.
            using (Mp3FileReader reader = new Mp3FileReader(filePath + ".mp3"))
            {
                WaveFileWriter.CreateWaveFile(filePath + ".wav", reader);
            }
#endif
            // Load the saved file back with WWW and play it (mp3 on Android, wav elsewhere).
            WWW www = new WWW(androidPrefix + filePath + playExtension);
            yield return www;
            audioSource.volume = 1;
            audioSource.PlayOneShot(www.GetAudioClip(false, false, audiotype));
        }
        else
        {
            string errorMessage = request.error;
            Debug.Log(errorMessage);
        }
    }
}
3. Microsoft TTS
- The voice can be customized further via SSML (speed, pitch, etc.); an SSML sketch follows the implementation example below.
- Same pricing plan as STT. Free account: free for 1 month.
- Paid account: first month free -> roughly 4,500 KRW per 1 million characters (pricing info).
- Set it up the same way as the STT setup at https://mintpot.synology.me:30000/boards/43/topics/364 (it uses the same SDK).
- Supported voice list: https://docs.microsoft.com/ko-kr/azure/cognitive-services/speech-service/language-support#standard-voices
-> Configure SpeechConfig
  - Set the language, the TTS voice name (2 male / 2 female voices supported), and the key
-> Create a SpeechSynthesizer with the SpeechConfig as an argument
-> Pass in the text to synthesize and call the SpeechSynthesizer function SpeakTextAsync to talk to the service
-> Play back the synthesized audio
- Implementation example
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//
// <code>
using UnityEngine;
using UnityEngine.UI;
using Microsoft.CognitiveServices.Speech;

public class HelloWorld : MonoBehaviour
{
    // Hook up the three properties below with a Text, InputField and Button object in your UI.
    public Text outputText;
    public InputField inputField;
    public Button speakButton;
    public AudioSource audioSource;

    private object threadLocker = new object();
    private bool waitingForSpeak;
    private string message;

    private SpeechConfig speechConfig;
    private SpeechSynthesizer synthesizer;

    public void ButtonClick()
    {
        lock (threadLocker)
        {
            waitingForSpeak = true;
        }

        string newMessage = string.Empty;

        // Starts speech synthesis, and returns after a single utterance is synthesized.
        using (var result = synthesizer.SpeakTextAsync(inputField.text).Result)
        {
            // Checks result.
            if (result.Reason == ResultReason.SynthesizingAudioCompleted)
            {
                // Native playback is not supported on Unity yet (currently only supported on Windows/Linux Desktop).
                // Use the Unity API to play audio here as a short term solution.
                // Native playback support will be added in the future release.
                var sampleCount = result.AudioData.Length / 2;
                var audioData = new float[sampleCount];
                for (var i = 0; i < sampleCount; ++i)
                {
                    audioData[i] = (short)(result.AudioData[i * 2 + 1] << 8 | result.AudioData[i * 2]) / 32768.0F;
                }

                // The output audio format is 16K 16bit mono
                var audioClip = AudioClip.Create("SynthesizedAudio", sampleCount, 1, 16000, false);
                audioClip.SetData(audioData, 0);
                audioSource.clip = audioClip;
                audioSource.Play();

                newMessage = "Speech synthesis succeeded!";
            }
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                newMessage = $"CANCELED:\nReason=[{cancellation.Reason}]\nErrorDetails=[{cancellation.ErrorDetails}]\nDid you update the subscription info?";
            }
        }

        lock (threadLocker)
        {
            message = newMessage;
            waitingForSpeak = false;
        }
    }

    void Start()
    {
        if (outputText == null)
        {
            UnityEngine.Debug.LogError("outputText property is null! Assign a UI Text element to it.");
        }
        else if (inputField == null)
        {
            message = "inputField property is null! Assign a UI InputField element to it.";
            UnityEngine.Debug.LogError(message);
        }
        else if (speakButton == null)
        {
            message = "speakButton property is null! Assign a UI Button to it.";
            UnityEngine.Debug.LogError(message);
        }
        else
        {
            // Continue with normal initialization, Text, InputField and Button objects are present.
            inputField.text = "Enter text you wish spoken here.";
            message = "Click button to synthesize speech";
            speakButton.onClick.AddListener(ButtonClick);

            // Creates an instance of a speech config with specified subscription key and service region.
            // Replace with your own subscription key and service region (e.g., "westus").
            speechConfig = SpeechConfig.FromSubscription("49b020847c33407197061fee21f33df8", "westus");
            speechConfig.SpeechSynthesisLanguage = "en-US";
            speechConfig.SpeechSynthesisVoiceName = "en-US-Guy24kRUS";

            // The default format is Riff16Khz16BitMonoPcm.
            // We are playing the audio in memory as audio clip, which doesn't require riff header.
            // So we need to set the format to Raw16Khz16BitMonoPcm.
            speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw16Khz16BitMonoPcm);

            // Creates a speech synthesizer.
            // Make sure to dispose the synthesizer after use!
            synthesizer = new SpeechSynthesizer(speechConfig, null);
        }
    }

    void Update()
    {
        lock (threadLocker)
        {
            if (speakButton != null)
            {
                speakButton.interactable = !waitingForSpeak;
            }

            if (outputText != null)
            {
                outputText.text = message;
            }
        }
    }

    void OnDestroy()
    {
        synthesizer.Dispose();
    }
}
// </code>
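- As noted above, speed and pitch can be customized with SSML. Below is a minimal sketch of how that could look with the same SDK, using SpeakSsmlAsync instead of SpeakTextAsync; the subscription key placeholder, voice name, and prosody values are illustrative assumptions, not values from the original setup.

using Microsoft.CognitiveServices.Speech;
using UnityEngine;

// Sketch only: synthesizes SSML so rate and pitch can be adjusted per request.
public class SsmlTtsSketch : MonoBehaviour
{
    private SpeechSynthesizer synthesizer;

    void Start()
    {
        // Placeholder key/region: replace with your own subscription.
        var config = SpeechConfig.FromSubscription("YOUR_SUBSCRIPTION_KEY", "westus");
        config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw16Khz16BitMonoPcm);
        synthesizer = new SpeechSynthesizer(config, null);

        // The voice is named inside the SSML, and prosody controls speed/pitch.
        string ssml =
            "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>" +
            "<voice name='en-US-Guy24kRUS'>" +
            "<prosody rate='+10%' pitch='-2st'>" +
            "This sentence is spoken a bit faster and at a lower pitch." +
            "</prosody>" +
            "</voice>" +
            "</speak>";

        // result.AudioData can be turned into an AudioClip exactly as in the
        // HelloWorld sample above (16 kHz, 16-bit, mono PCM).
        using (var result = synthesizer.SpeakSsmlAsync(ssml).Result)
        {
            Debug.Log(result.Reason);
        }
    }

    void OnDestroy()
    {
        synthesizer.Dispose();
    }
}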