# TTS Tools

Last update: Dec 12, 2024


# Introduction


  • TTS stands for Text To Speech: AI that converts any given text into spoken audio.

  • The tools listed here offer a decent variety of features & options, such as model training, fine-tuning, 0-shot training, or being combined with RVC.

  • Here's an index of the best TTS tools out there:


# ElevenLabs/11Labs

  • ElevenLabs is a freemium service that offers TTS, TTS model training & video translation between languages.


# Fish Speech

  • Fish Speech is a 0-shot multilingual TTS model created by Fish Audio.

  • It is one of the best 0-shot TTS models as of now & rarely hallucinates.

  • It can be used either locally or on the cloud.


# F5 TTS

  • F5 is one of the best 0-shot TTS models.

  • F5 gives fairly high-quality outputs & rarely hallucinates.

  • It has some known issues, though. If it reads too fast, your reference audio is more than six seconds long or 100 characters. It also hallucinates on low voices.

  • It can be used either locally or on the cloud.
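
The reference limits mentioned above can be checked before running inference. The snippet below is only a minimal sketch: `checkReference`, `MAX_REF_SECONDS` and `MAX_REF_CHARS` are hypothetical names that are not part of F5, and the thresholds are simply the six seconds and 100 characters from the note above.

```javascript
// Hypothetical helper, not part of F5 itself.
// The limits are assumptions taken from the note above.
const MAX_REF_SECONDS = 6;
const MAX_REF_CHARS = 100;

// Returns a list of warnings for a reference clip.
// durationSeconds: length of the reference audio in seconds.
// transcript: the text spoken in the reference audio.
function checkReference(durationSeconds, transcript) {
  const warnings = [];
  if (durationSeconds > MAX_REF_SECONDS) {
    warnings.push(
      `Reference audio is ${durationSeconds}s; trim it to ${MAX_REF_SECONDS}s or less.`
    );
  }
  if (transcript.length > MAX_REF_CHARS) {
    warnings.push(
      `Reference text is ${transcript.length} characters; keep it to ${MAX_REF_CHARS} or fewer.`
    );
  }
  return warnings;
}
```

An empty array means the clip is within both limits. Anything else lists what to trim before using it as a reference.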


# Edge TTS


  • This is Microsoft Edge TTS, which is good quality, multilingual & works great on long sentences.

  • It can only be used online: via its API, through the Microsoft Edge browser, in a HF/Colab space, or mixed with RVC. To use it through the browser:

    1. Download Microsoft Edge.

    2. Open Notepad & paste the following code:

    <!DOCTYPE html>
    <html>
    <body style="background-color:#dddddd">
    
    <!-- aria-hidden hides the heading and button from assistive technology -->
    <h3 aria-hidden="true">Browser TTS "Hack"</h3>
    
    <textarea rows="10" cols="50" id="ttsText" style="background-color:#eeeeee"></textarea>
    <br />
    <button aria-hidden="true" onclick="genText()">Generate</button>
    
    <!-- The text from the textarea is copied here -->
    <pre id="tts"></pre>
    
    <script>
    // Copy the textarea contents into the <pre> element as plain text.
    function genText() {
      var x = document.getElementById("ttsText").value;
      document.getElementById("tts").textContent = x;
    }
    </script>
    
    </body>
    </html>
    3. Save it as “Microsoft Edge TTS.txt”.

    4. Rename it to “Microsoft Edge TTS.html”.

    5. Open Microsoft Edge & drag the .html file into it.

    6. Use Audacity to record the audio. Set the recording mode to loopback to capture the internal audio (a Realtek driver might be needed).

    7. Type the text you want into the page's text box & click Generate, then start Read aloud (Ctrl+Shift+U). Stop recording when the voice is done.

    8. You can then select Voice Options in the toolbar & change the speed for faster or slower speech.
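
As an alternative to changing the speed from the toolbar, a page script can also trigger speech itself through the browser's Web Speech API. This is only a sketch under assumptions: `pickVoice` and `speak` are hypothetical helper names, which voices are listed depends on the browser and system, and this API plays audio without saving it, so recording with Audacity is still needed.

```javascript
// Pick the first voice whose language tag starts with the wanted prefix.
// `voices` is any array of objects with `lang` and `name` fields, such as
// the result of speechSynthesis.getVoices().
function pickVoice(voices, langPrefix) {
  const wanted = langPrefix.toLowerCase();
  return voices.find((v) => v.lang.toLowerCase().startsWith(wanted)) || null;
}

// Speak `text` in the browser. `rate` is the Web Speech API speed factor
// (1 is normal speed; the valid range is 0.1 to 10).
function speak(text, langPrefix = "en", rate = 1) {
  if (typeof speechSynthesis === "undefined") {
    return false; // not running in a browser with the Web Speech API
  }
  const utterance = new SpeechSynthesisUtterance(text);
  // getVoices() can be empty until the browser has loaded its voice list;
  // in that case the browser's default voice is used.
  const voice = pickVoice(speechSynthesis.getVoices(), langPrefix);
  if (voice) utterance.voice = voice;
  utterance.rate = rate;
  speechSynthesis.speak(utterance);
  return true;
}
```

Inside the page above, `genText()` could call something like `speak(x, "en", 1.2)` after copying the text, where `1.2` stands for slightly faster speech.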


# XTTS2


  • Built on 🐢 Tortoise TTS & developed by Coqui AI, which has unfortunately been discontinued.

  • Includes important model changes that make cross-language 0-shot voice cloning & multilingual speech generation much easier.

  • It needs less training data: at least 2 minutes of audio.

  • It can be used either online or locally.

# Zonos


  • 0-shot TTS with great emotion controls

  • Can be used with English, French, German, Chinese and Japanese

  • Can be used locally or online

# Kokoro-TTS


  • CLI TTS

  • Only has premade voices

  • Voice blending for English, British English, French, Italian, Japanese and Chinese

  • Not the best emotion control

# OpenVoice


  • Has Versatile Instant Voice Cloning (aka 0-shot training)
  • Offers cross-lingual cloning & flexible voice style control
  • Available both locally & online

# Piper


  • Fast TTS

  • Great multilingual support

  • Works for almost all languages

  • Decent quality

# MeloTTS


  • MeloTTS is a high-quality multilingual TTS library made by MyShell.ai.

  • Offers near real-time inference.

  • It can be used both locally and online.

# GPT-SoVITS


  • GPT-SoVITS supports cross-language inference, but the output may contain some noise.

  • It's very good with Chinese & also good with English.

  • Most parts are in Japanese & not deeply tested. Expect some instability.

  • Can be used both locally & online.
