Kokoro-82M – Install Locally and Run on Linux Ubuntu – Best Lightweight Text-to-Speech AI Model

admin

1 year ago

In this tutorial, we explain how to download, install, and run Kokoro-82M locally on a Linux Ubuntu computer. Kokoro is an open-weight and open-source text-to-speech (TTS) model. Its main advantage is that it is lightweight while still delivering quality comparable to that of larger models. Because of its relatively small number of parameters (82 million, as the name indicates), it is faster and more cost-efficient than larger models. You can integrate Kokoro-82M into robotics projects: together with large language models and other AI models, it can give a robot the ability to express itself like a human being. For example, in a practical application, you could use this model to develop a personal AI assistant or to enable a robot to communicate with humans.

The YouTube tutorial is given below.

Installation Procedure for Kokoro on Linux Ubuntu

We are using Linux Ubuntu 24.04. First, open a terminal and type:

sudo apt-get update && sudo apt-get upgrade 
sudo apt-get install espeak-ng
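The espeak-ng package is needed because, to our understanding, Kokoro's phonemizer (misaki) falls back to it, for example for English words missing from its dictionary and for some non-English languages. To confirm that the binary is on your PATH, you can use the following minimal sketch. It relies only on the Python standard library and assumes the executable is named espeak-ng, as installed by the Ubuntu package above:

```python
import shutil
import subprocess
from typing import Optional


def espeak_ng_version() -> Optional[str]:
    """Return the first line of `espeak-ng --version`, or None if espeak-ng is not on PATH."""
    path = shutil.which("espeak-ng")  # full path of the executable, or None
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    output = result.stdout.strip()
    # Fall back to the path if the binary printed nothing on stdout.
    return output.splitlines()[0] if output else path


if __name__ == "__main__":
    found = espeak_ng_version()
    print(found if found else "espeak-ng not found; install it with: sudo apt-get install espeak-ng")
```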

Then, verify that you have Python installed on your computer by typing

which python3

This command should return the path to the Python interpreter. Next, verify your Python version:

python3 --version
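If you prefer to perform the same check from Python, the following minimal sketch compares the running interpreter with a minimum version. The (3, 10) lower bound is an assumption for illustration only; check the kokoro page on PyPI for the Python versions supported by the release you install.

```python
import sys

# Assumed minimum version for illustration; not taken from kokoro's package metadata.
MIN_VERSION = (3, 10)


def python_ok(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if (major, minor) of `version_info` is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current}:", "OK" if python_ok() else f"older than {MIN_VERSION[0]}.{MIN_VERSION[1]}")
```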

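In our case, we are using Python 3.12. Next, let us create a workspace folder and then create and activate a Python virtual environment. On Ubuntu, this first requires the venv package that matches your Python version (python3.12-venv in our case):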

sudo apt install python3.12-venv

cd ~
mkdir kokoro
cd kokoro
python3 -m venv env1
source env1/bin/activate
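To confirm that the activation worked, you can run which python3 again (it should now point into ~/kokoro/env1/bin), or use the standard-library check below. It relies on the documented venv behavior that sys.prefix differs from sys.base_prefix only inside a virtual environment:

```python
import sys


def in_virtualenv() -> bool:
    """True inside a venv: sys.prefix points at the venv, sys.base_prefix at the base Python."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    # With env1 active, this path should lie inside ~/kokoro/env1/bin/
    print("Interpreter:", sys.executable)
```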

Next, install the necessary libraries

pip install kokoro
pip install soundfile
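pip install kokoro also pulls in kokoro's own dependencies, which, to our knowledge, include PyTorch (torch) and the misaki phonemizer. The following minimal sketch, run inside the activated env1, reports which of these distributions are installed and their versions. It reads package metadata with importlib.metadata and does not import the packages themselves:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)  # reads installed metadata only
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["kokoro", "soundfile", "misaki", "torch"]).items():
        print(f"{name:10s} {ver or 'NOT INSTALLED'}")
```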

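The next step is to write the Python code. Save the code below as a Python file in the kokoro workspace folder and run it from that folder with python3 while env1 is activated, so that the generated audio files end up in the workspace folder.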

from kokoro import KPipeline
import soundfile as sf

# 🇺🇸 'a' => American English, 🇬🇧 'b' => British English
# 🇯🇵 'j' => Japanese: pip install misaki[ja]
# 🇨🇳 'z' => Mandarin Chinese: pip install misaki[zh]
pipeline = KPipeline(lang_code='a')  # <= make sure lang_code matches voice

# This text is for demonstration purposes only, unseen during training.
# One sentence per line: split_pattern=r'\n+' below cuts the text at line breaks.
text = '''
In this tutorial, we explain how to download, install, and run Kokoro locally on a Linux Ubuntu computer.
Kokoro is an open-weight text-to-speech model, or briefly, a TTS model.
Its main advantage is that it is lightweight, while at the same time it delivers quality comparable to larger models.
Due to its relatively small number of parameters, it is faster and more cost-efficient than larger models.
In this tutorial, we will thoroughly explain all the steps you need to perform in order to run the model.
In a practical application, you would use this model to develop a personal AI assistant, or to enable a computer to communicate with humans.
'''

# 'af_nicole': the 'a' prefix marks an American English voice (matches lang_code='a'),
# and 'f' marks a female voice
generator = pipeline(
    text, voice='af_nicole',
    speed=1, split_pattern=r'\n+'  # speed=1 => normal speaking rate
)
for i, (gs, ps, audio) in enumerate(generator):
    print(i)   # i => chunk index
    print(gs)  # gs => graphemes/text
    print(ps)  # ps => phonemes
    sf.write(f'{i}.wav', audio, 24000)  # save each chunk as a 24 kHz WAV file

This code converts the text to speech. The pipeline splits the text at line breaks (split_pattern=r'\n+'), converts each resulting chunk separately, and saves it as an independent WAV file (0.wav, 1.wav, and so on) in the folder from which you run the script, which is the workspace folder in our case (for more details, see the YouTube tutorial). Here, we use the voice “af_nicole”. For other voices, see this link

https://huggingface.co/hexgrad/Kokoro-82M/tree/main/voices

and see the YouTube tutorial for the complete explanation.
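Because every chunk is stored in its own file, you may want a single file for playback. The following sketch concatenates the numbered files in index order using only Python's standard-library wave module. A plain alphabetical sort would put 10.wav before 2.wav, so the files are sorted numerically. The output name combined.wav is a hypothetical choice. The sketch assumes that all chunks share the same format, which should hold for the script above because every file is written at 24000 Hz with soundfile's default 16-bit PCM subtype for .wav files. It also assumes that the only files named like 3.wav in the folder are the ones produced by that script.

```python
import re
import wave
from pathlib import Path
from typing import List


def chunk_files(folder: str = ".") -> List[Path]:
    """Return the numbered chunk files (0.wav, 1.wav, ...) sorted by their numeric index."""
    files = [p for p in Path(folder).glob("*.wav") if re.fullmatch(r"\d+\.wav", p.name)]
    return sorted(files, key=lambda p: int(p.stem))  # numeric sort: 2.wav before 10.wav


def merge_wavs(files: List[Path], out_path: str) -> float:
    """Concatenate WAV files with identical formats; return the total duration in seconds."""
    if not files:
        raise ValueError("no WAV files to merge")
    with wave.open(str(files[0]), "rb") as first:
        params = first.getparams()
    total_frames = 0
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # the frame count in the header is patched on close
        for path in files:
            with wave.open(str(path), "rb") as src:
                # Channels, sample width and sample rate must match the first file.
                if src.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path} has a different format")
                n = src.getnframes()
                out.writeframes(src.readframes(n))
                total_frames += n
    return total_frames / params.framerate


if __name__ == "__main__":
    parts = chunk_files(".")
    if parts:
        seconds = merge_wavs(parts, "combined.wav")
        print(f"Merged {len(parts)} files into combined.wav ({seconds:.1f} s)")
    else:
        print("No numbered WAV files found in the current folder")
```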