February 5, 2025

Kokoro-82M: Install and Run Locally Fast, Small, and Free Text-To-Speech (TTS) AI Model Kokoro-82M

In this tutorial, we explain how to download, install, and run Kokoro-82M locally on a Windows computer. Kokoro is an open-weight, open-source text-to-speech (TTS) model. Its main advantage is that it is lightweight while delivering quality comparable to that of larger models. Because it has a relatively small number of parameters (82 million), it is faster and more cost-efficient than larger models. You can integrate Kokoro-82M into robotics projects: combined with large language models and other AI models, it can give a robot the ability to express itself like a human being. For example, in a practical application, you could use this model to develop a personal AI assistant or to enable a robot to communicate with humans.

In this tutorial, we will thoroughly explain all the steps you need to perform in order to run the model. The YouTube tutorial is given below.

Installation Procedure

First, make sure that Python is installed on your system. We tested Kokoro with Python 3.12; some older versions of Python will probably also work. Then, just in case, make sure that the Microsoft C++ compilers are installed on your system. The easiest way to install them is to install Microsoft Visual Studio Community Edition with C++ support by using this link

https://visualstudio.microsoft.com/vs/features/cplusplus

Then, just in case, make sure that the NVIDIA CUDA Toolkit is installed on your system. You can install it from this link

https://developer.nvidia.com/cuda-toolkit

Then, you need to install the espeak-ng text-to-speech synthesizer. Download the Windows binary files from this link

https://github.com/espeak-ng/espeak-ng/releases

For more details on how to do that, watch the video tutorial. Then, test the installation of espeak-ng by starting a Windows command prompt and typing

espeak-ng "This is a test"

If espeak-ng is properly installed, the text should be converted to simple, robotic-sounding speech. After these preliminary steps are completed, we can install Kokoro.
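If you prefer to check the espeak-ng installation from Python, the following is a minimal sketch that only uses the standard library. The function name find_espeak is our own hypothetical helper, not part of espeak-ng or Kokoro, and the sketch assumes that the installer added espeak-ng to the PATH.

```python
import shutil
import subprocess


def find_espeak(command="espeak-ng"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(command)


path = find_espeak()
if path is None:
    print("espeak-ng was not found on PATH; check the installation folder")
else:
    # Ask the program for its version string to confirm that it actually runs
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    print("Found:", path)
    print(result.stdout.strip())
```

If the script reports that espeak-ng was not found, the most likely cause is that the installation folder is missing from the PATH environment variable.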

To install Kokoro, open a Windows command prompt and type

cd\
mkdir kokoro
cd kokoro
python -m venv env1
env1\Scripts\activate.bat

These commands create a workspace folder, and then create and activate a Python virtual environment. Then, install the necessary libraries.

pip install kokoro
pip install soundfile
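Before writing the main program, it can be useful to confirm that the environment is set up as expected. The following is a minimal sketch of such a check. The helper names in_virtualenv, missing_packages, and pick_device are hypothetical names we introduce here, and the device check assumes that PyTorch was installed as a dependency of kokoro; if it is not present, the sketch simply reports "cpu".

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch came in as a dependency of kokoro
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Python version:", sys.version.split()[0])
print("Inside virtual environment:", in_virtualenv())
print("Missing packages:", missing_packages(["kokoro", "soundfile"]) or "none")
print("Compute device:", pick_device())
```

If the device is reported as "cpu" on a computer with an NVIDIA GPU, the installed PyTorch build may not include CUDA support. The model should still run in that case, only more slowly.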

The next step is to write the Python code. The code is given below.

from kokoro import KPipeline
import soundfile as sf
# 🇺🇸 'a' => American English, 🇬🇧 'b' => British English
# 🇯🇵 'j' => Japanese: pip install misaki[ja]
# 🇨🇳 'z' => Mandarin Chinese: pip install misaki[zh]
pipeline = KPipeline(lang_code='a') # <= make sure lang_code matches voice

# This text is for demonstration purposes only, unseen during training
text = '''
In this tutorial, we explain how to download, install, and run locally
Kokoro on a Windows computer. Kokoro is an open-weight text-to-speech model,
or briefly, a TTS model. Its main advantage is that it is lightweight;
however, at the same time, it delivers quality comparable to larger models.
Due to its relatively small number of parameters it is faster and 
more cost-efficient than larger models.
In this tutorial, we will thoroughly explain all the steps you 
need to perform in order to run the model. 
In a practical application, you would use this model to develop 
a personal AI assistant, or to enable a computer to communicate with humans.
'''

# Voice used here: af_nicole
generator = pipeline(
    text, voice='af_nicole',
    speed=1, split_pattern=r'\n+'  # split the text into segments at line breaks
)
for i, (gs, ps, audio) in enumerate(generator):
    print(i)  # i => index
    print(gs) # gs => graphemes/text
    print(ps) # ps => phonemes
    sf.write(f'{i}.wav', audio, 24000) # save each audio file

This code converts the text to speech. Because split_pattern is set to r'\n+', the text is split at line breaks, and each resulting segment is converted and stored in its own WAV file (0.wav, 1.wav, and so on) in the workspace folder (for more details, see the YouTube tutorial). Here, we use the speech style (voice) specified by “af_nicole”. For other speech styles, see this link

https://huggingface.co/hexgrad/Kokoro-82M/tree/main/voices

and see the YouTube tutorial for the complete explanation.
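Since the script above writes one WAV file per text segment, you may want to join the numbered files into a single recording. The following is a minimal sketch that uses only the Python standard library. The function names numbered_wavs and concat_wavs and the output name combined.wav are hypothetical choices of ours. The sketch also assumes that the files were stored as plain PCM WAV, which is the only WAV variant that the standard wave module can read; this should be checked on your system.

```python
import wave
from pathlib import Path


def numbered_wavs(folder):
    """Return 0.wav, 1.wav, ... found in `folder`, sorted in numeric order."""
    paths = [p for p in Path(folder).glob("*.wav") if p.stem.isdigit()]
    return sorted(paths, key=lambda p: int(p.stem))


def concat_wavs(paths, out_path):
    """Join the audio of `paths` into `out_path`; return the total frame count."""
    if not paths:
        raise ValueError("no input files given")
    fmt, chunks = None, []
    for path in paths:
        with wave.open(str(path), "rb") as src:
            this_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if fmt is None:
                fmt = this_fmt  # the first file defines the output format
            elif this_fmt != fmt:
                raise ValueError(f"{path} has a different audio format")
            chunks.append(src.readframes(src.getnframes()))
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for chunk in chunks:
            out.writeframes(chunk)
    # One frame holds one sample per channel
    return sum(len(chunk) for chunk in chunks) // (fmt[0] * fmt[1])


if __name__ == "__main__":
    files = numbered_wavs(".")
    if files:
        frames = concat_wavs(files, "combined.wav")
        print(f"Wrote combined.wav from {len(files)} files ({frames} frames)")
    else:
        print("No numbered WAV files found in the current folder")
```

The files are sorted by their numeric name rather than alphabetically, so that 10.wav is placed after 2.wav and the segments keep the order of the original text.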