In this tutorial, we explain how to download, install, and run Kokoro-82M locally on a Windows computer. Kokoro is an open-weight, open-source text-to-speech (TTS) model. Its main advantage is that it is lightweight (82 million parameters) while delivering quality comparable to that of larger models. Because of its relatively small number of parameters, it is faster and more cost-efficient than larger models. You can integrate Kokoro-82M into robotics projects: combined with large language models and other AI models, Kokoro-82M can give a robot the ability to express itself like a human being. For example, in a practical application, you could use this model to develop a personal AI assistant or to enable a robot to communicate with humans.
In this tutorial, we will thoroughly explain all the steps you need to perform in order to run the model. The YouTube tutorial is given below.
Installation Procedure
First, make sure that you have Python installed on your system. We tested Kokoro with Python 3.12; some older versions of Python will probably also work. Then, just in case, make sure that the Microsoft C++ compilers are installed on your system. The easiest way to install them is to install Microsoft Visual Studio Community Edition with C++ support by using this link
https://visualstudio.microsoft.com/vs/features/cplusplus
Then, just in case, make sure that the NVIDIA CUDA Toolkit is installed on your system. You can install it from this link
https://developer.nvidia.com/cuda-toolkit
Then, you need to install the espeak-ng text-to-speech synthesizer. Download the Windows binary files from this link
https://github.com/espeak-ng/espeak-ng/releases
For more details on how to do that, watch the video tutorial. Then, test the installation of espeak-ng by starting a Windows command prompt and typing
espeak-ng "This is a test"
If espeak-ng is properly installed, the text should be read aloud in a primitive, robotic-sounding voice. After these preliminary steps are completed, we can install Kokoro.
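If you prefer to verify the prerequisites from Python, the following sketch checks whether `espeak-ng` and the CUDA compiler `nvcc` can be found on the PATH and prints the first line of their `--version` output. It assumes that the installers added both tools to the PATH (the `espeak-ng` test command above relies on the same thing). The function name `check_tool` is a hypothetical helper of our own, not part of any library.

```python
import shutil
import subprocess


def check_tool(name, version_flag="--version"):
    """Return the first line of `<name> --version`, or None if `name` is not on PATH."""
    path = shutil.which(name)  # searches PATH (and PATHEXT on Windows)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, version_flag], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    output = (result.stdout or result.stderr).strip()
    # Fall back to the resolved path if the tool printed nothing.
    return output.splitlines()[0] if output else path


if __name__ == "__main__":
    for tool in ("espeak-ng", "nvcc"):
        info = check_tool(tool)
        print(f"{tool}: {info if info else 'NOT FOUND on PATH'}")
```

A `NOT FOUND` result does not necessarily mean the tool is missing; its installation folder may simply not be on the PATH.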
To do that, open a Windows command prompt, and type
cd\
mkdir kokoro
cd kokoro
python -m venv env1
env1\Scripts\activate.bat
This creates a workspace folder, and then creates and activates the Python virtual environment. Next, install the necessary libraries.
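To confirm that the virtual environment is really active, you can run a small script such as the one below with `python check_env.py` (the file name is only an example). It relies on the fact that inside a venv, `sys.prefix` points to the environment folder, while `sys.base_prefix` still points to the base Python installation. It also reminds you that we tested with Python 3.12.

```python
import sys


def in_virtualenv():
    """True when running inside a venv created with `python -m venv`."""
    # venv sets sys.prefix to the environment folder (e.g. C:\kokoro\env1),
    # while sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
    if sys.version_info[:2] != (3, 12):
        print("Note: this tutorial was tested with Python 3.12.")
```

With `env1` activated, the interpreter path should point inside `C:\kokoro\env1`.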
pip install kokoro
pip install soundfile
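After installing the libraries, the sketch below reports whether `kokoro`, `soundfile`, and PyTorch (`torch`, which pip installs as a Kokoro dependency) are present, and whether PyTorch can see a CUDA-capable GPU. It is a diagnostic helper of our own, not part of Kokoro. If it reports `False` for CUDA, Kokoro can still run on the CPU; one possible cause is that pip installed a CPU-only PyTorch build.

```python
import importlib.util
from importlib import metadata


def package_report(names=("kokoro", "soundfile", "torch")):
    """Map each package name to its installed version, or None if it is missing."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name.
            report[name] = "installed (version unknown)"
    return report


def cuda_available():
    """True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily, only when it is installed

    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    for name, version in package_report().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
    print("CUDA available to PyTorch:", cuda_available())
```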
The next step is to write the Python code. The code is given below.
from kokoro import KPipeline
import soundfile as sf
# 🇺🇸 'a' => American English, 🇬🇧 'b' => British English
# 🇯🇵 'j' => Japanese: pip install misaki[ja]
# 🇨🇳 'z' => Mandarin Chinese: pip install misaki[zh]
pipeline = KPipeline(lang_code='a') # <= make sure lang_code matches voice
# This text is for demonstration purposes only, unseen during training
text = '''
In this tutorial, we explain how to download, install, and run locally
Kokoro on a Windows computer. Kokoro is an open-weight text-to-speech model
or briefly TTS model. Its main advantage is that it is lightweight,
however, at the same time it delivers quality comparable to larger models.
Due to its relatively small number of parameters it is faster and
more cost-efficient than larger models.
In this tutorial, we will thoroughly explain all the steps you
need to perform in order to run the model.
In a practical application, you would use this model to develop
a personal AI assistant, or to enable a computer to communicate with humans.
'''
# af_nicole
generator = pipeline(
    text, voice='af_nicole',  # voice name; its first letter 'a' matches lang_code
    speed=1, split_pattern=r'\n+'  # split the text at line breaks
)
for i, (gs, ps, audio) in enumerate(generator):
    print(i)  # i => index
    print(gs)  # gs => graphemes/text
    print(ps)  # ps => phonemes
    sf.write(f'{i}.wav', audio, 24000)  # save each audio chunk at 24 kHz
This code converts the text to speech. Because split_pattern=r'\n+' splits the text at line breaks, every line of the text is converted and saved as a separate wav file (0.wav, 1.wav, and so on) in the workspace folder (for more details, see the YouTube tutorial). Here, we use the voice specified by “af_nicole”. For other voices, see this link
https://huggingface.co/hexgrad/Kokoro-82M/tree/main/voices
and see the YouTube tutorial for the complete explanation.
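Because the script writes one wav file per text chunk, you may want a single file at the end. The sketch below is a helper of our own, built only on Python's standard `wave` module. It concatenates `0.wav`, `1.wav`, and so on, in numeric order, into `combined.wav` (a hypothetical output name). It assumes that all chunks share the same format, which holds here because every file is written by the same `sf.write(..., 24000)` call, and that the files are 16-bit PCM, which is soundfile's default WAV subtype.

```python
import glob
import os
import wave


def merge_wavs(folder=".", output="combined.wav"):
    """Concatenate 0.wav, 1.wav, ... (numeric order) in `folder` into one file.

    Assumes every input has the same channel count, sample width and
    sample rate. Returns the number of files merged (0 if none found).
    """
    def stem(p):
        return os.path.splitext(os.path.basename(p))[0]

    # Only numbered files, so a previous combined.wav is never re-merged.
    paths = [p for p in glob.glob(os.path.join(folder, "*.wav")) if stem(p).isdecimal()]
    # Sort numerically so 10.wav comes after 9.wav, not after 1.wav.
    paths.sort(key=lambda p: int(stem(p)))
    if not paths:
        return 0
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(os.path.join(folder, output), "wb") as out:
        out.setparams(params)  # the frame count in the header is fixed on close
        for path in paths:
            with wave.open(path, "rb") as src:
                # Compare nchannels, sampwidth, framerate.
                if src.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(src.readframes(src.getnframes()))
    return len(paths)
```

For example, you could call `merge_wavs()` right after the generation loop. Delete old numbered wav files before re-running the Kokoro script with a shorter text, otherwise they will be merged too.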