# Getting Started

% SPDX-FileCopyrightText: 2025-present Krys Lawrence
% SPDX-License-Identifier: CC-BY-SA-4.0
% aquarion-libtts documentation © 2025-present by Krys Lawrence is licensed under
% Creative Commons Attribution-ShareAlike 4.0 International. To view a copy of this
% license, visit

## Installation

aquarion-libtts comes in several different flavours, depending on your needs. These
variations are selected by specifying extras when installing.

First, there are extras for supporting various GPU platforms:

- `cpu`: Include PyTorch, but only support CPU and not any GPUs.
- `cu128`: Include PyTorch with CUDA 12.8 support for Nvidia GPUs.
- `cu129`: Include PyTorch with CUDA 12.9 support for Nvidia GPUs.

Second, each built-in TTS backend has its own extra, so that only the dependencies of
the TTS plugins you want to use will be included:

- `kokoro`: Include the required dependencies for Kokoro TTS.

So, to install only the base package, without support for any of the built-in TTS
backends, you can run something like:

```sh
pip install aquarion-libtts
```

However, in order to use at least one TTS backend, you will probably want to include
some extras, for example (the quotes stop shells such as zsh from treating the square
brackets as a glob pattern):

```sh
pip install "aquarion-libtts[cu129,kokoro]"
```

Or:

```sh
pip install "aquarion-libtts[cpu,kokoro]"
```

## Built-In TTS Plugins

aquarion-libtts provides (or will provide) built-in support for several TTS backends.
They are accessed through the same plugin API as any third-party TTS backend you might
also use.

The following TTS backends currently have built-in support:

:::{list-table}
:header-rows: 1

- - Plugin ID
  - TTS Backend
- - `kokoro_v1`
  - [Kokoro TTS](https://huggingface.co/hexgrad/Kokoro-82M)
:::

## Basic Usage

### Key Concepts

1. The library uses a plugin system to manage multiple TTS backends.
1. All access to the plugins, their backends and their settings is handled through the
   `api` package.
1. There is a plugin registry that provides access to everything else.
1. All TTS backends provide the same interface, so that they can be used
   interchangeably.
1. Each TTS backend can have different configuration settings, however.

### Step 1: Instantiate the Registry

```python
from aquarion.libs.libtts import api

registry = api.TTSPluginRegistry()
```

### Step 2: Load All Plugins

```python
registry.load_plugins()
```

### Step 3: Enable Desired Plugins

All loaded plugins are disabled by default, which means that they will not show up in
the list of available plugins. Enable the plugins you want to use like so:

```python
registry.enable("kokoro_v1")
```

Ideally, plugin IDs should be versioned, like the `_v1` suffix in `kokoro_v1`, to allow
different implementations over time.

### Step 4: Instantiate a Plugin

```python
plugin = registry.get_plugin("kokoro_v1")
```

Plugins are containers that provide access to their TTS backends and
backend-appropriate settings, as well as methods for describing the backend and its
settings in multiple languages. See Beyond the Basics below for more details on the
descriptive capabilities of plugins.

### Step 5: Instantiate Settings

Each backend is expected to support fully functional default settings, in addition to
customized settings. To instantiate default settings, do this:

```python
settings = plugin.make_settings()
```

Or, to instantiate customized settings, do something like this:

```python
settings = plugin.make_settings(from_dict={"voice": "af_bella"})
```

Settings are only ever set using a dictionary, never by assigning attributes directly.
Settings objects are also immutable once created, so changing settings requires
creating a whole new settings instance. This is meant to make saving and loading
backend settings consistent, and to make it easier to build dynamic settings UIs.

### Step 6: Instantiate the TTS Backend

```python
backend = plugin.make_backend(settings)
```

TTS backends always require a settings object, even if it is the default settings.
Also, changing settings in an existing backend requires providing a whole new,
complete settings instance, since settings are immutable.

### Step 7: Start the Backend

Now that we finally have our TTS backend, we need to start it:

```python
backend.start()
```

Depending on the specific backend, this could start other threads or processes, access
external APIs, or download other resources it needs.

### Step 8: Convert Text to Speech

When converting text to speech, the results are provided as chunks of audio via an
iterator. This better supports streaming and real-time applications. For example:

```python
import wave

with wave.open("play_me.wav", "wb") as wave_file:
    # Configure the WAV header from the backend's audio format.
    wave_file.setnchannels(backend.audio_spec.num_channels)
    # sample_width is given in bits, while the wave module expects bytes.
    wave_file.setsampwidth(backend.audio_spec.sample_width // 8)
    wave_file.setframerate(backend.audio_spec.sample_rate)
    # Write each audio chunk as soon as the backend yields it.
    for audio_chunk in backend.convert(
        "Hi there from aquarion-libtts. This is the kokoro backend."
    ):
        wave_file.writeframes(audio_chunk)
```

As you can see, the TTS backend also describes the format of the audio it returns in
its `.audio_spec` attribute.

### Step 9: Stop the Backend

When shutting down or switching TTS backends, always stop the backend so that it can
clean up after itself:

```python
backend.stop()
```

Best practice is to wrap your code in a `try ... finally` block to ensure that the stop
method is always called, even if an error occurs.

## Example

See the [examples](https://github.com/aquarion-ai/aquarion-libtts/tree/main/examples)
sub-directory for examples of how to use this project.

## Beyond the Basics

In addition to the above core functionality, more is provided:

- The plugin registry also includes methods for listing plugins (enabled or
  otherwise), disabling plugins, and checking whether a plugin is already enabled.
- Each plugin also includes methods for getting its display name in multiple
  languages, as well as getting details about each specific setting so that a settings
  UI can be constructed, also in multiple languages. (Which languages are supported
  depends on the plugin.)
- Each settings object also includes a method to export the settings as a
  JSON-compatible dict for storage, editing, etc.
- Each backend also includes details about the audio format it emits, as well as a
  check for whether or not it is currently started.

To learn more about these extra capabilities, please see the .

## Creating Your Own TTS Backends

To learn about creating plugins for your own TTS backend for this project, see .
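
For orientation only, the backend behaviour described on this page (an `audio_spec`,
`start()`, `stop()`, a started check, and a `convert()` that yields chunks of audio) can
be approximated with a `typing.Protocol`. This sketch is not the library's plugin
interface: `AudioSpec`, `TTSBackendLike`, `SilenceBackend`, `is_started` and
`synthesize_to_wav` are hypothetical names, and the toy backend emits silence instead
of speech. The helper function also shows the `try ... finally` pattern recommended in
Step 9.

```python
import io
import wave
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class AudioSpec:
    """Hypothetical audio format description, mirroring the fields used in Step 8."""

    num_channels: int
    sample_width: int  # In bits, hence the `// 8` when configuring `wave`.
    sample_rate: int  # In Hz.


class TTSBackendLike(Protocol):
    """Hypothetical approximation of the backend surface described on this page."""

    @property
    def audio_spec(self) -> AudioSpec: ...

    @property
    def is_started(self) -> bool: ...

    def start(self) -> None: ...

    def stop(self) -> None: ...

    def convert(self, text: str) -> Iterator[bytes]: ...


class SilenceBackend:
    """Toy backend that yields silence instead of speech (illustration only)."""

    def __init__(self) -> None:
        self._spec = AudioSpec(num_channels=1, sample_width=16, sample_rate=24000)
        self._started = False

    @property
    def audio_spec(self) -> AudioSpec:
        return self._spec

    @property
    def is_started(self) -> bool:
        return self._started

    def start(self) -> None:
        self._started = True

    def stop(self) -> None:
        self._started = False

    def convert(self, text: str) -> Iterator[bytes]:
        if not self._started:
            raise RuntimeError("Backend must be started before converting text.")
        bytes_per_frame = self._spec.num_channels * self._spec.sample_width // 8
        frames_per_chunk = self._spec.sample_rate // 10  # 0.1 s of audio.
        # Yield one chunk per word to mimic streamed output.
        for _word in text.split():
            yield b"\x00" * (frames_per_chunk * bytes_per_frame)


def synthesize_to_wav(backend: TTSBackendLike, text: str) -> bytes:
    """Start the backend, convert text to WAV bytes, and always stop it again."""
    buffer = io.BytesIO()
    backend.start()
    try:
        with wave.open(buffer, "wb") as wave_file:
            wave_file.setnchannels(backend.audio_spec.num_channels)
            wave_file.setsampwidth(backend.audio_spec.sample_width // 8)
            wave_file.setframerate(backend.audio_spec.sample_rate)
            for audio_chunk in backend.convert(text):
                wave_file.writeframes(audio_chunk)
    finally:
        backend.stop()  # Runs even if conversion raises.
    return buffer.getvalue()
```

For example, `synthesize_to_wav(SilenceBackend(), "Hello there")` returns the bytes of a
short, silent WAV file, and the backend is stopped afterwards whether or not conversion
succeeded.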