Adding Custom TTS Plugins
Overview
aquarion-libtts provides all its TTS backends through a plugin system. This also means that you can create your own third-party TTS plugins in your own external packages, and they will be loaded and usable within aquarion-libtts as well.
To create a TTS backend plugin, the following components are required:
- An implementation of the ITTSSettings protocol,
- An implementation of the ITTSBackend protocol,
- An implementation of the ITTSPlugin protocol,
- A tts_hookimpl decorated function to register your plugin, and
- An aquarion-libtts entry point in your pyproject.toml file.
Additionally, since multi-lingual support is expected by default, you should also
include some gettext catalogues and design for internationalization and localization
from your first release. The load_language() function
can help with that. Although this is not strictly required, it is best practice.
How the Plugin System Works
- When you call the load_plugins() method, all installed packages are searched for aquarion-libtts entry points.
- Each found entry point is then searched for tts_hookimpl functions.
- Each found tts_hookimpl function is then called and is expected to return either an instance of an ITTSPlugin implementation, or None if no plugin is to be registered (e.g. because of missing dependencies, incompatible hardware, etc.).
- All the returned plugin instances are then registered in the TTSPluginRegistry for potential use. Note: They all start in a disabled state, meaning they are not listed as available plugins until each desired plugin is explicitly enabled. This allows for controlling the list of available plugins independently of which ones are installed.
- Through the registry, plugins can be retrieved and used.
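The flow described in this section can be modelled with a small, self-contained sketch. This is a simplified toy, not the real implementation: ToyPlugin, ToyRegistry and their methods are hypothetical names invented for illustration, and hooks are passed in directly instead of being discovered through entry points.

```python
from __future__ import annotations

from typing import Callable


class ToyPlugin:
    """Hypothetical stand-in for an ITTSPlugin implementation."""

    def __init__(self, plugin_id: str) -> None:
        self.id = plugin_id


class ToyRegistry:
    """Hypothetical stand-in for TTSPluginRegistry; not the real API."""

    def __init__(self) -> None:
        self._plugins: dict[str, ToyPlugin] = {}
        self._enabled: set[str] = set()

    def load(self, hooks: list[Callable[[], ToyPlugin | None]]) -> None:
        # Call every hook; a hook that returns None registers nothing.
        for hook in hooks:
            plugin = hook()
            if plugin is not None:
                self._plugins[plugin.id] = plugin

    def enable(self, plugin_id: str) -> None:
        # Only registered plugins can be enabled.
        if plugin_id not in self._plugins:
            raise KeyError(plugin_id)
        self._enabled.add(plugin_id)

    def available(self) -> list[str]:
        # Only enabled plugins are listed, regardless of what is installed.
        return sorted(self._enabled)


def working_hook() -> ToyPlugin | None:
    return ToyPlugin("kokoro_v1")


def skipped_hook() -> ToyPlugin | None:
    return None  # e.g. a required dependency is missing


registry = ToyRegistry()
registry.load([working_hook, skipped_hook])
print(registry.available())  # nothing is enabled yet
registry.enable("kokoro_v1")
print(registry.available())
```

The point of the sketch is the separation between registering and enabling: a plugin can be installed and registered without ever appearing in the list of available plugins.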
The Entry Point
aquarion-libtts TTS plugins are found by searching installed packages for PEP 621-style entry points, or more accurately entry-points as defined in the pyproject.toml specification.
Specifically, add something like this to your pyproject.toml file:
[project.entry-points.'aquarion-libtts']
my_plugin_v1 = "package.hook"
Where:
- 'aquarion-libtts'
Is the entry point group name for all aquarion-libtts entry points.
- my_plugin
Is the unique identifier key for your plugin. E.g. kokoro
- _v1
Is the major version of your plugin. This is so that old implementations and new ones can exist at the same time for backward compatibility.
- package.hook
Is the module in your package that contains your tts_hookimpl decorated plugin registration function.
The Registration Function
aquarion-libtts plugins are registered by creating a hook function that:
- Is decorated by the tts_hookimpl decorator.
- Returns an instance of an ITTSPlugin implementation, or None to skip registering anything.
For example:
from aquarion.libs.libtts.api import ITTSPlugin, tts_hookimpl


@tts_hookimpl
def register_my_tts_plugin() -> ITTSPlugin | None:
    """Return an instance of my TTS plugin if the dependencies are installed."""
    # NOTE: It is important that we do not import our plugin class or related packages
    #       at module import time.
    #       This hook needs to be able to run even when our required dependencies, etc.
    #       are not installed.
    try:
        import dependency
    except ModuleNotFoundError:
        return None
    from package.plugin import MyTTSPlugin

    return MyTTSPlugin()
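When a plugin has several optional dependencies, one possible variation is to check for them without importing anything. The sketch below uses importlib.util.find_spec from the standard library. The helper missing_dependencies and the module name hypothetical_tts_engine are invented for this example, and the check is only meant for top-level module names, since looking up a dotted name imports its parent package.

```python
from importlib.util import find_spec


def missing_dependencies(*module_names: str) -> list[str]:
    """Return the top-level modules that cannot be found, without importing them."""
    # find_spec() returns None when a top-level module is not installed.
    return [name for name in module_names if find_spec(name) is None]


# "json" ships with Python; "hypothetical_tts_engine" is assumed not to be installed.
print(missing_dependencies("json", "hypothetical_tts_engine"))
```

A registration function could return None whenever this list is non-empty. The try / except approach shown earlier remains the simpler choice for a single dependency.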
The Plugin
An aquarion-libtts TTS plugin is responsible for creating, configuring and describing a TTS backend in a consistent way. Specifically, it is responsible for:
- Creating TTS backend-specific settings objects,
- Creating TTS backend objects themselves,
- Providing multi-lingual metadata for UI presentation, such as:
  - The plugin/backend’s display name,
  - Specifications for all attributes in the settings object, such as type, valid values, and/or minimum and maximum valid range of values, where applicable,
  - Display names for each attribute in the settings object,
  - Descriptions for each attribute in the settings object.
To fulfil these requirements, a TTS plugin must implement the
ITTSPlugin protocol.
The Settings
On the one hand, each TTS backend likely needs its own custom settings, unique to the specific backend. Also, TTS backend settings could be different for different locales. On the other hand, the settings objects for all TTS backends need to be exportable / savable / transmittable in a standardized way.
To fulfil these requirements, a TTS settings object must implement the
ITTSSettings protocol. But in addition to that, it
can have additional custom attributes.
Also, while it is the responsibility of the
make_settings() factory method to validate
any settings object it creates, it is also reasonable for
ITTSSettings implementations to validate themselves
on creation, if desired, though this is not strictly required.
Lastly, the make_settings() factory method
must also support returning a fully functional TTS settings object with all default
values when called with no arguments. So, it is also reasonable for
ITTSSettings implementations to include fully
functional default values on instantiation, though this is not strictly required.
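The two optional suggestions above, self-validation and working defaults, can be sketched with a frozen dataclass. Everything in this example is hypothetical: the class MySettings, its fields, the to_dict() method and the module-level make_settings() function are illustrative only and do not reflect the actual ITTSSettings or ITTSPlugin signatures.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class MySettings:
    """Hypothetical settings object; the field names are illustrative only."""

    locale: str = "en_CA"
    voice: str = "default"
    speed: float = 1.0

    def __post_init__(self) -> None:
        # Validate on creation so that an invalid object can never exist.
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed must be between 0.5 and 2.0, got {self.speed}")

    def to_dict(self) -> dict[str, Any]:
        # Plain dictionaries are easy to save, load and transmit.
        return asdict(self)


def make_settings(values: dict[str, Any] | None = None) -> MySettings:
    """Hypothetical factory: returns all defaults when called with no arguments."""
    return MySettings(**(values or {}))


print(make_settings().to_dict())
print(make_settings({"speed": 1.5}).speed)
```

Because the defaults live on the settings class itself, the factory stays trivial, and any settings object that exists is known to be valid.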
The Backend
Finally, there is the TTS backend object itself. This is the main object. It is responsible for converting text input into an audio stream output. It is also responsible for reporting the kind of audio it produces (e.g. raw PCM, WAVE, MP3, OGG, VP8, stereo, mono, 8-bit, 16-bit, etc.). Client software is expected to support whatever formats their chosen supported backends produce.
To fulfil these requirements, a TTS backend object must implement the
ITTSBackend protocol.
TTS backends work on the concept that they need to be started before being used, and
then stopped when they are no longer required. So, it is recommended to wrap the
call to start() in a try ... finally block
to ensure that stop() always gets called on
shutdown.
Additionally, it is reasonable for a TTS backend to access external APIs or download additional resources once started. However, those activities should only be done on startup, not on instantiation.
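The start / stop lifecycle and the rule about deferring expensive work can be illustrated with a toy backend. This is a sketch under stated assumptions: only start() and stop() are taken from the text above, while ToyBackend, its is_started property and its convert() method are hypothetical and return fake audio bytes.

```python
from __future__ import annotations


class ToyBackend:
    """Hypothetical backend; only start() and stop() come from the text above."""

    def __init__(self) -> None:
        # Keep instantiation cheap: no downloads or API calls happen here.
        self._model: dict[str, str] | None = None

    @property
    def is_started(self) -> bool:
        return self._model is not None

    def start(self) -> None:
        # Expensive work (loading models, contacting APIs) belongs here.
        self._model = {"voice": "placeholder"}

    def stop(self) -> None:
        # Release whatever start() acquired.
        self._model = None

    def convert(self, text: str) -> bytes:
        # Hypothetical method name; returns fake audio bytes for illustration.
        if self._model is None:
            raise RuntimeError("backend is not started")
        return text.encode("utf-8")


backend = ToyBackend()
try:
    backend.start()
    audio = backend.convert("Hello")
finally:
    # Runs even if start() or convert() raised an exception.
    backend.stop()
```

Placing start() inside the try block means stop() also runs when startup fails part-way, so stop() should tolerate being called on a backend that never finished starting.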
Putting It All Together
So, in summary:
Client software creates a plugin registry.
The registry finds your entry point and calls your registration function to collect your plugin.
Client software enables your plugin.
Client software fetches your plugin from the registry.
Client software asks your plugin to make your settings object.
Client software asks your plugin to make your backend object using the above settings.
Client software starts your backend.
Client software uses your backend to convert text to speech.
Client software stops your backend.
Client software optionally can also ask your plugin about various metadata.
Client software optionally can save and load dictionaries of preferred settings values.
Client software optionally can re-configure your backend with a new settings object as created by your plugin.
That’s it. By following this pattern, various diverse TTS systems can be implemented, installed and used in various client software designs.