aquarion.libs.libtts.api

Public API for aquarion-libtts.

All interaction with aquarion-libtts is generally expected to go through this API package.

Example

from aquarion.libs.libtts.api import TTSPluginRegistry

registry = TTSPluginRegistry()
registry.load_plugins()
registry.enable("kokoro_v1")
plugin = registry.get_plugin("kokoro_v1")
settings = plugin.make_settings()
backend = plugin.make_backend(settings)
try:
    backend.start()
    audio_chunks = []
    # convert() yields the speech audio as one or more binary chunks.
    for audio_chunk in backend.convert("Hello, world!"):
        audio_chunks.append(audio_chunk)
finally:
    backend.stop()

Functions

load_language(locale, domain, locale_path)

Return a gettext _() function and a *Translations instance.

tts_hookimpl(**kwargs)

Decorate a function with this to mark it as a TTS plugin registration hook.

Classes

HashablePathLike()

PathLikes are hashable, but this makes it explicit for the type checker.

HashableTraversable(*args, **kwargs)

Traversables are hashable, but this makes it explicit for the type checker.

ITTSBackend(*args, **kwargs)

Common interface for all TTS backends.

ITTSPlugin(*args, **kwargs)

Common interface for all TTS Plugins.

ITTSSettings(*args, **kwargs)

Common interface for all TTS backend settings.

ITTSSettingsHolder(*args, **kwargs)

Common interface for objects that accept and contain ITTSSettings.

TTSAudioSpec(*, format, sample_rate, ...)

Audio metadata about the audio format that an ITTSBackend returns.

TTSPluginRegistry()

Registry of all aquarion-libtts backend plugins.

TTSSampleByteOrders(*values)

The byte order for multi-byte audio samples.

TTSSampleTypes(*values)

The data type of a single audio sample.

TTSSettingsSpecEntry(*, type[, min, max, values])

A specification entry describing one setting in an ITTSSettings object.

class aquarion.libs.libtts.api.HashablePathLike

Bases: Hashable, PathLike[str]

PathLikes are hashable, but this makes it explicit for the type checker.

class aquarion.libs.libtts.api.HashableTraversable(*args, **kwargs)

Bases: Hashable, Traversable

Traversables are hashable, but this makes it explicit for the type checker.

class aquarion.libs.libtts.api.ITTSBackend(*args, **kwargs)

Bases: ITTSSettingsHolder, Protocol

Common interface for all TTS backends.

An ITTSBackend is responsible for converting text into speech audio stream chunks. To do this, it should first be started with start(), then convert() can be used to do any number of conversions, and finally it should be shut down with stop() when no longer needed.

An ITTSBackend is also responsible for reporting the kind of audio that it produces (e.g. raw PCM, WAVE, MP3, OGG, VP8, stereo, mono, 8-bit, 16-bit, etc.). This is reported via the audio_spec attribute.

Lastly, since each ITTSBackend is also an ITTSSettingsHolder, it must also accept configuration settings. These are commonly provided at instantiation, but that is not strictly required to conform to the ITTSSettingsHolder protocol.

property audio_spec: TTSAudioSpec

Metadata about the speech audio format.

E.g. Mono 16-bit little-endian linear PCM audio at 24 kHz.

This should be read-only.

convert(text: str) → Iterator[bytes]

Return speech audio for the given text as one or more binary chunks.

Parameters:

text – The text to convert into speech.

Returns:

An Iterator of chunks of audio in the format specified by audio_spec.

property is_started: bool

True if TTS backend is started, False otherwise.

This should be read-only.

start() → None

Start the TTS backend.

If the backend is already started, this method should be idempotent and do nothing.

stop() → None

Stop the TTS backend.

If the backend is already stopped, this method should be idempotent and do nothing.
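The lifecycle described above can be sketched with a toy backend. This is a minimal illustration, not a real plugin: ToyBackend and its fake "audio" are hypothetical, the settings holder methods and audio_spec are omitted for brevity, and raising RuntimeError when convert() is used before start() is an assumption, since the interface does not say what should happen in that case.

```python
from collections.abc import Iterator


class ToyBackend:
    """Hypothetical stand-in that follows the start/convert/stop lifecycle."""

    def __init__(self) -> None:
        self._started = False

    @property
    def is_started(self) -> bool:
        return self._started

    def start(self) -> None:
        # Idempotent: starting an already started backend does nothing.
        if self._started:
            return
        self._started = True  # A real backend would load its model here.

    def stop(self) -> None:
        # Idempotent: stopping an already stopped backend does nothing.
        if not self._started:
            return
        self._started = False  # A real backend would free resources here.

    def convert(self, text: str) -> Iterator[bytes]:
        if not self._started:
            # Assumption: the interface does not define this case.
            raise RuntimeError("Backend must be started before converting.")
        # Fake "audio": one chunk per word, purely for illustration.
        for word in text.split():
            yield word.encode("utf-8")


backend = ToyBackend()
backend.start()
backend.start()  # Safe to call twice.
try:
    chunks = list(backend.convert("hello world"))
finally:
    backend.stop()
    backend.stop()  # Also safe to call twice.
```

Wrapping convert() in try / finally, as in the package-level example, ensures stop() runs even if a conversion fails.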

class aquarion.libs.libtts.api.ITTSPlugin(*args, **kwargs)

Bases: Protocol

Common interface for all TTS Plugins.

get_display_name(locale: str) → str

Return the display name for the plugin, appropriate for the given locale.

A display name is one that is human-friendly as opposed to any kind of unique key that code would care about.

Parameters:

locale

The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.

Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant. E.g. from en_CA to en.

Returns:

The display name of the plugin in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a display name in its default language, or English if that is preferred.
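The fallback behaviour described above could be implemented along these lines. This is only a sketch under stated assumptions: locale_candidates(), pick_display_name() and the NAMES table are hypothetical helpers, not part of aquarion-libtts, and the translations are assumed to be keyed by POSIX-style locale strings.

```python
def locale_candidates(locale: str) -> list[str]:
    """Return the locale followed by progressively more general fallbacks."""
    # Drop any encoding or modifier, e.g. "de_DE.UTF-8@euro" becomes "de_DE".
    base = locale.split("@", 1)[0].split(".", 1)[0]
    # Accept both CLDR (hyphen) and POSIX (underscore) separators.
    parts = base.replace("-", "_").split("_")
    return ["_".join(parts[:end]) for end in range(len(parts), 0, -1)]


def pick_display_name(locale: str, names: dict[str, str], default: str) -> str:
    """Return the best matching name, or the default if nothing matches."""
    for candidate in locale_candidates(locale):
        if candidate in names:
            return names[candidate]
    return default


# Hypothetical display names keyed by POSIX-style locale.
NAMES = {"en": "Kokoro TTS", "fr": "Voix Kokoro"}

french_name = pick_display_name("fr-CA", NAMES, default="Kokoro TTS")
unknown_name = pick_display_name("ja_JP", NAMES, default="Kokoro TTS")
```

Here fr-CA falls back to fr, while ja_JP matches nothing and gets the default name.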

get_setting_description(setting_name: str, locale: str) → str

Return the given setting’s description, appropriate for the given locale.

Parameters:
  • setting_name – The name of the setting as returned from get_settings_spec() mapping keys.

  • locale

    The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.

    Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant. E.g. from en_CA to en.

Returns:

The description of the setting in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a description in its default language, or English if that is preferred.

Raises:

KeyError or AttributeError – If the given setting name is not a recognized setting.

get_setting_display_name(setting_name: str, locale: str) → str

Return the given setting’s display name, appropriate for the given locale.

A display name is one that is human-friendly as opposed to any kind of unique key that code would care about.

Parameters:
  • setting_name – The name of the setting as returned from get_settings_spec() mapping keys.

  • locale

    The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.

    Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant. E.g. from en_CA to en.

Returns:

The display name of the setting in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a display name in its default language, or English if that is preferred.

Raises:

KeyError or AttributeError – If the given setting name is not a recognized setting.

get_settings_spec() → Mapping[str, TTSSettingsSpecEntry[TTSSettingsSpecEntryTypes]]

Return a specification that describes all the backend’s settings.

Returns:

An immutable mapping from setting attribute names to TTSSettingsSpecEntry instances.

Implementations should probably return a MappingProxyType to achieve the immutability.
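As a minimal sketch of that suggestion, the standard library's types.MappingProxyType wraps a dictionary in a read-only view. Plain dictionaries stand in for TTSSettingsSpecEntry instances here so that the snippet runs on its own; the setting names are illustrative only.

```python
from types import MappingProxyType

# Plain dicts stand in for TTSSettingsSpecEntry instances in this sketch.
_SPEC = {
    "locale": {"type": str, "values": frozenset({"en", "fr"})},
    "speed": {"type": float, "min": 0.1, "max": 1.0},
}


def get_settings_spec() -> MappingProxyType:
    # The proxy is a read-only view; callers cannot add or replace entries.
    return MappingProxyType(_SPEC)


spec = get_settings_spec()
try:
    spec["voice"] = {"type": str}
except TypeError:
    rejected = True
else:
    rejected = False
```

Note that the proxy is shallow: it only protects the mapping itself, so the entries must be immutable in their own right, which TTSSettingsSpecEntry instances are documented to be.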

get_supported_locales() → AbstractSet[str]

Return the set of locales supported by the TTS backend for speaking.

This should also be the locales that the plugin supports for display names, setting names, setting descriptions, etc.

Locales can be in either POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) formats, and client applications are expected to support both.

Returns:

An immutable set of locale strings.

Example

frozenset({"fr_CA", "ca-ES-valencia", "zh-Hant"})

Note

The set of locales should be as specific as is directly supported and should not include broader / more general or approximate catch-all locales unless they are also explicitly supported, or nothing more specific is supported. I.e. en_CA is good, en is bad, unless en is as specific as the TTS backend supports. Or if ca-ES-valencia is supported, then that is preferred over ca-ES. … In short, be as precise and honest as you can.

property id: str

A unique identifier for the plugin.

The id must be unique across all Aquarion libtts plugins. Also, it is recommended to include at least a major version number as a suffix so that multiple versions / implementations of a plugin can be installed and supported simultaneously. E.g. for backwards compatibility.

This should be read-only.

Example

kokoro_v1

make_backend(settings: ITTSSettings) → ITTSBackend

Create and return a TTS backend instance.

This is a factory method.

Parameters:

settings – Custom or default settings must be provided to configure the TTS backend.

Returns:

A configured and ready to use TTS backend.

Raises:

TypeError – Implementations of this interface must check that they are getting their own ITTSSettings implementation and should raise an exception if any other plugin’s ITTSSettings is given instead.

make_settings(from_dict: Mapping[str, JSONSerializableTypes] | None = None) → ITTSSettings

Create and return an appropriate settings object for the TTS backend.

This is a factory method.

Parameters:

from_dict

If it is not None, then the given values should be used to initialize the settings.

If it is None, then default values for all settings should be used.

Returns:

An instance of a compatible ITTSSettings implementation with all settings values valid for immediate use.

Raises:

KeyError, ValueError or TypeError – This function is expected to validate its inputs. If any setting is invalid for the concrete implementation of ITTSSettings that the factory will create, then an exception should be raised.
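The two factory methods and their error contracts might look roughly like the following. This is a sketch, not the library's implementation: ToySettings, ToyBackend and ToyPlugin are hypothetical, and the specific validation rules (two known settings, speed between 0.1 and 1.0) are assumptions made up for the example.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToySettings:
    locale: str = "en"
    speed: float = 1.0


class ToyBackend:
    def __init__(self, settings: ToySettings) -> None:
        self.settings = settings


class ToyPlugin:
    def make_settings(
        self, from_dict: Optional[Mapping[str, object]] = None
    ) -> ToySettings:
        if from_dict is None:
            return ToySettings()  # Use defaults for everything.
        unknown = set(from_dict) - {"locale", "speed"}
        if unknown:
            raise KeyError(f"Unknown settings: {sorted(unknown)}")
        settings = ToySettings(**from_dict)
        if not isinstance(settings.locale, str):
            raise TypeError("locale must be a str")
        if not isinstance(settings.speed, (int, float)):
            raise TypeError("speed must be a number")
        if not 0.1 <= settings.speed <= 1.0:
            raise ValueError("speed must be between 0.1 and 1.0")
        return settings

    def make_backend(self, settings: object) -> ToyBackend:
        # Reject settings objects that belong to some other plugin.
        if not isinstance(settings, ToySettings):
            raise TypeError(f"Expected ToySettings, got {type(settings).__name__}")
        return ToyBackend(settings)


plugin = ToyPlugin()
backend = plugin.make_backend(plugin.make_settings({"speed": 0.5}))
```

Validating in make_settings() means a backend never has to deal with a half-valid settings object.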

class aquarion.libs.libtts.api.ITTSSettings(*args, **kwargs)

Bases: Protocol

Common interface for all TTS backend settings.

Implementations of this interface are expected to add their own setting attributes for the specific ITTSBackend implementation they go with.

Note: There is no expectation that ITTSSettings implementations be immutable or hashable, but it’s probably a good idea since changes to settings should be done by calling ITTSPlugin.make_settings() with a changed settings dictionary.
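One way to get the immutability and hashability suggested in the note is a frozen dataclass. The sketch below is an assumption about how an implementation could look, not a required design: ToySettings and the module-level make_settings() helper (standing in for ITTSPlugin.make_settings()) are hypothetical.

```python
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class ToySettings:
    locale: str = "en"
    voice: str = "bella"
    speed: float = 1.0
    cache_path: Path = Path("cache")

    def to_dict(self) -> dict[str, object]:
        data = asdict(self)
        # Path is not JSON-serializable, so export it as a plain string.
        data["cache_path"] = str(self.cache_path)
        return data


def make_settings(from_dict: dict[str, object]) -> ToySettings:
    # Stand-in for ITTSPlugin.make_settings(): converts back to rich types.
    values = {**from_dict, "cache_path": Path(str(from_dict["cache_path"]))}
    return ToySettings(**values)


settings = ToySettings()
# Changing a setting means building a new object from a changed dictionary.
changed = make_settings({**settings.to_dict(), "speed": 0.8})
```

With frozen=True, the dataclass generates both __eq__ and __hash__, so instances compare by value and can be used as dictionary keys or cache keys.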

Example

class MySettings:
    locale: str = "en"
    voice: str = "bella"
    speed: float = 1.0
    api_key: str
    cache_path: Path

    def __eq__(self, other: object) -> bool:
        ...  # Your implementation here

    def to_dict(self) -> dict[str, JSONSerializableTypes]:
        ...  # Your implementation here

__eq__(other: object) → bool

Return True if all settings values match, False otherwise.

Parameters:

other – The other ITTSSettings instance to compare against.

Returns:

True if other is an instance of the same concrete implementation of ITTSSettings and all the settings values are the same. False otherwise.

locale: str

The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.

to_dict() → dict[str, JSONSerializableTypes]

Export all settings as a dictionary of only JSON-serializable types.

Returns:

A dictionary where the keys are the setting names and the values are the setting values converted as necessary to simple base JSON-compatible types.

Example

{
    "locale": "en",
    "voice": "bella",
    "speed": 1.0,
    "api_key": "Your API key here",
    "cache_path": "Cache path converted to a basic string"
}

class aquarion.libs.libtts.api.ITTSSettingsHolder(*args, **kwargs)

Bases: Protocol

Common interface for objects that accept and contain ITTSSettings.

get_settings() → ITTSSettings

Return the current settings in use.

Returns:

The current settings in use.

Note

The reason the settings are not just direct attributes is because they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.

update_settings(new_settings: ITTSSettings) → None

Update to the new given settings.

Parameters:

new_settings – The new complete set of settings to start using immediately.

Raises:

TypeError – Implementations of this interface should check that they are only getting the correct concrete settings class and raise an exception if any other kind of ITTSSettings is given.

Note

The reason the settings are not just direct attributes is because they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.

class aquarion.libs.libtts.api.TTSAudioSpec(*, format: str, sample_rate: int, sample_type: TTSSampleTypes, sample_width: int, byte_order: TTSSampleByteOrders, num_channels: int)

Bases: object

Audio metadata about the audio format that an ITTSBackend returns.

Note: Instances of this class are immutable once created.

byte_order: TTSSampleByteOrders

E.g. Little Endian or Big Endian.

format: str

E.g. “Linear PCM”, “WAV”, “MP3”, etc.

num_channels: int

E.g. 1 for mono, 2 for stereo, etc.

sample_rate: int

E.g. 8000, 24000, 48000, etc.

sample_type: TTSSampleTypes

E.g. Signed Integer, Unsigned Integer or Floating Point.

sample_width: int

E.g. 8 for 8-bit, 12 for 12-bit, 16 for 16-bit, etc.
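To show why these fields matter to a client, here is a sketch that wraps raw PCM chunks in a WAV container using the standard library's wave module. It assumes the backend reports signed little-endian linear PCM, which is what WAV expects for 16-bit audio; pcm_chunks_to_wav() is a hypothetical helper, and the silent chunks are fake data in place of real convert() output.

```python
import io
import wave
from collections.abc import Iterable


def pcm_chunks_to_wav(
    chunks: Iterable[bytes],
    *,
    sample_rate: int,
    sample_width: int,
    num_channels: int,
) -> bytes:
    """Wrap raw little-endian linear PCM chunks in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(num_channels)
        wav_file.setsampwidth(sample_width // 8)  # wave expects bytes, not bits.
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)
    return buffer.getvalue()


# One second of mono 16-bit silence at 24 kHz, split into two fake chunks.
silence = [b"\x00" * 24000, b"\x00" * 24000]
wav_bytes = pcm_chunks_to_wav(
    silence, sample_rate=24000, sample_width=16, num_channels=1
)
```

In real use, the keyword arguments would come from the backend's audio_spec attribute rather than being hard-coded.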

class aquarion.libs.libtts.api.TTSPluginRegistry

Bases: object

Registry of all aquarion-libtts backend plugins.

TTS backends and everything related to them are created / accessed through ITTSPlugin instances. The plugin registry is responsible for finding, loading, listing, enabling, disabling and giving access to those plugins.

disable(plugin_id: str) → None

Disable a TTS plugin, excluding it from the default list_plugin_ids() result.

Parameters:

plugin_id – The ID of the desired plugin.

Raises:

ValueError – If the given ID does not match any registered plugin.

Note

Disabling a plugin does not affect any existing instances of that plugin in any way. So, proper TTS backend instance management and stopping must still be handled separately.

enable(plugin_id: str) → None

Enable a TTS plugin for inclusion in list_plugin_ids().

The idea behind enabled vs disabled plugins is that it allows one to manage which plugins are listed / displayed to a user, independently of all the plugins that are installed / loaded. I.e. It allows for filtering which plugins one wants exposed and which should be kept hidden. E.g. Some plugins could be not supported by your application, even though they got installed with some other dependency.

Parameters:

plugin_id – The ID of the desired plugin.

Raises:

ValueError – If the given ID does not match any registered plugin.

get_plugin(id_: str) → ITTSPlugin

Return the plugin for the given ID.

Parameters:

id_ – The ID of the desired already loaded plugin. E.g. kokoro_v1.

Raises:

ValueError – If the given ID does not match any registered plugin.

is_enabled(plugin_id: str) → bool

Return True if the plugin is enabled, False otherwise.

Parameters:

plugin_id – The ID of the plugin in question.

Returns:

True if the plugin is enabled, False otherwise.

list_plugin_ids(*, only_disabled: bool = False, list_all: bool = False) → set[str]

Return the set of plugin IDs.

By default, only enabled plugins are listed.

Parameters:
  • only_disabled – If this is True, then only the disabled plugins are listed.

  • list_all – If this is True, then all plugins are listed, regardless of their enabled/disabled status.

Raises:

ValueError – If both arguments are True.
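The enabled / disabled bookkeeping and the listing rules above can be pictured with a toy registry. ToyRegistry is a hypothetical stand-in written for this sketch; it only mirrors the documented behaviour and makes no claim about how TTSPluginRegistry is implemented. The plugin ID other_v2 is made up.

```python
class ToyRegistry:
    """Hypothetical stand-in showing the enabled/disabled bookkeeping."""

    def __init__(self, plugin_ids: set[str]) -> None:
        self._all = set(plugin_ids)
        self._enabled: set[str] = set()  # Everything starts disabled.

    def enable(self, plugin_id: str) -> None:
        self._check(plugin_id)
        self._enabled.add(plugin_id)

    def disable(self, plugin_id: str) -> None:
        self._check(plugin_id)
        self._enabled.discard(plugin_id)

    def list_plugin_ids(
        self, *, only_disabled: bool = False, list_all: bool = False
    ) -> set[str]:
        if only_disabled and list_all:
            raise ValueError("only_disabled and list_all are mutually exclusive.")
        if list_all:
            return set(self._all)
        if only_disabled:
            return self._all - self._enabled
        return set(self._enabled)

    def _check(self, plugin_id: str) -> None:
        if plugin_id not in self._all:
            raise ValueError(f"Unknown plugin ID: {plugin_id}")


registry = ToyRegistry({"kokoro_v1", "other_v2"})
registry.enable("kokoro_v1")
enabled = registry.list_plugin_ids()
disabled = registry.list_plugin_ids(only_disabled=True)
```

After enabling only kokoro_v1, the default listing contains just that ID, and other_v2 appears only with only_disabled=True or list_all=True.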

load_plugins(*, validate: bool = True) → None

Load all aquarion-libtts backend plugins.

Plugins are discovered by searching for entry points declared in pyproject.toml under the aquarion-libtts group, then searching those entry points for hook functions decorated with @tts_hookimpl, and finally calling those hook functions. The plugins returned by those hook functions are then stored in the plugin registry and made accessible.

Note

All plugins are disabled by default. Use enable() to enable a plugin.

Parameters:

validate – If True (the default), then an exception is raised if any hook function does not conform to the expected hook specification.

Raises:

PluginValidationError – If validate is True and a hook function does not conform to the expected specification.

Examples

[project.entry-points.'aquarion-libtts']
my_plugin_v1 = "package.hook"

@tts_hookimpl
def register_my_tts_plugin() -> ITTSPlugin | None:
    from package.plugin import MyTTSPlugin
    return MyTTSPlugin()

class aquarion.libs.libtts.api.TTSSampleByteOrders(*values)

Bases: StrEnum

The byte order for multi-byte audio samples.

The string values of these types match FFmpeg’s format descriptions.

BIG_ENDIAN = 'be'

Big endian byte order

This means the most significant byte is stored first, then the least significant byte after that.

LITTLE_ENDIAN = 'le'

Little endian byte order

This means the least significant byte is stored first, then the most significant byte after that.

NOT_APPLICABLE = ''

Not Applicable

This should only be used for 8-bit (i.e. single byte) samples.

class aquarion.libs.libtts.api.TTSSampleTypes(*values)

Bases: StrEnum

The data type of a single audio sample.

The string values of these types match FFmpeg’s format descriptions.

FLOAT = 'f'

Floating point samples.

SIGNED_INT = 's'

Signed integer samples. (I.e. positive and negative numbers allowed.)

UNSIGNED_INT = 'u'

Unsigned integer samples. (I.e. only positive numbers, but wider sample space.)
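Because the enum values match FFmpeg's format descriptions, they can plausibly be joined with the sample width to form an FFmpeg raw format name such as s16le. The docs do not state that this composition is intended, so treat the sketch as an assumption; ffmpeg_format_name() is a hypothetical helper, and plain strings are used in place of the enum members so that the snippet runs on its own.

```python
def ffmpeg_format_name(sample_type: str, sample_width: int, byte_order: str) -> str:
    """Join the enum string values into a name such as "s16le"."""
    return f"{sample_type}{sample_width}{byte_order}"


# The literals below are the documented enum values:
# SIGNED_INT = "s", UNSIGNED_INT = "u", FLOAT = "f",
# LITTLE_ENDIAN = "le", BIG_ENDIAN = "be", NOT_APPLICABLE = "".
pcm_16 = ffmpeg_format_name("s", 16, "le")
pcm_8 = ffmpeg_format_name("u", 8, "")
float_32 = ffmpeg_format_name("f", 32, "be")
```

Since both enums derive from StrEnum, their members format as their string values, so passing the members themselves instead of literals should give the same result.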

class aquarion.libs.libtts.api.TTSSettingsSpecEntry(*, type: type[T], min: int | float | None = None, max: int | float | None = None, values: frozenset[T] | None = None)

Bases: Generic

A specification entry describing one setting in an ITTSSettings object.

Since ITTSSettings can contain custom TTS backend specific setting attributes, there is a need for a way to describe those setting attributes in a standardized way so that settings UIs can be constructed dynamically in applications that use aquarion-libtts. Instances of this class, in a dictionary, for example, can provide a specification for how to render settings fields in a UI.

Instances of this class are immutable once created.

Example

from aquarion.libs.libtts.api import TTSSettingsSpecEntry

spec = {
    "locale": TTSSettingsSpecEntry(
        type=str,
        min=2,
        values=frozenset({"en", "fr"}),
    ),
    "voice": TTSSettingsSpecEntry(type=str),
    "speed": TTSSettingsSpecEntry(type=float, min=0.1, max=1.0),
    "api_key": TTSSettingsSpecEntry(type=str),
    "cache_path": TTSSettingsSpecEntry(type=str),
}

With the example above, one could imagine a UI with multiple text box fields. locale could be a dropdown or a set of radio buttons. There could be validation for valid ranges. speed could have up and down arrow buttons to increase and decrease the value, and / or react to a mouse’s scroll wheel. Etc.

max: int | float | None = None

The maximum allowed value or maximum allowed length.

This is optional.

For strings this is the maximum allowed length of the string.

For numeric types, this is the maximum allowed value.

min: int | float | None = None

The minimum allowed value or minimum allowed length.

This is optional.

For strings this is the minimum allowed length of the string.

For numeric types, this is the minimum allowed value.

type: type[T]

The type of setting it is.

This is required.

Currently supported types: str, int and float only.

This should be set to the actual type class, not a string name of a type.

Also, only Python basic types should be used. I.e. not classes like Path or Decimal, etc.

values: frozenset[T] | None = None

The set of specific allowed values.

This is optional.

Some fields might only accept a restricted set of specific valid values. Think enumerations. Acceptable values can be specified with this attribute.
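A client could use these attributes to validate user input before building settings. The following is a sketch under stated assumptions: SpecEntry is a local stand-in with the same fields as TTSSettingsSpecEntry, and is_valid() is a hypothetical helper, not something the library provides.

```python
from dataclasses import dataclass
from typing import Optional, Union

Number = Union[int, float]


@dataclass(frozen=True)
class SpecEntry:
    """Local stand-in with the same fields as TTSSettingsSpecEntry."""

    type: type
    min: Optional[Number] = None
    max: Optional[Number] = None
    values: Optional[frozenset] = None


def is_valid(value: object, entry: SpecEntry) -> bool:
    if not isinstance(value, entry.type):
        return False
    # For strings the bounds apply to the length, otherwise to the value.
    size = len(value) if isinstance(value, str) else value
    if entry.min is not None and size < entry.min:
        return False
    if entry.max is not None and size > entry.max:
        return False
    return entry.values is None or value in entry.values


speed = SpecEntry(type=float, min=0.1, max=1.0)
locale = SpecEntry(type=str, min=2, values=frozenset({"en", "fr"}))
```

The same checks could drive UI behaviour, for example limiting a numeric field's range or populating a dropdown from values.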

aquarion.libs.libtts.api.load_language(locale: str, domain: str, locale_path: HashablePathLike | HashableTraversable | str) → LoadLanguageReturnType

Return a gettext _() function and a *Translations instance.

Parameters:
  • locale

    The desired locale to find and load. E.g. en_CA or fr, etc.

    locale must be parsable by the Babel package and will be normalized by it as well.

    locale is generally expected to be in POSIX format (i.e. using underscores) but CLDR format (i.e. using hyphens) is also supported and will be converted to POSIX format automatically for the purpose of finding translation catalogues.

    If an exact match on locale cannot be found, less specific fallback locales will be used instead. E.g. if kk_Cyrl_KZ is not found, then kk_Cyrl will be tried, and then just kk.

    If no matching locale is found, then the gettext methods will just return the hard coded strings from the source file.

  • domain

    A name unique to your app / project. This domain name becomes the file name of your message catalogues and templates. For example, you could use your project’s name or your root package’s name. E.g. my-cool-project.

    Note

    Do not use aquarion-libtts as your domain name. That is reserved for this project.

  • locale_path

    The base path where your language files can be found. This can be a regular path (as a str or a Path) or this could be some path inside your own Python package, retrieved with the help of importlib.resources.files(), for example.

    Note

    It is recommended that third-party TTS plugins keep their translation files inside their package (i.e. wheel) by using importlib.resources.files() to access a locale directory.

Returns:

A tuple of (a gettext() callable, a GNUTranslations instance).

The gettext callable is provided for easy use of the more common action.

The *Translations instance provides access to all the other, less common translation capabilities one might need, e.g. ngettext, pgettext, etc.

Attention

It is common practice to name the gettext callable _, so that extracting and retrieving translated messages is as easy as _("text to be translated"). In fact, if you use Babel, this name is expected by default for translatable strings to be found.

Raises:

various – If an invalid locale is given, various possible exceptions can be raised. See the Babel package’s babel.core.Locale.parse() for details.

Example

from importlib.resources import files
from typing import cast

from aquarion.libs.libtts.api import HashableTraversable, load_language

locale_path = cast(HashableTraversable, files(__name__) / "locale")
_, t = load_language(
    "fr_CA",
    domain="my-cool-project",
    locale_path=locale_path
)
print(_("I will be translated"))

Note

Once loaded, the language translations are cached for the duration of the process.
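The caching mentioned in the note also suggests why the path types are required to be hashable: argument-keyed caches need hashable arguments. The sketch below illustrates that idea with the standard library only. Whether load_language() actually caches this way is an assumption; load_translations() is a hypothetical helper, and the locale directory name is deliberately one that does not exist.

```python
import gettext
from functools import lru_cache


@lru_cache(maxsize=None)
def load_translations(
    locale: str, domain: str, locale_path: str
) -> gettext.NullTranslations:
    # lru_cache builds its cache key from the arguments, so each one must be
    # hashable. fallback=True returns NullTranslations when no catalogue is
    # found, which simply echoes the source strings.
    return gettext.translation(
        domain, localedir=locale_path, languages=[locale], fallback=True
    )


first = load_translations("fr_CA", "my-cool-project", "no-such-locale-dir")
second = load_translations("fr_CA", "my-cool-project", "no-such-locale-dir")
_ = first.gettext
message = _("I will be translated")
```

The second call returns the cached object instead of searching the file system again, and with no catalogue found the message comes back untranslated.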

aquarion.libs.libtts.api.tts_hookimpl(**kwargs: Any) → Callable[[], ITTSPlugin | None]

Decorate a function with this to mark it as a TTS plugin registration hook.

This is a decorator.

The decorated function is expected to accept no arguments and to return an ITTSPlugin, or None if no plugin is to be registered. E.g. Missing dependencies, incompatible hardware, etc.

For more detailed usage options, see the Pluggy package.

Parameters:

kwargs – Any keyword arguments supported by Pluggy.

Returns:

The decorated function, but marked as a TTS plugin registration hook.

Example

@tts_hookimpl
def register_my_tts_plugin() -> ITTSPlugin | None:
    # NOTE: It is important that we do not import our plugin class or
    #       related packages at module import time.
    #       This hook needs to be able to run even when our required
    #       dependencies, etc. are not installed.
    try:
        import dependency
    except ModuleNotFoundError:
        return None
    from package.plugin import MyTTSPlugin

    return MyTTSPlugin()