aquarion.libs.libtts.api
Public API for aquarion-libtts.
All interaction with aquarion-libtts is generally expected to go through this API package.
Example
from aquarion.libs.libtts.api import TTSPluginRegistry

registry = TTSPluginRegistry()
registry.load_plugins()
registry.enable("kokoro_v1")
plugin = registry.get_plugin("kokoro_v1")
settings = plugin.make_settings()
backend = plugin.make_backend(settings)
try:
    backend.start()
    audio_chunks = []
    # The text passed to convert() here is illustrative.
    for audio_chunk in backend.convert("Hello, world!"):
        audio_chunks.append(audio_chunk)
finally:
    backend.stop()
Functions
load_language(locale, domain, locale_path) | Return a gettext_() function and a *Translations instance.
tts_hookimpl(**kwargs) | Decorate a function with this to mark it as a TTS plugin registration hook.
Classes
HashablePathLike | PathLikes are hashable, but this makes it explicit for the type checker.
HashableTraversable | Traversables are hashable, but this makes it explicit for the type checker.
ITTSBackend | Common interface for all TTS backends.
ITTSPlugin | Common interface for all TTS Plugins.
ITTSSettings | Common interface for all TTS backend settings.
ITTSSettingsHolder | Common interface for objects that accept and contain ITTSSettings.
TTSAudioSpec | Audio metadata about the audio format that an ITTSBackend returns.
TTSPluginRegistry | Registry of all aquarion-libtts backend plugins.
TTSSampleByteOrders | The byte order for multi-byte audio samples.
TTSSampleTypes | The data type of a single audio sample.
TTSSettingsSpecEntry | A specification entry describing one setting in an ITTSSettings object.
- class aquarion.libs.libtts.api.HashablePathLike
Bases: Hashable, PathLike[str]
PathLikes are hashable, but this makes it explicit for the type checker.
- class aquarion.libs.libtts.api.HashableTraversable(*args, **kwargs)
Bases: Hashable, Traversable
Traversables are hashable, but this makes it explicit for the type checker.
- class aquarion.libs.libtts.api.ITTSBackend(*args, **kwargs)
Bases: ITTSSettingsHolder, Protocol
Common interface for all TTS backends.
An ITTSBackend is responsible for converting text into speech audio stream chunks. To do this, it should first be started with start(), then convert() can be used to do any number of conversions, and finally it should be shut down with stop() when no longer needed.
An ITTSBackend is also responsible for reporting the kind of audio that it produces (e.g. raw PCM, WAVE, MP3, OGG, VP8, stereo, mono, 8-bit, 16-bit, etc.). This is reported via the audio_spec attribute.
Lastly, since each ITTSBackend is also an ITTSSettingsHolder, it must also accept configuration settings. These are commonly provided at instantiation, but that is not strictly required to conform to the ITTSSettingsHolder protocol.
- property audio_spec: TTSAudioSpec
Metadata about the speech audio format.
E.g. Mono 16-bit little-endian linear PCM audio at 24 kHz.
This should be read-only.
- convert(text: str) -> Iterator[bytes]
Return speech audio for the given text as one or more binary chunks.
- Parameters:
text – The text to convert into speech.
- Returns:
An Iterator of chunks of audio in the format specified by audio_spec.
- property is_started: bool
True if TTS backend is started, False otherwise.
This should be read-only.
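The start / convert / stop lifecycle described for ITTSBackend can be sketched with a stand-in class. SilentBackend below is hypothetical and is not part of aquarion-libtts; it only mimics the documented method names and yields silent bytes instead of real speech. Raising an error when convert() is used before start() is an assumption of this sketch, not documented behaviour.

```python
from collections.abc import Iterator


class SilentBackend:
    """Hypothetical stand-in that mimics the documented ITTSBackend methods."""

    def __init__(self) -> None:
        self._started = False

    @property
    def is_started(self) -> bool:
        return self._started

    def start(self) -> None:
        self._started = True

    def stop(self) -> None:
        self._started = False

    def convert(self, text: str) -> Iterator[bytes]:
        # Assumption: refusing to convert before start() is one reasonable
        # policy; the real protocol may not mandate it.
        if not self._started:
            raise RuntimeError("backend must be started before convert()")
        for word in text.split():
            # One chunk per word: two zero bytes (one 16-bit sample) per character.
            yield b"\x00\x00" * len(word)


backend = SilentBackend()
backend.start()
try:
    audio = b"".join(backend.convert("hello world"))
finally:
    # Always stop the backend, even if conversion fails.
    backend.stop()
```

Wrapping the conversion in try / finally mirrors the module-level example: the backend is stopped whether or not conversion succeeds.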
- class aquarion.libs.libtts.api.ITTSPlugin(*args, **kwargs)
Bases: Protocol
Common interface for all TTS Plugins.
- get_display_name(locale: str) -> str
Return the display name for the plugin, appropriate for the given locale.
A display name is one that is human-friendly as opposed to any kind of unique key that code would care about.
- Parameters:
locale –
The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.
Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant. E.g. from en_CA to en.
- Returns:
The display name of the plugin in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a display name in its default language, or English if that is preferred.
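One way a plugin might implement the locale fallback described above is sketched here. The helper names locale_fallbacks and pick_locale are hypothetical, and the approach (dropping the encoding and modifier, then trimming one subtag at a time) is an assumption rather than the library's own algorithm.

```python
def locale_fallbacks(locale: str) -> list[str]:
    """Return the locale followed by progressively more general variants."""
    # Drop any encoding / modifier suffix such as ".UTF-8@euro".
    base = locale.split(".", 1)[0].split("@", 1)[0]
    # Accept both CLDR hyphens and POSIX underscores.
    parts = base.replace("-", "_").split("_")
    return ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]


def pick_locale(requested: str, supported: set[str], default: str = "en") -> str:
    """Pick the most specific supported locale, or fall back to ``default``."""
    # Map normalized (underscore) forms back to the supported spellings.
    normalized = {s.replace("-", "_"): s for s in supported}
    for candidate in locale_fallbacks(requested):
        if candidate in normalized:
            return normalized[candidate]
    return default
```

For example, pick_locale("en_CA", {"en", "fr_CA"}) falls back from en_CA to en, while an entirely unsupported locale yields the default language.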
- get_setting_description(setting_name: str, locale: str) -> str
Return the given setting’s description, appropriate for the given locale.
- Parameters:
setting_name – The name of the setting, as found in the keys of the mapping returned by get_settings_spec().
locale –
The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.
Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant. E.g. from en_CA to en.
- Returns:
The description of the setting in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a description in its default language, or English if that is preferred.
- Raises:
KeyError or AttributeError – If the given setting name is not a recognized setting.
- get_setting_display_name(setting_name: str, locale: str) -> str
Return the given setting’s display name, appropriate for the given locale.
A display name is one that is human-friendly as opposed to any kind of unique key that code would care about.
- Parameters:
setting_name – The name of the setting, as found in the keys of the mapping returned by get_settings_spec().
locale –
The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.
Plugins are expected to do their best to accommodate the given locale, but can fall back to a more general language variant. E.g. from en_CA to en.
- Returns:
The display name of the setting in a language appropriate for the given locale. If the given locale is not supported at all, then the plugin is expected to return a display name in its default language, or English if that is preferred.
- Raises:
KeyError or AttributeError – If the given setting name is not a recognized setting.
- get_settings_spec() -> Mapping[str, TTSSettingsSpecEntry[TTSSettingsSpecEntryTypes]]
Return a specification that describes all the backend’s settings.
- Returns:
An immutable mapping from setting attribute names to TTSSettingsSpecEntry instances.
Implementations should probably return a MappingProxyType to achieve the immutability.
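The MappingProxyType suggestion can be illustrated with the standard library alone. In this sketch plain dictionaries stand in for TTSSettingsSpecEntry instances, and the setting names are illustrative.

```python
from types import MappingProxyType

# Plain dicts standing in for TTSSettingsSpecEntry instances (assumption).
_spec = {
    "voice": {"type": str},
    "speed": {"type": float, "min": 0.1, "max": 1.0},
}
# A read-only view over the private dict.
read_only_spec = MappingProxyType(_spec)

try:
    read_only_spec["pitch"] = {"type": float}
    mutated = True
except TypeError:
    # MappingProxyType rejects item assignment.
    mutated = False
```

Note that the proxy is only a view: whoever holds the underlying dict can still change it, so keeping that dict private to the plugin is part of the design.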
- get_supported_locales() -> AbstractSet[str]
Return the set of locales supported by the TTS backend for speaking.
These should also be the locales that the plugin supports for display names, setting names, setting descriptions, etc.
Locales can be in either POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) formats, and client applications are expected to support both.
- Returns:
An immutable set of locale strings.
Example
frozenset({"fr_CA", "ca-ES-valencia", "zh-Hant"})
Note
The set of locales should be as specific as is directly supported and should not include broader / more general or approximate catch-all locales unless they are also explicitly supported, or nothing more specific is supported. I.e. en_CA is good, en is bad, unless en is as specific as the TTS backend supports. Or if ca-ES-valencia is supported, then that is preferred over ca-ES. … In short, be as precise and honest as you can.
- property id: str
A unique identifier for the plugin.
The id must be unique across all Aquarion libtts plugins. Also, it is recommended to include at least a major version number as a suffix so that multiple versions / implementations of a plugin can be installed and supported simultaneously. E.g. for backwards compatibility.
This should be read-only.
Example
kokoro_v1
- make_backend(settings: ITTSSettings) -> ITTSBackend
Create and return a TTS backend instance.
This is a factory method.
- Parameters:
settings – Custom or default settings must be provided to configure the TTS backend.
- Returns:
A configured and ready to use TTS backend.
- Raises:
TypeError – Implementations of this interface must check that they are getting their own ITTSSettings implementation and should raise an exception if any other plugin’s ITTSSettings is given instead.
- make_settings(from_dict: Mapping[str, JSONSerializableTypes] | None = None) -> ITTSSettings
Create and return an appropriate settings object for the TTS backend.
This is a factory method.
- Parameters:
from_dict –
If it is not None, then the given values should be used to initialize the settings.
If it is None, then default values for all settings should be used.
- Returns:
An instance of a compatible ITTSSettings implementation with all settings values valid for immediate use.
- Raises:
KeyError, ValueError or TypeError – This function is expected to validate its inputs. If any setting is invalid for the concrete implementation of ITTSSettings that the factory will create, then an exception should be raised.
- class aquarion.libs.libtts.api.ITTSSettings(*args, **kwargs)
Bases: Protocol
Common interface for all TTS backend settings.
Implementations of this interface are expected to add their own setting attributes for the specific ITTSBackend implementation they go with.
Note: There is no expectation that ITTSSettings implementations be immutable or hashable, but it’s probably a good idea since changes to settings should be done by calling ITTSPlugin.make_settings() with a changed settings dictionary.
Example
from pathlib import Path

class MySettings:
    locale: str = "en"
    voice: str = "bella"
    speed: float = 1.0
    api_key: str
    cache_path: Path

    def __eq__(self, other: object) -> bool:
        # Your implementation here
        ...

    def to_dict(self) -> dict[str, JSONSerializableTypes]:
        # Your implementation here
        ...
- __eq__(other: object) -> bool
Return True if all settings values match, False otherwise.
- Parameters:
other – The other ITTSSettings instance to compare against.
- Returns:
True if other is an instance of the same concrete implementation of ITTSSettings and all the settings values are the same. False otherwise.
- locale: str
The locale should be a POSIX-compliant (i.e. using underscores) or CLDR-compliant (i.e. using hyphens) locale string like en_CA, zh-Hant, ca-ES-valencia, or even de_DE.UTF-8@euro. It can be as general as fr or as specific as language_territory_script_variant@modifier.
- to_dict() -> dict[str, JSONSerializableTypes]
Export all settings as a dictionary of only JSON-serializable types.
- Returns:
A dictionary where the keys are the setting names and the values are the setting values converted as necessary to simple base JSON-compatible types.
Example
{
    "locale": "en",
    "voice": "bella",
    "speed": 1.0,
    "api_key": "Your API key here",
    "cache_path": "Cache path converted to a basic string"
}
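A frozen dataclass is one possible way to get the immutability, equality and to_dict() behaviour described for ITTSSettings. The sketch below is an assumption about how an implementation could look, not the library's own code; the field names follow the MySettings example, with api_key left out for brevity.

```python
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class MySettings:
    """Hypothetical settings class; frozen=True also generates __eq__ and __hash__."""

    locale: str = "en"
    voice: str = "bella"
    speed: float = 1.0
    cache_path: Path = Path("cache")

    def to_dict(self) -> dict[str, object]:
        # Convert non-JSON types (here only Path) to plain strings.
        return {
            key: str(value) if isinstance(value, Path) else value
            for key, value in asdict(self).items()
        }


original = MySettings()
# "Changing" a setting means building a new instance from an edited dict.
changed_dict = {**original.to_dict(), "speed": 1.5}
changed_dict["cache_path"] = Path(changed_dict["cache_path"])
changed = MySettings(**changed_dict)
```

In a real plugin the rebuild step would go through ITTSPlugin.make_settings(), which is also where validation of the dictionary values belongs.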
- class aquarion.libs.libtts.api.ITTSSettingsHolder(*args, **kwargs)
Bases: Protocol
Common interface for objects that accept and contain ITTSSettings.
- get_settings() -> ITTSSettings
Return the current settings in use.
- Returns:
The current settings in use.
Note
The reason the settings are not just direct attributes is that they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.
- update_settings(new_settings: ITTSSettings) -> None
Update to the new given settings.
- Parameters:
new_settings – The new complete set of settings to start using immediately.
- Raises:
TypeError – Implementations of this interface should check that they are only getting the correct concrete settings class and raise an exception if any other kind of ITTSSettings is given.
Note
The reason the settings are not just direct attributes is that they are to be treated as an all-or-nothing collection. I.e. individual settings attributes should not be individually modified directly on an ITTSSettingsHolder, but rather the whole settings object should be replaced with a new one.
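The all-or-nothing replacement and the TypeError check can be sketched as follows. MyHolder, MySettings and OtherSettings are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MySettings:
    voice: str = "bella"


@dataclass(frozen=True)
class OtherSettings:
    # Settings class belonging to some other (hypothetical) plugin.
    voice: str = "bella"


class MyHolder:
    """Hypothetical holder mimicking the documented ITTSSettingsHolder methods."""

    def __init__(self, settings: MySettings) -> None:
        self._settings = settings

    def get_settings(self) -> MySettings:
        return self._settings

    def update_settings(self, new_settings: object) -> None:
        # Accept only this holder's own concrete settings class.
        if not isinstance(new_settings, MySettings):
            raise TypeError(f"expected MySettings, got {type(new_settings).__name__}")
        # Replace the whole object; never mutate individual attributes.
        self._settings = new_settings


holder = MyHolder(MySettings())
holder.update_settings(MySettings(voice="adam"))
try:
    holder.update_settings(OtherSettings())
    rejected = False
except TypeError:
    rejected = True
```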
- class aquarion.libs.libtts.api.TTSAudioSpec(*, format: str, sample_rate: int, sample_type: TTSSampleTypes, sample_width: int, byte_order: TTSSampleByteOrders, num_channels: int)
Bases: object
Audio metadata about the audio format that an ITTSBackend returns.
Note: Instances of this class are immutable once created.
- byte_order: TTSSampleByteOrders
E.g. Little Endian or Big Endian.
- sample_type: TTSSampleTypes
E.g. Signed Integer, Unsigned Integer or Floating Point.
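To show how audio metadata of this kind is typically consumed, the sketch below wraps raw PCM chunks in a WAV container using the standard library wave module. The values are assumptions chosen to match the "mono 16-bit little-endian linear PCM at 24 kHz" example; they are plain local variables, not attributes read from a real TTSAudioSpec, and the sample width is expressed in bytes here because that is what the wave module expects.

```python
import io
import wave

# Assumed example values: mono, 16-bit (2 bytes per sample), 24 kHz linear PCM.
num_channels = 1
sample_width_bytes = 2
sample_rate = 24_000

# Stand-in for chunks from a backend: two chunks of 0.5 s of silence each.
chunk = b"\x00" * (sample_rate // 2 * sample_width_bytes * num_channels)
audio_chunks = [chunk, chunk]

buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(num_channels)
    wav_file.setsampwidth(sample_width_bytes)
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(b"".join(audio_chunks))

# Read the header back to confirm the duration.
with wave.open(io.BytesIO(buffer.getvalue()), "rb") as wav_file:
    duration_seconds = wav_file.getnframes() / wav_file.getframerate()
```

WAV stores PCM samples in little-endian order, so this only applies directly when the backend reports little-endian (or single-byte) samples; big-endian audio would need a byte swap first.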
- class aquarion.libs.libtts.api.TTSPluginRegistry
Bases: object
Registry of all aquarion-libtts backend plugins.
TTS backends and everything related to them are created / accessed through ITTSPlugin instances. The plugin registry is responsible for finding, loading, listing, enabling, disabling and giving access to those plugins.
- disable(plugin_id: str) -> None
Disable a TTS plugin, excluding it from the default list_plugin_ids() listing.
- Parameters:
plugin_id – The ID of the desired plugin.
- Raises:
ValueError – If the given ID does not match any registered plugin.
Note
Disabling a plugin does not affect any existing instances of that plugin in any way. So, proper TTS backend instance management and stopping must still be handled separately.
- enable(plugin_id: str) -> None
Enable a TTS plugin for inclusion in list_plugin_ids().
The idea behind enabled vs disabled plugins is that it allows one to manage which plugins are listed / displayed to a user, independently of all the plugins that are installed / loaded. I.e. it allows for filtering which plugins one wants exposed and which should be kept hidden. E.g. some plugins might not be supported by your application, even though they got installed with some other dependency.
- Parameters:
plugin_id – The ID of the desired plugin.
- Raises:
ValueError – If the given ID does not match any registered plugin.
- get_plugin(id_: str) -> ITTSPlugin
Return the plugin for the given ID.
- Parameters:
id_ – The ID of the desired already loaded plugin. E.g. kokoro_v1.
- Raises:
ValueError – If the given ID does not match any registered plugin.
- list_plugin_ids(*, only_disabled: bool = False, list_all: bool = False) -> set[str]
Return the set of plugin IDs.
By default, only enabled plugins are listed.
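- Parameters:
only_disabled – If True, list only the disabled plugins.
list_all – If True, list all plugins, whether enabled or disabled.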
- Raises:
ValueError – If both arguments are True.
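The enabled / disabled semantics can be modelled with a toy class. ToyRegistry is hypothetical and much simpler than TTSPluginRegistry; the exact meaning of only_disabled and list_all is inferred from their names, and other_v2 is a made-up plugin ID.

```python
class ToyRegistry:
    """Hypothetical model of the documented enable / disable / list behaviour."""

    def __init__(self, plugin_ids: set[str]) -> None:
        self._all = set(plugin_ids)
        self._enabled: set[str] = set()  # everything starts disabled

    def _check(self, plugin_id: str) -> None:
        if plugin_id not in self._all:
            raise ValueError(f"unknown plugin: {plugin_id}")

    def enable(self, plugin_id: str) -> None:
        self._check(plugin_id)
        self._enabled.add(plugin_id)

    def disable(self, plugin_id: str) -> None:
        self._check(plugin_id)
        self._enabled.discard(plugin_id)

    def list_plugin_ids(
        self, *, only_disabled: bool = False, list_all: bool = False
    ) -> set[str]:
        if only_disabled and list_all:
            raise ValueError("only_disabled and list_all are mutually exclusive")
        if list_all:
            return set(self._all)
        if only_disabled:
            return self._all - self._enabled
        return set(self._enabled)


registry = ToyRegistry({"kokoro_v1", "other_v2"})
registry.enable("kokoro_v1")
```

With this setup the default listing contains only kokoro_v1, the only_disabled listing contains only other_v2, and list_all returns both.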
- load_plugins(*, validate: bool = True) -> None
Load all aquarion-libtts backend plugins.
Plugins are discovered by searching for pyproject.toml entry points in the aquarion-libtts group, then searching those entry points for hook functions decorated with @tts_hookimpl, and finally calling those hook functions. The plugins returned by those hook functions are then stored in the plugin registry and made accessible.
Note
All plugins are disabled by default. Use enable() to enable a plugin.
- Parameters:
validate – If True (the default), then an exception is raised if any hook functions do not conform to the expected hook specification.
- Raises:
PluginValidationError – If validate is True and a hook function does not conform to the expected specification.
Examples
[project.entry-points.'aquarion-libtts']
my_plugin_v1 = "package.hook"

@tts_hookimpl
def register_my_tts_plugin() -> ITTSPlugin | None:
    from package.plugin import MyTTSPlugin
    return MyTTSPlugin()
- class aquarion.libs.libtts.api.TTSSampleByteOrders(*values)
Bases: StrEnum
The byte order for multi-byte audio samples.
The string values of these types match FFmpeg’s format descriptions.
- BIG_ENDIAN = 'be'
Big endian byte order
This means the most significant byte is stored first, then the least significant byte after that.
- LITTLE_ENDIAN = 'le'
Little endian byte order
This means the least significant byte is stored first, then the most significant byte after that.
- NOT_APPLICABLE = ''
Not Applicable
This should only be used for 8-bit (i.e. single byte) samples.
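The difference between the two byte orders can be seen by encoding one 16-bit value both ways. The sample value below is arbitrary.

```python
sample = 0x1234  # one 16-bit sample value

big = sample.to_bytes(2, byteorder="big")        # most significant byte first
little = sample.to_bytes(2, byteorder="little")  # least significant byte first

# Decoding must use the same order that was used for encoding.
decoded = int.from_bytes(little, byteorder="little")
```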
- class aquarion.libs.libtts.api.TTSSampleTypes(*values)
Bases: StrEnum
The data type of a single audio sample.
The string values of these types match FFmpeg’s format descriptions.
- FLOAT = 'f'
Floating point samples.
- SIGNED_INT = 's'
Signed integer samples. (I.e. positive and negative numbers allowed.)
- UNSIGNED_INT = 'u'
Unsigned integer samples. (I.e. only positive numbers, but wider sample space.)
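Because the enum string values follow FFmpeg's naming, they can be concatenated into raw PCM format names such as s16le. The helper below is hypothetical, takes plain strings rather than the real enums, and assumes the sample width is given in bits.

```python
def pcm_format_name(sample_type: str, bits: int, byte_order: str) -> str:
    """Assemble a name such as "s16le" from the enum string values."""
    # byte_order is "" for single-byte samples, which yields e.g. "u8".
    return f"{sample_type}{bits}{byte_order}"


signed_16_le = pcm_format_name("s", 16, "le")
unsigned_8 = pcm_format_name("u", 8, "")
float_32_be = pcm_format_name("f", 32, "be")
```

This also shows why the not-applicable byte order is an empty string: 8-bit formats carry no byte-order suffix.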
- class aquarion.libs.libtts.api.TTSSettingsSpecEntry(*, type: type[T], min: int | float | None = None, max: int | float | None = None, values: frozenset[T] | None = None)
Bases: Generic
A specification entry describing one setting in an ITTSSettings object.
Since ITTSSettings can contain custom TTS backend specific setting attributes, there is a need for a way to describe those setting attributes in a standardized way so that settings UIs can be constructed dynamically in applications that use aquarion-libtts. Instances of this class, in a dictionary, for example, can provide a specification for how to render settings fields in a UI.
Instances of this class are immutable once created.
Example
spec = {
    "locale": TTSSettingsSpecEntry(
        type=str, min=2, values=frozenset({"en", "fr"})
    ),
    "voice": TTSSettingsSpecEntry(type=str),
    "speed": TTSSettingsSpecEntry(type=float, min=0.1, max=1.0),
    "api_key": TTSSettingsSpecEntry(type=str),
    "cache_path": TTSSettingsSpecEntry(type=str),
}
With the example above, one could imagine a UI with multiple text box fields. locale could be a dropdown or a set of radio buttons. There could be validation for valid ranges. speed could have up and down arrow buttons to increase and decrease the value, and / or react to a mouse’s scroll wheel. Etc.
- max: int | float | None = None
The maximum allowed value or maximum allowed length.
This is optional.
For strings this is the maximum allowed length of the string.
For numeric types, this is the maximum allowed value.
- min: int | float | None = None
The minimum allowed value or minimum allowed length.
This is optional.
For strings this is the minimum allowed length of the string.
For numeric types, this is the minimum allowed value.
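A client application could use the min, max and values fields to validate user input before building new settings. SpecEntry below is a hypothetical stand-in for TTSSettingsSpecEntry (its value_type field corresponds to the real type field), and is_valid is an illustrative helper, not part of the API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SpecEntry:
    """Hypothetical stand-in with fields similar to TTSSettingsSpecEntry."""

    value_type: type
    min: Optional[float] = None
    max: Optional[float] = None
    values: Optional[frozenset] = None


def is_valid(entry: SpecEntry, value: Any) -> bool:
    if not isinstance(value, entry.value_type):
        return False
    if entry.values is not None and value not in entry.values:
        return False
    # Strings are measured by length, numbers by value.
    size = len(value) if isinstance(value, str) else value
    if entry.min is not None and size < entry.min:
        return False
    if entry.max is not None and size > entry.max:
        return False
    return True


speed = SpecEntry(value_type=float, min=0.1, max=1.0)
locale = SpecEntry(value_type=str, min=2, values=frozenset({"en", "fr"}))
```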
- aquarion.libs.libtts.api.load_language(locale: str, domain: str, locale_path: HashablePathLike | HashableTraversable | str) -> LoadLanguageReturnType
Return a gettext_() function and a *Translations instance.
- Parameters:
locale –
The desired locale to find and load. E.g. en_CA or fr, etc.
locale must be parsable by the Babel package and will be normalized by it as well.
locale is generally expected to be in POSIX format (i.e. using underscores) but CLDR format (i.e. using hyphens) is also supported and will be converted to POSIX format automatically for the purpose of finding translation catalogues.
If an exact match on locale cannot be found, less specific fallback locales will be used instead. E.g. if kk_Cyrl_KZ is not found, then kk_Cyrl will be tried, and then just kk.
If no matching locale is found, then the gettext methods will just return the hard-coded strings from the source file.
domain –
A name unique to your app / project. This domain name becomes the file name of your message catalogues and templates. For example, you could use your project’s name or your root package’s name. E.g. my-cool-project.
Note
Do not use aquarion-libtts as your domain name. That is reserved for this project.
locale_path –
The base path where your language files can be found. This can be a regular path (as a str or a Path) or this could be some path inside your own Python package, retrieved with the help of importlib.resources.files(), for example.
Note
It is recommended that third-party TTS plugins keep their translation files inside their package (i.e. wheel) by using importlib.resources.files() to access a locale directory.
- Returns:
A tuple of (a gettext() callable, a GNUTranslations instance).
The gettext callable is provided for easy use of the more common action.
The *Translations instance provides access to all the other, less common translation capabilities one might need, e.g. ngettext, pgettext, etc.
Attention
It is common practice to name the gettext callable _, so that extracting and retrieving translated messages is as easy as _("text to be translated"). In fact, if you use Babel, this name is expected by default for translatable strings to be found.
- Raises:
various – If an invalid locale is given, various possible exceptions can be raised. See the Babel package’s babel.core.Locale.parse() for details.
Example
from importlib.resources import files
from typing import cast

from aquarion.libs.libtts.api import HashableTraversable, load_language

locale_path = cast(HashableTraversable, files(__name__) / "locale")
_, t = load_language(
    "fr_CA", domain="my-cool-project", locale_path=locale_path
)
print(_("I will be translated"))
Note
Once loaded, the language translations are cached for the duration of the process.
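The "no matching locale" behaviour described above, where the hard-coded source strings are returned, has a direct analogue in the standard library gettext module. The sketch below uses that module rather than load_language() itself, so it illustrates the fallback idea only; the empty temporary directory stands in for a locale path with no catalogues.

```python
import gettext
import tempfile

with tempfile.TemporaryDirectory() as empty_locale_dir:
    # No catalogues exist here, so fallback=True yields a NullTranslations
    # object instead of raising FileNotFoundError.
    translations = gettext.translation(
        "my-cool-project",
        localedir=empty_locale_dir,
        languages=["fr_CA"],
        fallback=True,
    )

_ = translations.gettext
# With no catalogue loaded, the source string comes back unchanged.
message = _("I will be translated")
```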
- aquarion.libs.libtts.api.tts_hookimpl(**kwargs: Any) -> Callable[[], ITTSPlugin | None]
Decorate a function with this to mark it as a TTS plugin registration hook.
This is a decorator.
The decorated function is expected to accept no arguments and to return an ITTSPlugin, or None if no plugin is to be registered, e.g. because of missing dependencies, incompatible hardware, etc.
For more detailed usage options, see the Pluggy package.
- Parameters:
kwargs – Any keyword arguments supported by Pluggy.
- Returns:
The decorated function, but marked as a TTS plugin registration hook.
Example
@tts_hookimpl
def register_my_tts_plugin() -> ITTSPlugin | None:
    # NOTE: It is important that we do not import our plugin class or
    #       related packages at module import time.
    #       This hook needs to be able to run even when our required
    #       dependencies, etc. are not installed.
    try:
        import dependency
    except ModuleNotFoundError:
        return None
    from package.plugin import MyTTSPlugin
    return MyTTSPlugin()