pyrit.prompt_converter.AzureSpeechTextToAudioConverter
- class AzureSpeechTextToAudioConverter(azure_speech_region: str | None = None, azure_speech_key: str | None = None, synthesis_language: str = 'en_US', synthesis_voice_name: str = 'en-US-AvaNeural', output_format: Literal['wav', 'mp3'] = 'wav')
Bases:
PromptConverter
Generates an audio file (WAV or MP3) from a text prompt using the Azure AI Speech service.
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech
- __init__(azure_speech_region: str | None = None, azure_speech_key: str | None = None, synthesis_language: str = 'en_US', synthesis_voice_name: str = 'en-US-AvaNeural', output_format: Literal['wav', 'mp3'] = 'wav') -> None
Initializes the converter with Azure Speech service credentials, synthesis language, and voice name.
- Parameters:
azure_speech_region (str, Optional) – The name of the Azure region.
azure_speech_key (str, Optional) – The API key for accessing the service.
synthesis_language (str) – Synthesis voice language.
synthesis_voice_name (str) – Synthesis voice name. For the supported synthesis languages and voices, see https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
output_format (str) – Format of the generated audio file, either wav or mp3.
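The parameters above can be collected as in the following minimal sketch. It assumes the class is importable from `pyrit.prompt_converter` (as the page title suggests) and that an omitted region or key falls back to the `AZURE_SPEECH_REGION` and `AZURE_SPEECH_KEY` environment variables named by the class constants; that fallback is an assumption, not something this page states. The helpers `build_converter_kwargs` and `make_converter` are hypothetical names used only for illustration.

```python
import os
from typing import Mapping, Optional

SUPPORTED_FORMATS = ("wav", "mp3")  # mirrors Literal['wav', 'mp3'] in the signature


def build_converter_kwargs(
    region: Optional[str] = None,
    key: Optional[str] = None,
    output_format: str = "wav",
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Hypothetical helper: collect and validate constructor arguments."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"output_format must be one of {SUPPORTED_FORMATS}")
    # Assumption: explicit arguments win, otherwise the environment is used.
    region = region or env.get("AZURE_SPEECH_REGION")
    key = key or env.get("AZURE_SPEECH_KEY")
    if not region or not key:
        raise ValueError("Azure Speech region and key are required")
    return {
        "azure_speech_region": region,
        "azure_speech_key": key,
        "synthesis_language": "en_US",  # documented default
        "synthesis_voice_name": "en-US-AvaNeural",  # documented default
        "output_format": output_format,
    }


def make_converter(**overrides):
    """Build the converter; needs PyRIT and the Azure Speech SDK installed."""
    # Imported lazily so the helper above works without PyRIT present.
    from pyrit.prompt_converter import AzureSpeechTextToAudioConverter

    return AzureSpeechTextToAudioConverter(**build_converter_kwargs(**overrides))
```

Calling `make_converter(output_format="mp3")` would then construct the converter with MP3 output, provided the credentials can be resolved.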
Methods
__init__([azure_speech_region, ...]) – Initializes the converter with Azure Speech service credentials, synthesis language, and voice name.
convert_async(*, prompt[, input_type]) – Converts the given text prompt into its audio representation.
convert_tokens_async(*, prompt[, ...]) – Converts substrings within a prompt that are enclosed by specified start and end tokens.
get_identifier() – Returns an identifier dictionary for the converter.
input_supported(input_type) – Checks if the input type is supported by the converter.
output_supported(output_type) – Checks if the output type is supported by the converter.
Attributes
AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE – The API key for accessing the service.
AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE – The name of the Azure region.
Supported audio formats for output.
supported_input_types – Returns a list of supported input types for the converter.
supported_output_types – Returns a list of supported output types for the converter.
- AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_KEY'
Name of the environment variable that holds the API key for accessing the service.
- AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_REGION'
Name of the environment variable that holds the name of the Azure region.
- async convert_async(*, prompt: str, input_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'error'] = 'text') -> ConverterResult
Converts the given text prompt into its audio representation.
- Parameters:
prompt (str) – The text prompt to be converted into audio.
input_type (PromptDataType) – The type of input data.
- Returns:
The result containing the audio file path.
- Return type:
ConverterResult
- Raises:
ModuleNotFoundError – If the azure.cognitiveservices.speech module is not installed.
RuntimeError – If there is an error during the speech synthesis process.
ValueError – If the input type is not supported or if the prompt is empty.
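A minimal sketch of awaiting `convert_async` follows. So that it runs without Azure credentials, it uses a hypothetical `StubConverter` that mimics the documented keyword-only signature and `ValueError` conditions. The `output_text` and `output_type` field names on the result are assumptions about `ConverterResult`, and the returned path is a placeholder.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    """Stand-in for ConverterResult; the field names are assumptions."""

    output_text: str  # assumed to hold the path of the generated audio file
    output_type: str


class StubConverter:
    """Hypothetical stand-in that mimics the documented call signature."""

    async def convert_async(self, *, prompt: str, input_type: str = "text") -> StubResult:
        if input_type != "text":
            raise ValueError("Input type not supported")
        if not prompt.strip():
            raise ValueError("Prompt was empty")
        # A real converter would call Azure AI Speech and write a file here.
        return StubResult(output_text="/tmp/example.wav", output_type="audio_path")


async def text_to_audio_path(converter, prompt: str) -> str:
    """Await the conversion and return the path of the generated audio file."""
    result = await converter.convert_async(prompt=prompt, input_type="text")
    return result.output_text


if __name__ == "__main__":
    print(asyncio.run(text_to_audio_path(StubConverter(), "Hello from PyRIT")))
```

With PyRIT installed and credentials configured, an `AzureSpeechTextToAudioConverter` instance could be passed in place of the stub, since `text_to_audio_path` relies only on the `convert_async` signature documented above.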
- input_supported(input_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'error']) -> bool
Checks if the input type is supported by the converter.
- Parameters:
input_type (PromptDataType) – The input type to check.
- Returns:
True if the input type is supported, False otherwise.
- Return type:
bool
- output_supported(output_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'error']) -> bool
Checks if the output type is supported by the converter.
- Parameters:
output_type (PromptDataType) – The output type to check.
- Returns:
True if the output type is supported, False otherwise.
- Return type:
bool
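The two checks can be combined into an early guard before a conversion is attempted, as in this sketch. `TextToAudioStub` and `check_pipeline_fit` are hypothetical names. The stub assumes the converter accepts `text` and produces `audio_path`, which is inferred from the method descriptions on this page rather than stated outright.

```python
class TextToAudioStub:
    """Hypothetical stand-in; the supported types are assumed from this page."""

    def input_supported(self, input_type: str) -> bool:
        return input_type == "text"

    def output_supported(self, output_type: str) -> bool:
        return output_type == "audio_path"


def check_pipeline_fit(converter, input_type: str, output_type: str) -> None:
    """Raise ValueError early if the converter cannot bridge the two types."""
    if not converter.input_supported(input_type):
        raise ValueError(f"Unsupported input type: {input_type}")
    if not converter.output_supported(output_type):
        raise ValueError(f"Unsupported output type: {output_type}")


if __name__ == "__main__":
    check_pipeline_fit(TextToAudioStub(), "text", "audio_path")  # passes silently
```

Because the guard uses only `input_supported` and `output_supported`, it should work with any object exposing those two methods.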