pyrit.prompt_converter.AzureSpeechTextToAudioConverter
- class AzureSpeechTextToAudioConverter(azure_speech_region: str | None = None, azure_speech_key: str | None = None, azure_speech_resource_id: str | None = None, use_entra_auth: bool = False, synthesis_language: str = 'en_US', synthesis_voice_name: str = 'en-US-AvaNeural', output_format: Literal['wav', 'mp3'] = 'wav')
Bases:
PromptConverter
Generates an audio file (wav or mp3) from a text prompt using the Azure AI Speech service.
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech
- __init__(azure_speech_region: str | None = None, azure_speech_key: str | None = None, azure_speech_resource_id: str | None = None, use_entra_auth: bool = False, synthesis_language: str = 'en_US', synthesis_voice_name: str = 'en-US-AvaNeural', output_format: Literal['wav', 'mp3'] = 'wav') -> None
Initializes the converter with Azure Speech service credentials, synthesis language, and voice name.
- Parameters:
azure_speech_region (str, Optional) – The name of the Azure region.
azure_speech_key (str, Optional) – The API key for accessing the service (only if you’re not using Entra authentication).
azure_speech_resource_id (str, Optional) – The resource ID for accessing the service when using Entra ID auth. This can be found by selecting ‘Properties’ in the ‘Resource Management’ section of your Azure Speech resource in the Azure portal.
use_entra_auth (bool) – Whether to use Entra ID authentication. If True, azure_speech_resource_id must be provided. If False, azure_speech_key must be provided. Defaults to False.
synthesis_language (str) – Synthesis voice language.
synthesis_voice_name (str) – Synthesis voice name. For the supported synthesis languages and voices, see https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
filename (str) – File name to be generated. Please include either .wav or .mp3.
output_format (str) – Either wav or mp3. Must match the file extension.
- Raises:
ValueError – If the required environment variables are not set, if azure_speech_key is passed in when use_entra_auth is True, or if azure_speech_resource_id is passed in when use_entra_auth is False.
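The argument rules above can be sketched as a small standalone check. This is an illustration of the documented ValueError conditions, not the library's implementation. The function name `check_auth_arguments` and its return values are hypothetical.

```python
from typing import Optional


def check_auth_arguments(
    *,
    use_entra_auth: bool,
    azure_speech_key: Optional[str] = None,
    azure_speech_resource_id: Optional[str] = None,
) -> str:
    """Return which auth mode applies, or raise ValueError on a conflicting argument.

    Hypothetical helper mirroring the documented rules; it does not call PyRIT.
    """
    if use_entra_auth:
        # An API key must not be passed together with Entra ID authentication.
        if azure_speech_key is not None:
            raise ValueError("azure_speech_key must not be set when use_entra_auth is True")
        return "entra"
    # A resource ID is only meaningful with Entra ID authentication.
    if azure_speech_resource_id is not None:
        raise ValueError("azure_speech_resource_id must not be set when use_entra_auth is False")
    return "key"
```

Running such a check before constructing the converter surfaces a conflicting pair of arguments early, with a message that names the offending parameter.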
Methods
__init__([azure_speech_region, ...]) – Initializes the converter with Azure Speech service credentials, synthesis language, and voice name.
convert_async(*, prompt[, input_type]) – Converts the given text prompt into its audio representation.
convert_tokens_async(*, prompt[, ...]) – Converts substrings within a prompt that are enclosed by specified start and end tokens.
get_identifier() – Returns an identifier dictionary for the converter.
input_supported(input_type) – Checks if the input type is supported by the converter.
output_supported(output_type) – Checks if the output type is supported by the converter.
Attributes
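AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE – The API key for accessing the service.
AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE – The name of the Azure region.
AZURE_SPEECH_RESOURCE_ID_ENVIRONMENT_VARIABLE – The resource ID for accessing the service when using Entra ID auth.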
Supported audio formats for output.
supported_input_types – Returns a list of supported input types for the converter.
supported_output_types – Returns a list of supported output types for the converter.
- AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_KEY'
The API key for accessing the service.
- AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_REGION'
The name of the Azure region.
- AZURE_SPEECH_RESOURCE_ID_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_RESOURCE_ID'
The resource ID for accessing the service when using Entra ID auth.
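These constants hold the names of the environment variables the converter reads. A minimal sketch of how an explicit argument might take precedence over the environment follows. It assumes an explicit value wins and that a missing value raises ValueError, as the constructor's Raises entry suggests. The helper `resolve_setting` is hypothetical and is not part of PyRIT.

```python
import os
from typing import Mapping, Optional

# Same value as the class attribute documented above.
AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE = "AZURE_SPEECH_REGION"


def resolve_setting(
    explicit: Optional[str],
    env_var_name: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Prefer the explicit value, fall back to the environment, else raise ValueError."""
    value = explicit if explicit else env.get(env_var_name)
    if not value:
        raise ValueError(f"Pass the value explicitly or set {env_var_name}.")
    return value
```

Accepting the environment as a mapping parameter keeps the helper easy to exercise with a plain dictionary instead of the real process environment.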
- async convert_async(*, prompt: str, input_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output'] = 'text') -> ConverterResult
Converts the given text prompt into its audio representation.
- Parameters:
prompt (str) – The text prompt to be converted into audio.
input_type (PromptDataType) – The type of input data.
- Returns:
The result containing the audio file path.
- Return type:
ConverterResult
- Raises:
ModuleNotFoundError – If the azure.cognitiveservices.speech module is not installed.
RuntimeError – If there is an error during the speech synthesis process.
ValueError – If the input type is not supported or if the prompt is empty.
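A minimal usage sketch of convert_async follows. It assumes PyRIT and the Azure Speech SDK are installed, that AZURE_SPEECH_REGION and AZURE_SPEECH_KEY are set in the environment, and that any PyRIT memory initialization your version requires has already run (not shown). Reading the path from `result.output_text` is an assumption about ConverterResult, since this page only states that the result contains the audio file path. The helper name `synthesize_to_file` is hypothetical.

```python
async def synthesize_to_file(text: str) -> str:
    """Convert text to speech and return the path reported by the converter."""
    # Imported lazily so this module still loads where PyRIT is not installed.
    from pyrit.prompt_converter import AzureSpeechTextToAudioConverter

    # No credentials are passed here; the region and key are assumed to be
    # picked up from the environment variables named by the class constants.
    converter = AzureSpeechTextToAudioConverter(output_format="wav")

    # Both arguments are keyword-only, matching the documented signature.
    result = await converter.convert_async(prompt=text, input_type="text")

    # Assumption: ConverterResult exposes the audio file path as `output_text`.
    return result.output_text
```

Once the environment is configured, the coroutine could be driven with `asyncio.run(synthesize_to_file("Hello"))`. The documented ModuleNotFoundError, RuntimeError, and ValueError would propagate to the caller.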
- input_supported(input_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output']) -> bool
Checks if the input type is supported by the converter.
- Parameters:
input_type (PromptDataType) – The input type to check.
- Returns:
True if the input type is supported, False otherwise.
- Return type:
bool
- output_supported(output_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output']) -> bool
Checks if the output type is supported by the converter.
- Parameters:
output_type (PromptDataType) – The output type to check.
- Returns:
True if the output type is supported, False otherwise.
- Return type:
bool