pyrit.prompt_converter.AzureSpeechAudioToTextConverter#
- class AzureSpeechAudioToTextConverter(azure_speech_region: str | None = None, azure_speech_key: str | None = None, azure_speech_resource_id: str | None = None, use_entra_auth: bool = False, recognition_language: str = 'en-US')[source]#
Bases: PromptConverter
Transcribes a .wav audio file into text using Azure AI Speech service.
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
- __init__(azure_speech_region: str | None = None, azure_speech_key: str | None = None, azure_speech_resource_id: str | None = None, use_entra_auth: bool = False, recognition_language: str = 'en-US') → None[source]#
Initialize the converter with Azure Speech service credentials and recognition language.
- Parameters:
azure_speech_region (str, Optional) – The name of the Azure region.
azure_speech_key (str, Optional) – The API key for accessing the service (if not using Entra ID auth).
azure_speech_resource_id (str, Optional) – The resource ID for accessing the service when using Entra ID auth. This can be found by selecting ‘Properties’ in the ‘Resource Management’ section of your Azure Speech resource in the Azure portal.
use_entra_auth (bool) – Whether to use Entra ID authentication. If True, azure_speech_resource_id must be provided. If False, azure_speech_key must be provided. Defaults to False.
recognition_language (str) – The language used for speech recognition. Defaults to “en-US”. For supported languages, see the following link: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
- Raises:
ValueError – If the required environment variables are not set, if azure_speech_key is passed in when use_entra_auth is True, or if azure_speech_resource_id is passed in when use_entra_auth is False.
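A minimal construction sketch based on the parameters above; the region, key, and resource ID values are hypothetical placeholders for an existing Azure Speech resource:

    from pyrit.prompt_converter import AzureSpeechAudioToTextConverter

    # Key-based authentication (placeholder values).
    converter = AzureSpeechAudioToTextConverter(
        azure_speech_region="eastus",
        azure_speech_key="<your-speech-key>",
        recognition_language="en-US",
    )

    # Entra ID authentication: pass the resource ID instead of an API key.
    entra_converter = AzureSpeechAudioToTextConverter(
        azure_speech_region="eastus",
        azure_speech_resource_id="<your-speech-resource-id>",
        use_entra_auth=True,
    )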
Methods
__init__([azure_speech_region, ...]) – Initialize the converter with Azure Speech service credentials and recognition language.
convert_async(*, prompt[, input_type]) – Convert the given audio file into its text representation.
convert_tokens_async(*, prompt[, ...]) – Convert substrings within a prompt that are enclosed by specified start and end tokens.
get_identifier() – Get the typed identifier for this object.
input_supported(input_type) – Check if the input type is supported by the converter.
output_supported(output_type) – Check if the output type is supported by the converter.
recognize_audio(audio_bytes) – Recognize audio file and return transcribed text.
stop_cb(evt, recognizer) – Stop continuous recognition upon receiving an event 'evt'.
transcript_cb(evt, transcript) – Append transcribed text upon receiving a "recognized" event.
Attributes
AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE – The API key for accessing the service.
AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE – The name of the Azure region.
AZURE_SPEECH_RESOURCE_ID_ENVIRONMENT_VARIABLE – The resource ID for accessing the service when using Entra ID auth.
SUPPORTED_INPUT_TYPES – Tuple of input modalities supported by this converter.
SUPPORTED_OUTPUT_TYPES – Tuple of output modalities supported by this converter.
supported_input_types – Returns a list of supported input types for the converter.
supported_output_types – Returns a list of supported output types for the converter.
- AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_KEY'#
The API key for accessing the service.
- AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_REGION'#
The name of the Azure region.
- AZURE_SPEECH_RESOURCE_ID_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_RESOURCE_ID'#
The resource ID for accessing the service when using Entra ID auth.
- SUPPORTED_INPUT_TYPES: tuple[Literal['text', 'image_path', 'audio_path', 'video_path', 'binary_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output'], ...] = ('audio_path',)#
Tuple of input modalities supported by this converter. Subclasses must override this.
- SUPPORTED_OUTPUT_TYPES: tuple[Literal['text', 'image_path', 'audio_path', 'video_path', 'binary_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output'], ...] = ('text',)#
Tuple of output modalities supported by this converter. Subclasses must override this.
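Per the ValueError note in __init__, credentials that are not passed explicitly are expected to be read from the environment variables named above; a minimal sketch with hypothetical placeholder values:

    import os

    from pyrit.prompt_converter import AzureSpeechAudioToTextConverter

    # Placeholder values; in practice these are usually set outside the process.
    os.environ["AZURE_SPEECH_REGION"] = "eastus"
    os.environ["AZURE_SPEECH_KEY"] = "<your-speech-key>"

    # With no arguments, the region and key are resolved from the environment.
    converter = AzureSpeechAudioToTextConverter()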
- async convert_async(*, prompt: str, input_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'binary_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output'] = 'audio_path') → ConverterResult[source]#
Convert the given audio file into its text representation.
- Parameters:
prompt (str) – File path to the audio file to be transcribed.
input_type (PromptDataType) – The type of input data.
- Returns:
The result containing the transcribed text.
- Return type:
ConverterResult
- Raises:
ValueError – If the input type is not supported or if the provided file is not a .wav file.
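A short usage sketch for convert_async, assuming a hypothetical local file sample.wav and that ConverterResult exposes the transcription as output_text (as in other PyRIT converters):

    import asyncio

    from pyrit.prompt_converter import AzureSpeechAudioToTextConverter

    async def main() -> None:
        converter = AzureSpeechAudioToTextConverter(
            azure_speech_region="eastus",          # placeholder
            azure_speech_key="<your-speech-key>",  # placeholder
        )
        # The prompt is the file path to the audio; input_type must be "audio_path".
        result = await converter.convert_async(prompt="sample.wav", input_type="audio_path")
        print(result.output_text)  # assumed ConverterResult field holding the transcript

    asyncio.run(main())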
- recognize_audio(audio_bytes: bytes) → str[source]#
Recognize audio file and return transcribed text.
- Parameters:
audio_bytes (bytes) – Audio bytes input.
- Returns:
Transcribed text.
- Return type:
str
- Raises:
ModuleNotFoundError – If the azure.cognitiveservices.speech module is not installed.
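recognize_audio can also be called directly with raw audio bytes; a sketch assuming the azure-cognitiveservices-speech package is installed and sample.wav is a hypothetical local .wav file:

    from pyrit.prompt_converter import AzureSpeechAudioToTextConverter

    converter = AzureSpeechAudioToTextConverter(
        azure_speech_region="eastus",          # placeholder
        azure_speech_key="<your-speech-key>",  # placeholder
    )

    with open("sample.wav", "rb") as f:
        audio_bytes = f.read()

    transcript = converter.recognize_audio(audio_bytes)
    print(transcript)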
- stop_cb(evt: Any, recognizer: Any) → None[source]#
Stop continuous recognition upon receiving an event ‘evt’.
- Parameters:
evt (speechsdk.SpeechRecognitionEventArgs) – Event.
recognizer (speechsdk.SpeechRecognizer) – Speech recognizer object.
- Raises:
ModuleNotFoundError – If the azure.cognitiveservices.speech module is not installed.