pyrit.prompt_converter.AzureSpeechAudioToTextConverter

class AzureSpeechAudioToTextConverter(azure_speech_region: str | None = None, azure_speech_key: str | None = None, recognition_language: str = 'en-US')

Bases: PromptConverter

Transcribes a .wav audio file into text using the Azure AI Speech service.

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text

__init__(azure_speech_region: str | None = None, azure_speech_key: str | None = None, recognition_language: str = 'en-US') → None

Initializes the converter with Azure Speech service credentials and recognition language.

Parameters:

azure_speech_region (str, Optional) – The name of the Azure region.
azure_speech_key (str, Optional) – The API key for accessing the service.
recognition_language (str) – The locale of the speech to recognize. Defaults to “en-US”. For the list of supported languages, see https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support

Methods

`__init__`([azure_speech_region, ...])	Initializes the converter with Azure Speech service credentials and recognition language.
`convert_async`(*, prompt[, input_type])	Converts the given audio file into its text representation.
`convert_tokens_async`(*, prompt[, ...])	Converts substrings within a prompt that are enclosed by specified start and end tokens.
`get_identifier`()	Returns an identifier dictionary for the converter.
`input_supported`(input_type)	Checks if the input type is supported by the converter.
`output_supported`(output_type)	Checks if the output type is supported by the converter.
`recognize_audio`(audio_bytes)	Recognizes audio file and returns transcribed text.
`stop_cb`(evt, recognizer)	Callback function that stops continuous recognition upon receiving an event 'evt'.
`transcript_cb`(evt, transcript)	Callback function that appends transcribed text upon receiving a "recognized" event.

Attributes

`AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE`	The API key for accessing the service.
`AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE`	The name of the Azure region.
`supported_input_types`	Returns a list of supported input types for the converter.
`supported_output_types`	Returns a list of supported output types for the converter.

AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_KEY'

Name of the environment variable that holds the API key for accessing the service.

AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_REGION'

Name of the environment variable that holds the name of the Azure region.
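
Because both constructor arguments are optional and the class names two environment variables, the credentials are presumably read from the environment when an argument is omitted. This page does not spell out that lookup, so the following is only a minimal sketch of the assumed behavior. The helper `resolve_setting` is hypothetical and is not part of PyRIT; only the two variable names come from this page.

```python
import os

# Environment variable names as documented on this page.
AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE = "AZURE_SPEECH_KEY"
AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE = "AZURE_SPEECH_REGION"


def resolve_setting(explicit_value, env_var_name: str) -> str:
    """Hypothetical helper: prefer the explicit argument, else the environment.

    Assumption: this mirrors how the converter fills in omitted arguments.
    """
    if explicit_value:
        return explicit_value
    value = os.environ.get(env_var_name)
    if not value:
        raise ValueError(f"Pass a value explicitly or set {env_var_name}.")
    return value
```

Under this assumption, exporting `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` before starting Python would let the converter be constructed with no arguments.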

async convert_async(*, prompt: str, input_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error'] = 'audio_path') → ConverterResult

Converts the given audio file into its text representation.

Parameters:

prompt (str) – File path to the audio file to be transcribed.
input_type (PromptDataType) – The type of input data. Defaults to “audio_path”.

Returns:

The result containing the transcribed text.

Return type:

ConverterResult

Raises:

ValueError – If the input type is not supported or if the provided file is not a .wav file.
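
A call to `convert_async` might look like the sketch below. The wrapper `transcribe` is a hypothetical name, and its extension check is a client-side guard that mirrors the documented `ValueError` rather than the library's own validation. The sketch also assumes that `ConverterResult` exposes the transcription as `output_text`, which this page does not state.

```python
import asyncio


async def transcribe(wav_path: str, language: str = "en-US") -> str:
    """Hypothetical wrapper around AzureSpeechAudioToTextConverter."""
    # Client-side guard mirroring the documented ValueError for non-.wav input.
    if not wav_path.lower().endswith(".wav"):
        raise ValueError(f"Expected a .wav file, got: {wav_path}")

    # Imported lazily so this module loads even where PyRIT is not installed.
    from pyrit.prompt_converter import AzureSpeechAudioToTextConverter

    # Region and key are omitted here; they must then be available to the
    # converter by other means (for example, the environment).
    converter = AzureSpeechAudioToTextConverter(recognition_language=language)
    result = await converter.convert_async(prompt=wav_path, input_type="audio_path")

    # Assumption: the transcribed text is stored on `output_text`.
    return result.output_text
```

With valid Azure Speech credentials configured, `asyncio.run(transcribe("sample.wav"))` would return the transcript as a string. Depending on the PyRIT version, the library's memory may also need to be initialized before the call.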

input_supported(input_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error']) → bool

Checks if the input type is supported by the converter.

Parameters:

input_type (PromptDataType) – The input type to check.

Returns:

True if the input type is supported, False otherwise.

Return type:

bool

output_supported(output_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error']) → bool

Checks if the output type is supported by the converter.

Parameters:

output_type (PromptDataType) – The output type to check.

Returns:

True if the output type is supported, False otherwise.

Return type:

bool

recognize_audio(audio_bytes: bytes) → str

Recognizes speech in the given audio and returns the transcribed text.

Parameters:

audio_bytes (bytes) – The audio content, as bytes.

Returns:

The transcribed text.

Return type:

str

stop_cb(evt: Any, recognizer: Any) → None

Callback function that stops continuous recognition upon receiving an event ‘evt’.

Parameters:

evt (speechsdk.SpeechRecognitionEventArgs) – Event.
recognizer (speechsdk.SpeechRecognizer) – Speech recognizer object.

transcript_cb(evt: Any, transcript: list[str]) → None

Callback function that appends transcribed text upon receiving a “recognized” event.

Parameters:

evt (speechsdk.SpeechRecognitionEventArgs) – Event.
transcript (list) – List to store transcribed text.
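
The interplay of the two callbacks can be illustrated without the Azure SDK. The sketch below uses stand-ins: `FakeRecognizer` and the `SimpleNamespace` events are hypothetical, and the free functions only imitate what the methods are documented to do (append recognized text, stop continuous recognition). It assumes that the event carries its text at `evt.result.text` and that the recognizer offers `stop_continuous_recognition_async`, as in the Azure Speech SDK for Python.

```python
from types import SimpleNamespace


class FakeRecognizer:
    """Hypothetical stand-in for speechsdk.SpeechRecognizer."""

    def __init__(self) -> None:
        self.stopped = False

    def stop_continuous_recognition_async(self) -> None:
        # The real SDK method ends a continuous recognition session.
        self.stopped = True


def transcript_cb(evt, transcript: list[str]) -> None:
    # Append the text carried by a "recognized" event.
    transcript.append(evt.result.text)


def stop_cb(evt, recognizer) -> None:
    # Stop continuous recognition when a stop or cancel event arrives.
    recognizer.stop_continuous_recognition_async()


transcript: list[str] = []
recognizer = FakeRecognizer()

# Simulate two "recognized" events followed by a session-stopped event.
for text in ("hello", "world"):
    transcript_cb(SimpleNamespace(result=SimpleNamespace(text=text)), transcript)
stop_cb(SimpleNamespace(result=SimpleNamespace(text="")), recognizer)

full_text = " ".join(transcript)
```

In this simulation the shared `transcript` list accumulates one entry per recognized phrase, which is why the callback takes the list as an argument instead of returning a value.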
