pyrit.prompt_converter.AzureSpeechAudioToTextConverter

class AzureSpeechAudioToTextConverter(azure_speech_region: str | None = None, azure_speech_key: str | None = None, recognition_language: str = 'en-US')

Bases: PromptConverter

Transcribes a .wav audio file into text using the Azure AI Speech service.

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text

__init__(azure_speech_region: str | None = None, azure_speech_key: str | None = None, recognition_language: str = 'en-US') → None

Initializes the converter with Azure Speech service credentials and recognition language.

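Parameters:
  • azure_speech_region (str | None) – The name of the Azure region. If None, the value of the AZURE_SPEECH_REGION environment variable is used.

  • azure_speech_key (str | None) – The API key for accessing the service. If None, the value of the AZURE_SPEECH_KEY environment variable is used.

  • recognition_language (str) – Locale code of the language to recognize. Defaults to 'en-US'.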

Methods

__init__([azure_speech_region, ...])

Initializes the converter with Azure Speech service credentials and recognition language.

convert_async(*, prompt[, input_type])

Converts the given audio file into its text representation.

convert_tokens_async(*, prompt[, ...])

Converts substrings within a prompt that are enclosed by specified start and end tokens.

get_identifier()

Returns an identifier dictionary for the converter.

input_supported(input_type)

Checks if the input type is supported by the converter.

output_supported(output_type)

Checks if the output type is supported by the converter.

recognize_audio(audio_bytes)

Recognizes speech in the given audio bytes and returns the transcribed text.

stop_cb(evt, recognizer)

Callback function that stops continuous recognition upon receiving an event 'evt'.

transcript_cb(evt, transcript)

Callback function that appends transcribed text upon receiving a "recognized" event.

Attributes

AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE

Name of the environment variable that holds the API key for accessing the service.

AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE

Name of the environment variable that holds the Azure region.

supported_input_types

Returns a list of supported input types for the converter.

supported_output_types

Returns a list of supported output types for the converter.

AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_KEY'

Name of the environment variable that holds the API key for accessing the service.

AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_REGION'

Name of the environment variable that holds the Azure region.
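The two class attributes name environment variables rather than holding the secrets themselves. The following is a minimal sketch of how a constructor argument can fall back to such a variable, assuming an explicitly passed value takes precedence and the environment is consulted otherwise. resolve_setting is a hypothetical helper for illustration, not part of PyRIT.

```python
import os
from typing import Optional

AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE = "AZURE_SPEECH_KEY"
AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE = "AZURE_SPEECH_REGION"


def resolve_setting(passed_value: Optional[str], env_var_name: str) -> str:
    """Hypothetical helper: prefer the explicit argument, else read the environment."""
    if passed_value:
        return passed_value
    value = os.environ.get(env_var_name)
    if not value:
        raise ValueError(f"Set {env_var_name} or pass the value explicitly.")
    return value


# Example: the region comes from the environment (set here only for the demo),
# while the key is passed directly and therefore wins over any environment value.
os.environ[AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE] = "eastus"
region = resolve_setting(None, AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE)
key = resolve_setting("my-test-key", AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE)
```

Keeping only the variable names in code means credentials can stay out of source files and be supplied per environment.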

async convert_async(*, prompt: str, input_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error'] = 'audio_path') → ConverterResult

Converts the given audio file into its text representation.

Parameters:
  • prompt (str) – File path to the audio file to be transcribed.

  • input_type (PromptDataType) – The type of input data. Only 'audio_path' is supported; defaults to 'audio_path'.

Returns:

The result containing the transcribed text.

Return type:

ConverterResult

Raises:

ValueError – If the input type is not supported or if the provided file is not a .wav file.
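The two ValueError conditions listed for convert_async can be expressed as a small pre-check. This is a sketch under assumptions: validate_audio_prompt is a hypothetical helper, 'audio_path' is assumed to be the only accepted input type, and the extension check is case-sensitive here; it only inspects the path string and does not open the file.

```python
from pathlib import Path

SUPPORTED_INPUT_TYPE = "audio_path"  # assumption: the only accepted input type


def validate_audio_prompt(prompt: str, input_type: str = "audio_path") -> Path:
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    if input_type != SUPPORTED_INPUT_TYPE:
        raise ValueError(f"Input type not supported: {input_type!r}")
    path = Path(prompt)
    # Case-sensitive extension check (assumption); the file need not exist yet.
    if path.suffix != ".wav":
        raise ValueError(f"Expected a .wav file, got: {prompt!r}")
    return path


# Hypothetical path used only for illustration.
checked = validate_audio_prompt("recordings/sample.wav")
```

Failing fast on the path string avoids sending unsupported audio to the remote service.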

input_supported(input_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error']) → bool

Checks if the input type is supported by the converter.

Parameters:

input_type (PromptDataType) – The input type to check.

Returns:

True if the input type is supported, False otherwise.

Return type:

bool

output_supported(output_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error']) → bool

Checks if the output type is supported by the converter.

Parameters:

output_type (PromptDataType) – The output type to check.

Returns:

True if the output type is supported, False otherwise.

Return type:

bool
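input_supported and output_supported amount to membership tests against the converter's supported types. The sketch below assumes this converter accepts only 'audio_path' and produces only 'text'; SpeechToTextTypes is a hypothetical stand-in, and plain class attributes replace the real properties for brevity.

```python
from typing import Literal, get_args

# Mirrors the PromptDataType literal shown in the signatures above.
PromptDataType = Literal["text", "image_path", "audio_path", "video_path", "url", "reasoning", "error"]


class SpeechToTextTypes:
    """Hypothetical stand-in showing how the support checks can be expressed."""

    supported_input_types: list[PromptDataType] = ["audio_path"]  # assumption
    supported_output_types: list[PromptDataType] = ["text"]  # assumption

    def input_supported(self, input_type: PromptDataType) -> bool:
        return input_type in self.supported_input_types

    def output_supported(self, output_type: PromptDataType) -> bool:
        return output_type in self.supported_output_types


# Enumerate every PromptDataType value and keep the ones each check accepts.
ALL_TYPES = get_args(PromptDataType)
converter_types = SpeechToTextTypes()
accepted_inputs = [t for t in ALL_TYPES if converter_types.input_supported(t)]
produced_outputs = [t for t in ALL_TYPES if converter_types.output_supported(t)]
```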

recognize_audio(audio_bytes: bytes) → str

Recognizes speech in the given audio bytes and returns the transcribed text.

Parameters:

audio_bytes (bytes) – The audio data to transcribe.

Returns:

Transcribed text.

Return type:

str
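recognize_audio takes bytes, while convert_async takes a file path. The following stdlib-only sketch shows the step in between, assuming the bytes are simply the file's contents: it writes a short silent .wav (16 kHz, 16-bit mono PCM, a common speech input format) to a temporary directory, reads it back as bytes, and parses the header with the wave module. The file name and audio parameters are illustrative choices, and nothing here calls Azure.

```python
import io
import tempfile
import wave
from pathlib import Path


def write_silent_wav(path: Path, seconds: float = 0.5, sample_rate: int = 16000) -> None:
    """Write a mono, 16-bit PCM .wav file containing silence."""
    n_frames = int(seconds * sample_rate)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)


with tempfile.TemporaryDirectory() as tmp:
    wav_path = Path(tmp) / "example.wav"  # hypothetical file name
    write_silent_wav(wav_path)
    audio_bytes = wav_path.read_bytes()  # what a bytes-based API would receive

# Parse the format back out of the in-memory bytes.
with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
    channels, sample_width, frame_rate, n_frames = (
        wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes()
    )
```

Checking channels and sample rate this way can help diagnose poor transcriptions caused by unexpected audio formats.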

stop_cb(evt: Any, recognizer: Any) → None

Callback function that stops continuous recognition upon receiving an event ‘evt’.

Parameters:
  • evt (speechsdk.SpeechRecognitionEventArgs) – The event that triggered the callback.

  • recognizer (speechsdk.SpeechRecognizer) – Speech recognizer object.

transcript_cb(evt: Any, transcript: list[str]) → None

Callback function that appends transcribed text upon receiving a “recognized” event.

Parameters:
  • evt (speechsdk.SpeechRecognitionEventArgs) – The "recognized" event carrying the transcribed text.

  • transcript (list) – List to store transcribed text.
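recognize_audio, transcript_cb, and stop_cb fit the continuous-recognition pattern of the Azure Speech SDK. Handlers connected to the recognizer's recognized event collect text, and a handler on session_stopped ends the session. The following sketch simulates that flow in plain Python so it runs without the SDK. FakeSignal, FakeRecognizer, FakeResult, and FakeEvent are hypothetical stand-ins. Joining phrases with spaces is an assumption and not necessarily PyRIT's exact behavior. With the real SDK, recognition continues in the background after it starts, so the caller has to wait for the stop callback. Here, events fire synchronously.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeResult:
    text: str


@dataclass
class FakeEvent:
    result: FakeResult


class FakeSignal:
    """Hypothetical stand-in for an SDK event signal with a connect() method."""

    def __init__(self) -> None:
        self._handlers: list[Callable] = []

    def connect(self, handler: Callable) -> None:
        self._handlers.append(handler)

    def fire(self, evt) -> None:
        for handler in self._handlers:
            handler(evt)


class FakeRecognizer:
    """Hypothetical recognizer that replays pre-recorded phrases as events."""

    def __init__(self, phrases: list[str]) -> None:
        self.recognized = FakeSignal()
        self.session_stopped = FakeSignal()
        self.phrases = phrases
        self.running = False

    def start_continuous_recognition(self) -> None:
        self.running = True
        for phrase in self.phrases:
            if not self.running:
                break
            self.recognized.fire(FakeEvent(FakeResult(phrase)))
        # Signal the end of the audio, which triggers stop_cb.
        self.session_stopped.fire(FakeEvent(FakeResult("")))

    def stop_continuous_recognition(self) -> None:
        self.running = False


def transcript_cb(evt, transcript: list[str]) -> None:
    # Append the recognized text for each "recognized" event.
    transcript.append(evt.result.text)


def stop_cb(evt, recognizer) -> None:
    # End continuous recognition once the session stops.
    recognizer.stop_continuous_recognition()


def recognize(phrases: list[str]) -> str:
    recognizer = FakeRecognizer(phrases)
    transcript: list[str] = []
    recognizer.recognized.connect(lambda evt: transcript_cb(evt, transcript))
    recognizer.session_stopped.connect(lambda evt: stop_cb(evt, recognizer))
    recognizer.start_continuous_recognition()  # synchronous in this sketch only
    return " ".join(transcript)


text = recognize(["Hello there.", "How are you?"])
```

Passing the transcript list into the callback through a lambda keeps the callbacks stateless, so one recognizer session's text cannot leak into another's.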