pyrit.prompt_converter.AzureSpeechAudioToTextConverter

class AzureSpeechAudioToTextConverter(azure_speech_region: str = None, azure_speech_key: str = None, recognition_language: str = 'en-US')

Bases: PromptConverter

The AzureSpeechAudioToTextConverter takes a .wav file and transcribes it into text with the Azure AI Speech service. See https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text

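Parameters:
  • azure_speech_region (str) – Azure region of the Speech resource

  • azure_speech_key (str) – API key for the Speech resource

  • recognition_language (str) – Language of the speech to recognize; defaults to 'en-US'
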
__init__(azure_speech_region: str = None, azure_speech_key: str = None, recognition_language: str = 'en-US') → None

Methods

__init__([azure_speech_region, ...])

convert_async(*, prompt[, input_type])

Transcribes the audio file at the given path to text.

convert_tokens_async(*, prompt[, ...])

Converts substrings within a prompt that are enclosed by specified start and end tokens.

get_identifier()

input_supported(input_type)

Checks whether the input type is supported by the converter.

recognize_audio(audio_bytes)

Recognize audio file and return transcribed text.

stop_cb(evt, recognizer)

Callback function that stops continuous recognition upon receiving an event 'evt'.

transcript_cb(evt, transcript)

Callback function that appends transcribed text upon receiving a "recognized" event.

Attributes

AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_KEY'
AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE: str = 'AZURE_SPEECH_REGION'
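
The attribute names suggest that the key and region can be supplied through environment variables when they are not passed to the constructor, although this page does not state that behavior. The sketch below shows one way such a fallback could work; `resolve_setting` is a hypothetical helper written for illustration, not part of PyRIT, and the values are placeholders.

```python
import os
from typing import Optional

# Names copied from the class attributes listed above.
AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE = "AZURE_SPEECH_KEY"
AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE = "AZURE_SPEECH_REGION"


def resolve_setting(env_var_name: str, passed_value: Optional[str] = None) -> str:
    """Hypothetical helper: prefer the explicit argument, else read the environment."""
    if passed_value:
        return passed_value
    value = os.environ.get(env_var_name)
    if not value:
        raise ValueError(f"Set {env_var_name} or pass the value explicitly.")
    return value


# Placeholder values only; a real key should come from a secret store.
os.environ[AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE] = "eastus"
region = resolve_setting(AZURE_SPEECH_REGION_ENVIRONMENT_VARIABLE)
explicit_key = resolve_setting(AZURE_SPEECH_KEY_ENVIRONMENT_VARIABLE, "example-key")
```

Keeping the key in the environment rather than in code avoids committing credentials to source control.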

async convert_async(*, prompt: str, input_type: Literal['text', 'image_path', 'audio_path', 'url', 'error'] = 'audio_path') → ConverterResult

Transcribes the audio file at the given path to text.

Parameters:
  • prompt (str) – Path to the .wav audio file to transcribe

  • input_type (PromptDataType) – Type of the input data; defaults to 'audio_path'

Returns:

The transcribed text, wrapped in a ConverterResult object

Return type:

ConverterResult
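
A minimal usage sketch follows, under several assumptions: `write_silent_wav` and `transcribe` are hypothetical helpers, the 16 kHz mono 16-bit PCM layout is chosen because it is a common default for Azure Speech input, and `output_text` is assumed to be the attribute of ConverterResult that holds the transcription. Calling `transcribe` requires pyrit to be installed and valid Azure Speech credentials.

```python
import wave


def write_silent_wav(path: str, seconds: float = 1.0, sample_rate: int = 16000) -> str:
    """Write a mono, 16-bit PCM .wav file of silence (placeholder test audio)."""
    frame_count = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * frame_count)
    return path


async def transcribe(path: str) -> str:
    """Transcribe a .wav file; needs pyrit and Azure Speech credentials."""
    # Imported lazily so the helper above works without pyrit installed.
    from pyrit.prompt_converter import AzureSpeechAudioToTextConverter

    converter = AzureSpeechAudioToTextConverter(recognition_language="en-US")
    result = await converter.convert_async(prompt=path, input_type="audio_path")
    # Assumption: ConverterResult exposes the transcription as output_text.
    return result.output_text
```

From synchronous code, the coroutine could be run with `asyncio.run(transcribe("sample.wav"))`.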

input_supported(input_type: Literal['text', 'image_path', 'audio_path', 'url', 'error']) → bool

Checks whether the input type is supported by the converter.

Parameters:

input_type – The input type to check

Returns:

True if the input type is supported, False otherwise

Return type:

bool
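
The check can be pictured with a small stand-in. This sketch assumes that 'audio_path' is the only accepted input type, which is inferred from the default of convert_async and the purpose of the class rather than stated on this page; the function below is an illustration, not the library's code.

```python
from typing import Literal, get_args

# Mirrors the literal values shown in the signature above.
PromptDataType = Literal["text", "image_path", "audio_path", "url", "error"]


def input_supported(input_type: PromptDataType) -> bool:
    """Stand-in check: assume only paths to audio files are accepted."""
    return input_type == "audio_path"


# Collect every declared input type that the stand-in accepts.
supported = [t for t in get_args(PromptDataType) if input_supported(t)]
```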

recognize_audio(audio_bytes: bytes) → str

Recognizes speech in the given audio bytes and returns the transcribed text.

Parameters:

audio_bytes (bytes) – Audio bytes input.

Returns:

Transcribed text

Return type:

str
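
Because recognize_audio takes raw bytes, a caller has to read the file first. The sketch below uses a hypothetical helper, `load_wav_bytes`, that reads a file and confirms with the standard-library `wave` module that it parses as WAV before the bytes are handed on; whether recognize_audio performs any validation of its own is not stated here.

```python
import io
import wave


def load_wav_bytes(path: str) -> bytes:
    """Read a file into memory and confirm that it parses as WAV."""
    with open(path, "rb") as audio_file:
        audio_bytes = audio_file.read()
    # wave.open raises wave.Error if the bytes are not a valid WAV container.
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav_file:
        if wav_file.getnframes() == 0:
            raise ValueError(f"{path} contains no audio frames")
    return audio_bytes
```

With a configured converter, the result could then be passed along as `converter.recognize_audio(load_wav_bytes("sample.wav"))`.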

stop_cb(evt: SpeechRecognitionEventArgs, recognizer: SpeechRecognizer) → None

Callback function that stops continuous recognition upon receiving an event 'evt'.

Parameters:
  • evt (SpeechRecognitionEventArgs) – event

  • recognizer (SpeechRecognizer) – speech recognizer object

transcript_cb(evt: SpeechRecognitionEventArgs, transcript: list[str]) → None

Callback function that appends transcribed text upon receiving a "recognized" event.

Parameters:
  • evt (SpeechRecognitionEventArgs) – event

  • transcript (list) – list to store transcribed text
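
The two callbacks follow the usual continuous-recognition pattern: one accumulates text, the other ends the session. The sketch below imitates that pattern with stand-ins so it runs without the Azure SDK; `FakeRecognizer` and `fake_event` are hypothetical, and the function bodies are an assumption about what the callbacks do, based only on their descriptions above. In the real SDK, callbacks like these are typically attached with calls such as `recognizer.recognized.connect(...)` and `recognizer.session_stopped.connect(...)`.

```python
from types import SimpleNamespace


class FakeRecognizer:
    """Stand-in for the SDK's SpeechRecognizer; only records the stop request."""

    def __init__(self) -> None:
        self.stopped = False

    def stop_continuous_recognition_async(self) -> None:
        self.stopped = True


def transcript_cb(evt, transcript: list) -> None:
    # Append the text carried by a "recognized" event.
    transcript.append(evt.result.text)


def stop_cb(evt, recognizer) -> None:
    # Any stop-type event ends continuous recognition.
    recognizer.stop_continuous_recognition_async()


def fake_event(text: str) -> SimpleNamespace:
    # Mimics the evt.result.text shape of a recognition event.
    return SimpleNamespace(result=SimpleNamespace(text=text))


transcript: list = []
recognizer = FakeRecognizer()
for phrase in ["hello", "world"]:
    transcript_cb(fake_event(phrase), transcript)
stop_cb(fake_event(""), recognizer)
text = " ".join(transcript)
```

Passing the transcript list into the callback, rather than returning a value, lets each "recognized" event add to a shared result while recognition is still running.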