Lab 3.3.2 Groundedness Evaluation using Prompt Flow (Code)

LLMOps: Evaluating and monitoring generative AI applications

Prerequisites

  • An Azure subscription where you can create an AI Hub and an AI Project resource
  • A deployed gpt-4o model in Azure AI Foundry

Task

  • I want to quantitatively verify how well the model answers questions
  • I want to benchmark on bulk data before production to find bottlenecks and make improvements

TOC

1️⃣ Execute a batch run to get the base run data (see the sketch after this list)
2️⃣ Evaluate the "groundedness" with your evaluation flow
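
As a rough sketch of step 1, a batch run with the Prompt Flow Python SDK might look like the following. The flow folder `./chat_flow`, the data file `./data/questions.jsonl`, and the placeholder IDs are assumptions, not names from this lab:

```python
from azure.identity import DefaultAzureCredential
from promptflow.azure import PFClient

# Connect to the Azure AI project that hosts the flow.
# Subscription, resource group, and project names are placeholders.
pf = PFClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<ai-project-name>",
)

# Step 1: batch run -- execute the chat flow once per row of the bulk test data.
base_run = pf.run(
    flow="./chat_flow",             # hypothetical flow folder
    data="./data/questions.jsonl",  # hypothetical bulk test data
    column_mapping={"question": "${data.question}"},
)
pf.stream(base_run)  # stream logs until the batch run finishes
```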

Walkthrough Jupyter Notebook

  • Let’s create and run the groundedness evaluation flow in a Jupyter notebook using the Prompt Flow Python SDK. You will learn how to evaluate the flow on Azure: promptflow_with_evaluation_code.ipynb (a sketch of this evaluation step follows)
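
A minimal sketch of step 2, assuming a groundedness evaluation flow folder `./groundedness_eval` whose inputs are `question`, `context`, and `answer` (all names and placeholder IDs are hypothetical):

```python
from azure.identity import DefaultAzureCredential
from promptflow.azure import PFClient

# Same client setup as in step 1 (placeholder IDs).
pf = PFClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<ai-project-name>",
)

# Retrieve the base run produced in step 1 (run name is a placeholder).
base_run = pf.runs.get("<base-run-name>")

# Step 2: evaluation run -- score each answer of the base run for groundedness.
eval_run = pf.run(
    flow="./groundedness_eval",         # hypothetical evaluation flow folder
    data="./data/questions.jsonl",      # same bulk data as the base run
    run=base_run,                       # link the base run's outputs
    column_mapping={
        "question": "${data.question}",
        "context": "${data.context}",       # grounding context per row
        "answer": "${run.outputs.answer}",  # answer produced by the base run
    },
)
pf.stream(eval_run)

# Aggregated metrics for the whole run, e.g. an average groundedness score.
print(pf.get_metrics(eval_run))
```

Linking the evaluation run to the base run via `run=` lets Prompt Flow join each scored row back to the original answer, so the results can be inspected side by side in Azure AI Foundry.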

Distributed under the MIT license. This hands-on lab was developed by Microsoft AI GBB (Global Black Belt).