Benchmark
The Benchmark component evaluates a model’s performance and safety. For example, the Q&A Benchmark strategy measures how well a target model answers questions from a provided dataset. The results show how the model’s ability varies across different criteria or subject areas.
Every benchmark accepts configurable converter and scoring settings as well as custom contexts, and produces a result that can be analyzed further.
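To make the moving parts concrete, here is a minimal, library-independent sketch of a Q&A benchmark loop. It is not the component's actual API: every name in it (`QAEntry`, `BenchmarkResult`, `run_qa_benchmark`, `exact_match`, `toy_target`) is hypothetical and chosen for illustration only. It assumes a target is any callable that maps a prompt string to an answer string, a converter is a function that rewrites the question text, and a scorer is a function that compares an answer with the expected value.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QAEntry:
    """One dataset row (hypothetical shape): a question, its expected answer, and a category."""
    question: str
    expected: str
    category: str


@dataclass
class BenchmarkResult:
    """Aggregate outcome that can be analyzed further (hypothetical shape)."""
    total: int
    correct: int
    per_category: Dict[str, float]


def exact_match(answer: str, expected: str) -> bool:
    """Default scorer: comparison that ignores case and surrounding whitespace."""
    return answer.strip().lower() == expected.strip().lower()


def run_qa_benchmark(
    entries: List[QAEntry],
    target: Callable[[str], str],
    converter: Callable[[str], str] = lambda text: text,
    scorer: Callable[[str, str], bool] = exact_match,
    context: str = "",
) -> BenchmarkResult:
    """Send each converted question to the target and score the answers per category."""
    asked: Dict[str, int] = defaultdict(int)
    passed: Dict[str, int] = defaultdict(int)
    for entry in entries:
        # The custom context is prepended to the converted question.
        prompt = f"{context}\n{converter(entry.question)}".strip()
        answer = target(prompt)
        asked[entry.category] += 1
        if scorer(answer, entry.expected):
            passed[entry.category] += 1
    per_category = {cat: passed[cat] / asked[cat] for cat in asked}
    return BenchmarkResult(
        total=sum(asked.values()),
        correct=sum(passed.values()),
        per_category=per_category,
    )


def toy_target(prompt: str) -> str:
    """Stand-in for a real model: returns canned answers, one of them deliberately wrong."""
    canned = {
        "What is 2 + 2?": "4",
        "What is the capital of France?": "paris",
        "What is 3 * 3?": "6",
    }
    for question, answer in canned.items():
        if question in prompt:
            return answer
    return ""


SAMPLE_ENTRIES = [
    QAEntry("What is 2 + 2?", "4", "math"),
    QAEntry("What is the capital of France?", "Paris", "geography"),
    QAEntry("What is 3 * 3?", "9", "math"),
]

result = run_qa_benchmark(SAMPLE_ENTRIES, toy_target, context="Answer briefly.")
print(result)
```

Grouping scores by category is what gives the per-area insight described above: in this toy run the stand-in target answers the geography question correctly but misses one of the two math questions. Swapping in a different `converter` or `scorer` changes how prompts are presented and judged without touching the loop itself.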