Voxli Voxli

Assertions

Understand how assertions work, how severity levels affect scoring, and how Voxli calculates test scores.

Assertions and Scoring

What assertions do

Assertions are pass/fail checks that run after a test conversation ends. Each assertion has two parts:

  • Criteria - a plain-language description of what to check. For example: “The agent confirms the order status is shipped.”
  • Severity - how important this check is. Blocker, medium, or low.

After the conversation finishes, an AI evaluator reads the full transcript and decides whether each assertion passed or failed. You can add up to 10 assertions per test.

To learn how to add assertions when creating a test, see Create a Scenario.

Severity levels

Each assertion has a severity level that controls how much it affects the test score.

Blocker - Critical requirements. If any blocker assertion fails, the entire test is considered failed regardless of the overall score. Use blockers for safety rules, core functionality, and must-do actions.

Medium - Important checks that aren’t dealbreakers. Use medium for expected behaviors and standard responses.

Low - Nice-to-have verifications. Use low for tone, formatting, and optional details.

A test with four assertions showing blocker, medium, and low severity levels

How scoring works

Each severity level has a weight:

SeverityWeight
Blocker4
Medium2
Low1

The test score is a weighted percentage:

Score = (sum of passed assertion weights / sum of all assertion weights) x 100

Worked example

A test has 4 assertions: 2 blockers, 1 medium, and 1 low.

  • Total weight = 4 + 4 + 2 + 1 = 11
  • If 1 blocker fails, the passed weight = 4 + 2 + 1 = 7
  • Score = 7 / 11 x 100 = 63%
A test result showing a 63% score with one blocker failed

Even though the score is 63%, this test is marked as failed because a blocker assertion didn’t pass. A test only passes when all blocker assertions pass - regardless of the overall score.

Tool call assertions

Your assertions can check whether the agent called specific tools during the conversation. For example:

The agent calls the check_order tool with the correct order ID.

Tool calls and their return values are visible in the conversation transcript, so the evaluator can verify both that the right tool was called and that the right data was passed. See Results for how to inspect tool calls in the results view.

Writing Tests

Preview Tests