Running Tests
This guide shows you how to run a complete test suite against your AI agent using Voxli’s REST API.
Complete Example
Here’s a full Python script that executes all tests in a scenario:
"""Run Voxli tests against your chatbot or AI agent.1. Create a test run2. Get all tests for the scenario3. Simulate each test"""import osimport timeimport requestsdef poll_next_message(endpoint: str, headers: dict, timeout: int = 30) -> dict | None:"""Poll next-message until the tester is ready or the chat ends.Returns None when the chat is over, otherwise a dict with either a`message` (free text) or an `action` (an ActionInvocation) field."""start_time = time.time()while True:response = requests.post(endpoint, headers=headers)response.raise_for_status()data = response.json()if data["ready"]:if data.get("end_chat"):return Nonereturn {"message": data.get("message"), "action": data.get("action")}if time.time() - start_time > timeout:raise TimeoutError("Timed out waiting for message")time.sleep(1)# --- Configuration ---api_key = os.getenv("VOXLI_API_KEY")base_url = os.getenv("VOXLI_API_URL", "https://api.voxli.io")scenario_id = os.getenv("VOXLI_SCENARIO_ID")agent_id = os.getenv("VOXLI_AGENT_ID")headers = {"Authorization": f"Bearer {api_key}"}# 1. Create a test runrun = requests.post(f"{base_url}/runs/", headers=headers, json={"scenario": scenario_id,"agent": agent_id,"status": "running"}).json()run_id = run["id"]# 2. Get all tests for this scenariotests = requests.get(f"{base_url}/scenarios/{scenario_id}/tests", headers=headers).json()["data"]# 3. Simulate each testfor test in tests:# 3a. Create a test result entryresult = requests.post(f"{base_url}/test-results/", headers=headers, json={"test": test["id"],"run": run_id,"agent": agent_id}).json()result_id = result["id"]generate_endpoint = f"{base_url}/test-results/{result_id}/next-message"conversation_endpoint = f"{base_url}/test-results/{result_id}/conversation"# 3b. Get first turn from Voxliturn = poll_next_message(generate_endpoint, headers)# 3c. Conversation loopwhile turn is not None:# TODO: Replace with your agent's responsestart = time.monotonic()if turn.get("action"):# Tester invoked a registered action instead of typing. Apply# it in your system, then record the chatbot's follow-up.agent_response = your_agent.apply_action(turn["action"]["name"],turn["action"].get("arguments", {}),)else:agent_response = your_agent.process(turn["message"])response_time_ms = round((time.monotonic() - start) * 1000)# 3d. Record agent response (include metadata for performance tracking)requests.post(conversation_endpoint,headers=headers, json={"type": "message","content": agent_response,"metadata": {"responseTime": response_time_ms,"inputTokens": input_tokens,"outputTokens": output_tokens,"cost": cost,}})# 3e. Get next turn from Voxliturn = poll_next_message(generate_endpoint, headers)print(f"Test run {run_id} completed.")
How It Works
1. Create a Test Run: Initialize a new test run for your scenario with `status: "running"`.
2. Fetch Tests: Retrieve all tests associated with the scenario.
3. Execute Each Test:
   - Create a result entry by posting to `/test-results/` with the test, run, and agent IDs
   - Start the conversation by calling `next-message` to get the first tester turn
   - Enter a conversation loop where you relay each turn between Voxli and your agent
   - Continue until Voxli signals `end_chat: true`
Each `next-message` response may return `ready: false` if the next turn is not yet available. Poll the endpoint with a short delay until it returns `ready: true`.
Each ready turn contains either `message` (free text from the tester) or `action` (an invocation of one of the actions your chatbot registered for this turn). Branch on whichever field is populated; the two are mutually exclusive. See Tools, Events, and Actions for how to register actions.
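For reference, a `next-message` response takes roughly one of the following shapes. These are illustrative Python dicts based on the fields used in the script above; the example message text and action name are hypothetical, and real responses may include additional fields.

```python
# Not ready yet: keep polling
{"ready": False}

# Tester sent a free-text message (hypothetical content)
{"ready": True, "message": "Hi, I'd like to change my delivery address."}

# Tester invoked a registered action (hypothetical name and arguments)
{"ready": True, "action": {"name": "update_address", "arguments": {"street": "123 Main St"}}}

# The conversation is over
{"ready": True, "end_chat": True}
```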
The run is automatically marked as completed once all tests finish.
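If you want to confirm the final state programmatically, you can fetch the run after the loop finishes. The GET route used below is an assumption based on the run resource created earlier; check the API reference for the exact endpoint.

```python
# Assumption: a GET endpoint exists at /runs/{run_id}; verify in the API reference.
run = requests.get(f"{base_url}/runs/{run_id}", headers=headers).json()
print(run["status"])  # expected to read "completed" once all tests have finished
```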
Message Metadata
When posting agent messages to the conversation endpoint, you can include a `metadata` object with performance metrics. Voxli recognizes the following keys:
| Key | Type | Description |
|---|---|---|
| `responseTime` | number | Time in milliseconds for the agent to respond |
| `inputTokens` | number | Input/prompt token count for the LLM call |
| `outputTokens` | number | Output/completion token count for the LLM call |
| `cost` | number | Cost of the LLM call in USD |
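For example, you can time your agent call and attach these metrics when posting each message. The token counts and cost are values you obtain from your own LLM client's usage data.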
```python
import time

start = time.monotonic()
agent_response = get_agent_response(tester_message)  # your agent logic
response_time_ms = round((time.monotonic() - start) * 1000)

requests.post(conversation_endpoint, headers=headers, json={
    "type": "message",
    "content": agent_response,
    "metadata": {
        "responseTime": response_time_ms,
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "cost": cost,
    }
})
```
When available, these metrics are displayed as averages in test result details and comparison views. They help identify performance regressions and cost differences across agent configurations.