Voxli

Running Tests

This guide shows you how to run a complete test suite against your AI agent using Voxli’s REST API.

Complete Example

Here’s a full Python script that executes all tests in a scenario:

"""
Run Voxli tests against your chatbot or AI agent.
1. Create a test run
2. Get all tests for the scenario
3. Simulate each test
"""
import os
import time
import requests
def poll_next_message(endpoint: str, headers: dict, timeout: int = 30) -> str | None:
start_time = time.time()
while True:
response = requests.post(endpoint, headers=headers)
response.raise_for_status()
data = response.json()
if data["ready"]:
return None if data.get("end_chat") else data["message"]
if time.time() - start_time > timeout:
raise TimeoutError("Timed out waiting for message")
time.sleep(1)
# --- Configuration ---
api_key = os.getenv("VOXLI_API_KEY")
base_url = os.getenv("VOXLI_API_URL", "https://api.voxli.io")
scenario_id = os.getenv("VOXLI_SCENARIO_ID")
agent_id = os.getenv("VOXLI_AGENT_ID")
headers = {"Authorization": f"Bearer {api_key}"}
# 1. Create a test run
run = requests.post(f"{base_url}/runs/", headers=headers, json={
"scenario": scenario_id,
"agent": agent_id,
"status": "running"
}).json()
run_id = run["id"]
# 2. Get all tests for this scenario
tests = requests.get(f"{base_url}/scenarios/{scenario_id}/tests", headers=headers).json()["data"]
# 3. Simulate each test
for test in tests:
# 3a. Create a test result entry
result = requests.post(f"{base_url}/test-results/", headers=headers, json={
"test": test["id"],
"run": run_id,
"agent": agent_id
}).json()
result_id = result["id"]
generate_endpoint = f"{base_url}/test-results/{result_id}/next-message"
conversation_endpoint = f"{base_url}/test-results/{result_id}/conversation"
# 3b. Get first message from Voxli
tester_message = poll_next_message(generate_endpoint, headers)
# 3c. Conversation loop
while tester_message is not None:
# TODO: Replace with your agent's response
start = time.monotonic()
agent_response = "The meaning of life is 42."
response_time_ms = round((time.monotonic() - start) * 1000)
# 3d. Record agent response (include metadata for performance tracking)
requests.post(
conversation_endpoint,
headers=headers, json={
"type": "message",
"content": agent_response,
"metadata": {
"responseTime": response_time_ms,
"inputTokens": input_tokens,
"outputTokens": output_tokens,
"cost": cost,
}
}
)
# 3e. Get next message from Voxli
tester_message = poll_next_message(generate_endpoint, headers)
print(f"Test run {run_id} completed.")

How It Works

1. Create a Test Run: Initialize a new test run for your scenario with status: "running".

2. Fetch Tests: Retrieve all tests associated with the scenario.

3. Execute Each Test:

  • Create a result entry by posting to /test-results/ with the test, run, and agent IDs
  • Start the conversation by calling next-message to get the first tester message
  • Enter a conversation loop where you relay messages between Voxli and your agent
  • Continue until Voxli signals end_chat: true

The run is automatically marked as completed once all tests finish.

Message Metadata

When posting agent messages to the conversation endpoint, you can include a metadata object with performance metrics. Voxli reads the following recognized keys:

| Key | Type | Description |
| --- | --- | --- |
| `responseTime` | number | Time in milliseconds for the agent to respond |
| `inputTokens` | number | Input/prompt token count for the LLM call |
| `outputTokens` | number | Output/completion token count for the LLM call |
| `cost` | number | Cost of the LLM call in USD |
For example:

```python
import time

start = time.monotonic()
agent_response = get_agent_response(tester_message)  # your agent logic
response_time_ms = round((time.monotonic() - start) * 1000)

requests.post(conversation_endpoint, headers=headers, json={
    "type": "message",
    "content": agent_response,
    "metadata": {
        "responseTime": response_time_ms,
        "inputTokens": input_tokens,    # from your LLM provider's usage stats
        "outputTokens": output_tokens,
        "cost": cost,
    },
})
```

When available, these metrics are displayed as averages in test result details and comparison views. They help identify performance regressions and cost differences across agent configurations.
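If your agent wraps an LLM call, the usage statistics most providers return map directly onto these keys. A sketch of assembling the metadata object, with hypothetical per-1k-token prices (substitute your model's actual rates):

```python
def llm_call_metadata(response_time_ms: int, input_tokens: int,
                      output_tokens: int,
                      input_price_per_1k: float = 0.0005,
                      output_price_per_1k: float = 0.0015) -> dict:
    """Build the metadata object using Voxli's recognized keys.

    The per-1k-token prices are hypothetical defaults, not Voxli
    values -- substitute your model's actual pricing.
    """
    cost = (input_tokens * input_price_per_1k
            + output_tokens * output_price_per_1k) / 1000
    return {
        "responseTime": response_time_ms,
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "cost": round(cost, 6),
    }
```

Keeping this in one helper means every message you post carries a consistently computed cost, which makes the averaged comparison views meaningful.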
