Voxli

Running Tests

This guide shows you how to run a complete test suite against your AI agent using Voxli’s REST API.

Complete Example

Here’s a full Python script that executes all tests in a scenario:

"""
Run Voxli tests against your chatbot or AI agent.
1. Create a test run
2. Get all tests for the scenario
3. Simulate each test
"""
import os
import time
import requests
def poll_next_message(endpoint: str, headers: dict, timeout: int = 30) -> dict | None:
"""Poll next-message until the tester is ready or the chat ends.
Returns None when the chat is over, otherwise a dict with either a
`message` (free text) or an `action` (an ActionInvocation) field.
"""
start_time = time.time()
while True:
response = requests.post(endpoint, headers=headers)
response.raise_for_status()
data = response.json()
if data["ready"]:
if data.get("end_chat"):
return None
return {"message": data.get("message"), "action": data.get("action")}
if time.time() - start_time > timeout:
raise TimeoutError("Timed out waiting for message")
time.sleep(1)
# --- Configuration ---
api_key = os.getenv("VOXLI_API_KEY")
base_url = os.getenv("VOXLI_API_URL", "https://api.voxli.io")
scenario_id = os.getenv("VOXLI_SCENARIO_ID")
agent_id = os.getenv("VOXLI_AGENT_ID")
headers = {"Authorization": f"Bearer {api_key}"}
# 1. Create a test run
run = requests.post(f"{base_url}/runs/", headers=headers, json={
"scenario": scenario_id,
"agent": agent_id,
"status": "running"
}).json()
run_id = run["id"]
# 2. Get all tests for this scenario
tests = requests.get(f"{base_url}/scenarios/{scenario_id}/tests", headers=headers).json()["data"]
# 3. Simulate each test
for test in tests:
# 3a. Create a test result entry
result = requests.post(f"{base_url}/test-results/", headers=headers, json={
"test": test["id"],
"run": run_id,
"agent": agent_id
}).json()
result_id = result["id"]
generate_endpoint = f"{base_url}/test-results/{result_id}/next-message"
conversation_endpoint = f"{base_url}/test-results/{result_id}/conversation"
# 3b. Get first turn from Voxli
turn = poll_next_message(generate_endpoint, headers)
# 3c. Conversation loop
while turn is not None:
# TODO: Replace with your agent's response
start = time.monotonic()
if turn.get("action"):
# Tester invoked a registered action instead of typing. Apply
# it in your system, then record the chatbot's follow-up.
agent_response = your_agent.apply_action(
turn["action"]["name"],
turn["action"].get("arguments", {}),
)
else:
agent_response = your_agent.process(turn["message"])
response_time_ms = round((time.monotonic() - start) * 1000)
# 3d. Record agent response (include metadata for performance tracking)
requests.post(
conversation_endpoint,
headers=headers, json={
"type": "message",
"content": agent_response,
"metadata": {
"responseTime": response_time_ms,
"inputTokens": input_tokens,
"outputTokens": output_tokens,
"cost": cost,
}
}
)
# 3e. Get next turn from Voxli
turn = poll_next_message(generate_endpoint, headers)
print(f"Test run {run_id} completed.")

How It Works

1. Create a Test Run: Initialize a new test run for your scenario with status: "running".

2. Fetch Tests: Retrieve all tests associated with the scenario.

3. Execute Each Test:

  • Create a result entry by posting to /test-results/ with the test, run, and agent IDs
  • Start the conversation by calling next-message to get the first tester turn
  • Enter a conversation loop where you relay each turn between Voxli and your agent
  • Continue until Voxli signals end_chat: true

Each next-message response may return ready: false if the next turn is not yet available. Poll the endpoint with a short delay until it returns ready: true.
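The polling loop can also be written as a standalone helper with a capped exponential backoff instead of a fixed one-second delay. This is a sketch, not part of the Voxli SDK: the `fetch` callable stands in for the next-message POST, and the response fields (`ready`, `end_chat`, `message`, `action`) mirror the script above.

```python
import time

def poll_until_ready(fetch, timeout: float = 30.0, delay: float = 0.5, max_delay: float = 4.0):
    """Call `fetch()` until it reports ready, backing off between attempts.

    Returns None when the chat has ended, otherwise the ready payload.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch()
        if data["ready"]:
            return None if data.get("end_chat") else data
        if time.monotonic() >= deadline:
            raise TimeoutError("Timed out waiting for next message")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # capped exponential backoff

# Simulated endpoint: not ready twice, then a ready message.
responses = iter([
    {"ready": False},
    {"ready": False},
    {"ready": True, "message": "Hello!", "end_chat": False},
])
turn = poll_until_ready(lambda: next(responses), timeout=10)
print(turn["message"])
```

Backing off keeps you from hammering the endpoint on slow turns while still reacting quickly when the tester responds fast.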

Each ready turn contains either message (free text from the tester) or action (an invocation of one of the actions your chatbot registered for this turn). Branch on whichever field is populated; the two are mutually exclusive. See Tools, Events, and Actions for how to register actions.
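A minimal sketch of that branch, using the field names from the script above. The string results here are illustrative stand-ins for calls into your own agent:

```python
def handle_turn(turn: dict) -> str:
    """Dispatch a ready turn: exactly one of "message" or "action" is set."""
    if turn.get("action"):
        action = turn["action"]
        # An ActionInvocation carries a name plus optional arguments.
        return f"applied {action['name']} with {action.get('arguments', {})}"
    return f"replied to: {turn['message']}"

print(handle_turn({"message": "Hi, I need help", "action": None}))
print(handle_turn({"message": None,
                   "action": {"name": "transfer_call",
                              "arguments": {"dept": "billing"}}}))
```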

The run is automatically marked as completed once all tests finish.

Message Metadata

When posting agent messages to the conversation endpoint, you can include a metadata object with performance metrics. Voxli reads the following recognized keys:

Key           Type    Description
responseTime  number  Time in milliseconds for the agent to respond
inputTokens   number  Input/prompt token count for the LLM call
outputTokens  number  Output/completion token count for the LLM call
cost          number  Cost of the LLM call in USD
import time

start = time.monotonic()
agent_response = get_agent_response(tester_message)  # your agent logic
response_time_ms = round((time.monotonic() - start) * 1000)

# input_tokens, output_tokens, and cost should come from your LLM call
requests.post(conversation_endpoint, headers=headers, json={
    "type": "message",
    "content": agent_response,
    "metadata": {
        "responseTime": response_time_ms,
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "cost": cost,
    }
})

When available, these metrics are displayed as averages in test result details and comparison views. They help identify performance regressions and cost differences across agent configurations.
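If you want to sanity-check those averages locally, you can aggregate the metadata payloads you posted before comparing against the dashboard. The numbers below are invented sample values for illustration:

```python
from statistics import mean

# Metadata objects as posted to the conversation endpoint (sample values).
posted = [
    {"responseTime": 820,  "inputTokens": 412, "outputTokens": 96,  "cost": 0.0031},
    {"responseTime": 1110, "inputTokens": 388, "outputTokens": 140, "cost": 0.0038},
]

# Per-key average across all agent turns in the test.
averages = {key: mean(m[key] for m in posted) for key in posted[0]}
print(averages)
```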
