Running Tests
This guide shows you how to run a complete test suite against your AI agent using Voxli’s REST API.
Complete Example
Here’s a full Python script that executes all tests in a scenario:
```python
"""Run Voxli tests against your chatbot or AI agent.

1. Create a test run
2. Get all tests for the scenario
3. Simulate each test
"""
import os
import time

import requests


def poll_next_message(endpoint: str, headers: dict, timeout: int = 30) -> str | None:
    start_time = time.time()
    while True:
        response = requests.post(endpoint, headers=headers)
        response.raise_for_status()
        data = response.json()
        if data["ready"]:
            return None if data.get("end_chat") else data["message"]
        if time.time() - start_time > timeout:
            raise TimeoutError("Timed out waiting for message")
        time.sleep(1)


# --- Configuration ---
api_key = os.getenv("VOXLI_API_KEY")
base_url = os.getenv("VOXLI_API_URL", "https://api.voxli.io")
scenario_id = os.getenv("VOXLI_SCENARIO_ID")
agent_id = os.getenv("VOXLI_AGENT_ID")

headers = {"Authorization": f"Bearer {api_key}"}

# 1. Create a test run
run = requests.post(f"{base_url}/runs/", headers=headers, json={
    "scenario": scenario_id,
    "agent": agent_id,
    "status": "running",
}).json()
run_id = run["id"]

# 2. Get all tests for this scenario
tests = requests.get(
    f"{base_url}/scenarios/{scenario_id}/tests", headers=headers
).json()["data"]

# 3. Simulate each test
for test in tests:
    # 3a. Create a test result entry
    result = requests.post(f"{base_url}/test-results/", headers=headers, json={
        "test": test["id"],
        "run": run_id,
        "agent": agent_id,
    }).json()
    result_id = result["id"]
    generate_endpoint = f"{base_url}/test-results/{result_id}/next-message"
    conversation_endpoint = f"{base_url}/test-results/{result_id}/conversation"

    # 3b. Get first message from Voxli
    tester_message = poll_next_message(generate_endpoint, headers)

    # 3c. Conversation loop
    while tester_message is not None:
        # TODO: Replace with your agent's response
        start = time.monotonic()
        agent_response = "The meaning of life is 42."
        response_time_ms = round((time.monotonic() - start) * 1000)

        # TODO: Populate these from your LLM call; placeholder values shown
        input_tokens = 0
        output_tokens = 0
        cost = 0.0

        # 3d. Record agent response (include metadata for performance tracking)
        requests.post(conversation_endpoint, headers=headers, json={
            "type": "message",
            "content": agent_response,
            "metadata": {
                "responseTime": response_time_ms,
                "inputTokens": input_tokens,
                "outputTokens": output_tokens,
                "cost": cost,
            },
        })

        # 3e. Get next message from Voxli
        tester_message = poll_next_message(generate_endpoint, headers)

print(f"Test run {run_id} completed.")
```
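The script reads its configuration from environment variables. A hypothetical invocation might look like the following (the key and IDs are placeholders, and the script filename is whatever you saved the example as):

```shell
# Placeholder values -- substitute your real API key and IDs
export VOXLI_API_KEY="vx_example_key"
export VOXLI_SCENARIO_ID="scn_123"
export VOXLI_AGENT_ID="agt_456"
# Optional: override the default API URL
export VOXLI_API_URL="https://api.voxli.io"
# Then run the script, e.g.: python run_voxli_tests.py
```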
How It Works
1. Create a Test Run: Initialize a new test run for your scenario with `status: "running"`.
2. Fetch Tests: Retrieve all tests associated with the scenario.
3. Execute Each Test:
   - Create a result entry by posting to `/test-results/` with the test, run, and agent IDs
   - Start the conversation by calling `next-message` to get the first tester message
   - Enter a conversation loop where you relay messages between Voxli and your agent
   - Continue until Voxli signals `end_chat: true`
The run is automatically marked as completed once all tests finish.
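Based on the fields the example script reads, a `next-message` poll response can be interpreted in three ways. A minimal sketch, assuming the response contains only the `ready`, `end_chat`, and `message` fields used above (real payloads may carry additional fields):

```python
def interpret_poll(data: dict) -> str:
    """Classify a next-message poll response using the fields from the example."""
    if not data["ready"]:
        return "pending"        # message still generating: poll again after a delay
    if data.get("end_chat"):
        return "done"           # Voxli has ended the conversation
    return data["message"]      # next tester message to relay to your agent

# Still generating
assert interpret_poll({"ready": False}) == "pending"
# Conversation finished
assert interpret_poll({"ready": True, "end_chat": True}) == "done"
# Next tester message ready
assert interpret_poll({"ready": True, "message": "Hi there"}) == "Hi there"
```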
Message Metadata
When posting agent messages to the conversation endpoint, you can include a metadata object with performance metrics. Voxli recognizes the following keys:
| Key | Type | Description |
|---|---|---|
| `responseTime` | number | Time in milliseconds for the agent to respond |
| `inputTokens` | number | Input/prompt token count for the LLM call |
| `outputTokens` | number | Output/completion token count for the LLM call |
| `cost` | number | Cost of the LLM call in USD |
For example, timing your agent call and attaching the metrics:

```python
import time

start = time.monotonic()
agent_response = get_agent_response(tester_message)  # your agent logic
response_time_ms = round((time.monotonic() - start) * 1000)

# input_tokens, output_tokens, and cost come from your LLM provider's response
requests.post(conversation_endpoint, headers=headers, json={
    "type": "message",
    "content": agent_response,
    "metadata": {
        "responseTime": response_time_ms,
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "cost": cost,
    },
})
```
When available, these metrics are displayed as averages in test result details and comparison views. They help identify performance regressions and cost differences across agent configurations.