
Architecture

Voxtra is built on four primitives. Once you understand how they compose, the rest of the API is mechanical.

The big picture

The four primitives

1. ARIClient

The lowest layer. Wraps Asterisk’s REST Interface (HTTP) and Stasis event stream (WebSocket). Methods are 1:1 with ARI endpoints — originate, answer_channel, hangup_channel, create_bridge, record_channel, reload_module, etc.

You rarely use this directly, but it’s there when you need to hit ARI features Voxtra hasn’t wrapped yet:

    from voxtra import ARIClient

    ari = ARIClient(base_url="http://pbx:8088", username="...", password="...")
    await ari.connect()
    modules = await ari.list_modules()  # raw ARI call
    await ari.reload_module("res_pjsip.so")

Reference →

2. AudioSocketServer

A TCP server that accepts AudioSocket connections from the Asterisk AudioSocket() dialplan app. Each connection is a CallSession’s media channel — frames in, frames out — without RTP or SRTP.

The server listens on a port you choose (default: ephemeral), and VoxtraApp starts it automatically the first time an audio operation is used.
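
To make "frames in, frames out" concrete, here is a minimal sketch of AudioSocket message framing, assuming the wire format documented for Asterisk's AudioSocket (a 1-byte kind, a 2-byte big-endian payload length, then the payload). The helper names `encode_frame` and `decode_frames` are illustrative and are not part of Voxtra's API; Voxtra's own server may structure this differently.

```python
import struct
from typing import List, Tuple

# Message kinds used by the AudioSocket wire protocol.
KIND_HANGUP = 0x00  # connection is being terminated
KIND_UUID = 0x01    # 16-byte call identifier, sent first by Asterisk
KIND_AUDIO = 0x10   # signed linear 16-bit, 8 kHz, mono PCM
KIND_ERROR = 0xFF

HEADER = struct.Struct("!BH")  # 1-byte kind, 2-byte big-endian payload length


def encode_frame(kind: int, payload: bytes = b"") -> bytes:
    """Build one message: header followed by payload."""
    return HEADER.pack(kind, len(payload)) + payload


def decode_frames(buffer: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Split a byte buffer into complete (kind, payload) frames.

    Returns the decoded frames plus any trailing bytes that do not yet
    form a complete frame, so the caller can prepend them to the next read.
    """
    frames: List[Tuple[int, bytes]] = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        kind, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((kind, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

Returning the unconsumed tail matters because TCP delivers a byte stream, not messages: a single read may end in the middle of a frame.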

3. CallSession

The developer-facing handle. Every inbound call and every successful originate() produces one. It’s what your @app.route() handler receives:

    @app.default()
    async def handle(call):  # ← call: CallSession
        await call.answer()
        digit = await call.listen_dtmf(timeout=5)
        if digit == "1":
            await call.transfer_to_queue("support")
        else:
            await call.bridge_with(other_session)

CallSession exposes:

  • Lifecycle: answer, hangup, hold, unhold, transfer_to.
  • Audio: audio_stream(), send_audio(), play_file().
  • DTMF: listen_dtmf(), send_dtmf().
  • Recording: record_start(), record_stop().
  • AI shortcuts: say(text), listen(timeout=), agent.respond(text).
  • Bridging: bridge_with(other), transfer_to_queue(name).

Reference →

4. VoxtraApp

The orchestrator. It:

  1. Owns the ARIClient connection and event loop.
  2. Translates ARI events (StasisStart, StasisEnd, ChannelDtmfReceived, …) into VoxtraEvents.
  3. Looks up the right handler in the Router.
  4. Creates a CallSession and runs your handler in a background task.
  5. Optionally auto-wires a VoicePipeline (when STT + LLM + TTS are configured) and a BackendWebhook.
  6. Cleans up on hangup.
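
Steps 2 to 4 and 6 above can be sketched as a toy dispatcher. This is not Voxtra's implementation: `ToyApp`, `ToySession`, and `dispatch` are hypothetical names, the router is reduced to a single default handler, and only the `type` and `channel.id` fields of the ARI event are read.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Optional


@dataclass
class ToySession:
    """Hypothetical stand-in for CallSession."""
    channel_id: str


Handler = Callable[[ToySession], Awaitable[None]]


class ToyApp:
    def __init__(self) -> None:
        self.default_handler: Optional[Handler] = None
        self.sessions: Dict[str, ToySession] = {}
        self.tasks: Dict[str, asyncio.Task] = {}

    def default(self) -> Callable[[Handler], Handler]:
        """Decorator that registers the fallback handler."""
        def decorator(fn: Handler) -> Handler:
            self.default_handler = fn
            return fn
        return decorator

    async def dispatch(self, raw: dict) -> None:
        """Route one raw ARI event (already parsed from JSON)."""
        channel_id = raw["channel"]["id"]
        if raw["type"] == "StasisStart" and self.default_handler is not None:
            # New call: create a session and run the handler in the background.
            session = ToySession(channel_id)
            self.sessions[channel_id] = session
            self.tasks[channel_id] = asyncio.create_task(
                self.default_handler(session)
            )
        elif raw["type"] == "StasisEnd":
            # Hangup: drop the session and cancel a handler that is still running.
            self.sessions.pop(channel_id, None)
            task = self.tasks.pop(channel_id, None)
            if task is not None and not task.done():
                task.cancel()
```

Running each handler in its own task keeps one slow call from blocking event delivery for every other call.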

You construct it once, decorate handlers, and call app.run():

    app = VoxtraApp(ari_url="...", ari_user="...", ari_password="...")

    @app.default()
    async def handle(call):
        ...

    app.run()

Reference →

The provider registry

STT, TTS, LLM, VAD, telephony, and media providers self-register via decorators:

    from voxtra.registry import registry
    from voxtra.ai.stt.base import BaseSTT

    @registry.register_stt("my-provider")
    class MySTT(BaseSTT):
        ...

This means third-party packages can ship new providers without touching Voxtra’s core. Resolution is lazy — providers aren’t imported until something asks for them by name. See voxtra.registry.
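
The pattern of decorator registration plus lazy resolution can be sketched in a few lines. This is an illustration of the general technique, not the code in voxtra.registry: `ToyRegistry`, `register_lazy`, and `get_stt` are hypothetical names, and the `"module:attr"` target string is an assumed convention.

```python
import importlib
from typing import Callable, Dict


class ToyRegistry:
    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}  # providers already imported
        self._lazy: Dict[str, str] = {}      # name -> "module:attr", not yet imported

    def register_stt(self, name: str) -> Callable[[type], type]:
        """Decorator: register an already-imported class under a name."""
        def decorator(cls: type) -> type:
            self._classes[name] = cls
            return cls
        return decorator

    def register_lazy(self, name: str, target: str) -> None:
        """Record where a provider lives without importing it."""
        self._lazy[name] = target

    def get_stt(self, name: str) -> type:
        """Resolve a provider by name, importing its module on first use."""
        if name not in self._classes and name in self._lazy:
            module_name, _, attr = self._lazy[name].partition(":")
            module = importlib.import_module(module_name)
            # Importing may itself trigger a decorator registration; keep that one.
            self._classes.setdefault(name, getattr(module, attr))
        try:
            return self._classes[name]
        except KeyError:
            raise KeyError(f"no STT provider registered as {name!r}") from None
```

Deferring the import means an application that never asks for a given provider never pays for loading that provider's dependencies.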

Telephony adapter contract

BaseTelephonyAdapter is the seam between Voxtra and the underlying PBX. Voxtra ships an AsteriskAdapter (via ARIClient) and a LiveKitAdapter stub. New adapters implement ten async methods (connect, listen, answer_call, hangup_call, transfer_call, hold_call, send_dtmf, create_media_bridge, play_audio, disconnect) and translate the backend’s native events into VoxtraEvents.

Switch backends in one line:

    from voxtra import VoxtraApp
    from voxtra.telephony.asterisk import AsteriskAdapter

    app = VoxtraApp(telephony=AsteriskAdapter(...))
    # or:
    app = VoxtraApp.with_asterisk(ari_url=..., ari_user=..., ari_password=...)
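
As a rough picture of the adapter contract, the sketch below declares the ten method names listed above on an abstract base class and implements them in an in-memory adapter. Only the method names come from this page. The class names, every parameter, every return type, and the event dictionary shape are assumptions made for illustration; consult BaseTelephonyAdapter for the real signatures.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List, Tuple


class ToyTelephonyAdapter(ABC):
    """Hypothetical stand-in for BaseTelephonyAdapter; signatures are guesses."""

    @abstractmethod
    async def connect(self) -> None: ...
    @abstractmethod
    async def listen(self) -> dict: ...
    @abstractmethod
    async def answer_call(self, call_id: str) -> None: ...
    @abstractmethod
    async def hangup_call(self, call_id: str) -> None: ...
    @abstractmethod
    async def transfer_call(self, call_id: str, target: str) -> None: ...
    @abstractmethod
    async def hold_call(self, call_id: str) -> None: ...
    @abstractmethod
    async def send_dtmf(self, call_id: str, digits: str) -> None: ...
    @abstractmethod
    async def create_media_bridge(self, call_id: str) -> str: ...
    @abstractmethod
    async def play_audio(self, call_id: str, audio: bytes) -> None: ...
    @abstractmethod
    async def disconnect(self) -> None: ...


class InMemoryAdapter(ToyTelephonyAdapter):
    """Records every command instead of talking to a PBX."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple]] = []
        self.pending: List[dict] = []  # native events waiting to be translated

    def _record(self, name: str, *args: object) -> None:
        self.calls.append((name, args))

    async def connect(self) -> None:
        self._record("connect")

    async def listen(self) -> dict:
        # Translate the backend's native event into a normalized shape.
        native = self.pending.pop(0)
        return {"type": native["kind"].upper(), "call_id": native["id"]}

    async def answer_call(self, call_id: str) -> None:
        self._record("answer_call", call_id)

    async def hangup_call(self, call_id: str) -> None:
        self._record("hangup_call", call_id)

    async def transfer_call(self, call_id: str, target: str) -> None:
        self._record("transfer_call", call_id, target)

    async def hold_call(self, call_id: str) -> None:
        self._record("hold_call", call_id)

    async def send_dtmf(self, call_id: str, digits: str) -> None:
        self._record("send_dtmf", call_id, digits)

    async def create_media_bridge(self, call_id: str) -> str:
        self._record("create_media_bridge", call_id)
        return f"bridge-{call_id}"

    async def play_audio(self, call_id: str, audio: bytes) -> None:
        self._record("play_audio", call_id, len(audio))

    async def disconnect(self) -> None:
        self._record("disconnect")
```

An adapter like this, which only records what it was asked to do, is one way to exercise call handlers in tests without a running PBX.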

Sessions, events, and the queue

Every CallSession has an asyncio.Queue for VoxtraEvents. The framework pushes events onto it; your handler can await them via helpers like listen(), listen_dtmf(), or audio_stream(). Events that flow through:

  • CALL_STARTED, CALL_ANSWERED, CALL_ENDED
  • USER_TRANSCRIPT, AGENT_RESPONSE (from the AI pipeline)
  • DTMF_RECEIVED
  • MEDIA_STARTED, MEDIA_STOPPED

Full taxonomy: Events →.
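
The queue-plus-helper pattern described in this section can be sketched as follows. `QueueSession` and `ToyEvent` are hypothetical names, the `digit` key is an assumed payload field, and this version simply discards events of other types while waiting, which the real CallSession may not do.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyEvent:
    """Hypothetical stand-in for VoxtraEvent."""
    type: str
    data: dict = field(default_factory=dict)


class QueueSession:
    def __init__(self) -> None:
        # Create the session inside a running event loop.
        self.events: asyncio.Queue = asyncio.Queue()

    async def push(self, event: ToyEvent) -> None:
        """Called by the framework side to deliver an event."""
        await self.events.put(event)

    async def listen_dtmf(self, timeout: float) -> Optional[str]:
        """Wait up to `timeout` seconds for a DTMF digit; None on timeout."""
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        while True:
            remaining = deadline - loop.time()
            if remaining <= 0:
                return None
            try:
                event = await asyncio.wait_for(self.events.get(), remaining)
            except asyncio.TimeoutError:
                return None
            if event.type == "DTMF_RECEIVED":
                return event.data["digit"]
            # Any other event type is skipped in this sketch.
```

Tracking a single deadline, rather than restarting the timeout after every unrelated event, keeps the total wait bounded by the value the caller passed in.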

What runs where

| Component | Process | Notes |
| --- | --- | --- |
| VoxtraApp | Your application | One per Stasis app namespace. |
| ARIClient | Inside VoxtraApp | Single HTTP + WS connection, auto-reconnect. |
| AudioSocketServer | Inside VoxtraApp | TCP server, accepts media legs. |
| VoicePipeline | One per active CallSession | Background asyncio task. |
| BackendWebhook | One per VoxtraApp | Owns its own httpx.AsyncClient. |
| Asterisk | Separate process | Voxtra never assumes single-host deployment. |

Production-grade defaults

Voxtra’s defaults are tuned for production traffic:

  • Reconnects on ARI WS drops with configurable backoff.
  • Idempotent stop signals — agents/stop and delete_room no-op cleanly on already-gone resources.
  • SIP-aware idle detection: a call does not end when browser observers leave; it ends only when the SIP leg is gone.
  • HMAC-signed webhooks — receivers verify origin before acting.
  • Best-effort emission — webhook and recording-sink failures never propagate into the call pipeline.
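
For the HMAC-signed webhooks item, a receiver-side check might look like the sketch below. The algorithm (SHA-256), the hex encoding, and the function names are assumptions; this page does not specify Voxtra's signing scheme or the header that carries the signature, so check the webhook reference before relying on any of them.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_payload(secret: bytes, body: bytes, signature: str) -> bool:
    """Return True only if `signature` matches the body under `secret`."""
    expected = sign_payload(secret, body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of a forged signature were correct.
    return hmac.compare_digest(expected, signature)
```

Verify against the raw bytes exactly as received: re-serializing parsed JSON can change whitespace or key order and make a genuine signature fail.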