Sharpens 24 companies · First observed March 2023 · Updated July 2026

Voice APIs Converge on Per-Minute Billing

Quick answer

Seventeen corpus companies bill on media-minutes — per-minute or per-hour — as a primary unit, and among dedicated voice and audio AI vendors the per-minute meter is near-universal. The shift from per-character (batch TTS era) to per-minute (agent call era) reflects the dominant use case moving from content generation to real-time conversational AI.

17 vendors bill on media-minutes (per-minute / per-hour)

What's happening — and why

What's happening: the standard billing unit for voice AI has converged on media-minutes — per-minute or per-hour — rather than per-character or per-request. Seventeen corpus companies use media-minutes as a primary unit, and per-minute is the default across dedicated voice and audio APIs.

Why: the dominant voice AI use case shifted. In the batch TTS era, producers turned text into audio files; the natural unit was the characters being spoken. In the agent call era, voice is deployed in real-time phone calls, voicebots, and conversational interfaces — where wall-clock time, not text volume, is the cost driver. Telephony and call-center buyers already think in minutes; per-minute aligns vendor pricing with buyer mental models.

ElevenLabs is the clearest example: it charges per-minute for Conversational AI (agent calls) and per-character for Studio TTS (batch). The two units coexist for the two use cases.
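The difference between the two meters can be illustrated with a small sketch. Every rate below, and the speaking rate, is a hypothetical placeholder rather than any vendor's price. The point is only that a per-character meter sees the text the agent speaks, while a per-minute meter sees the whole call, including the time the agent spends listening.

```python
# Hypothetical figures for illustration only; none of these are vendor prices.
RATE_PER_1K_CHARS = 0.20       # assumed batch TTS rate, USD per 1,000 characters
RATE_PER_MINUTE = 0.08         # assumed agent-call rate, USD per minute
CHARS_PER_SPOKEN_MINUTE = 900  # assumed speaking rate (~150 words/min, ~6 chars/word)


def per_character_cost(characters: int) -> float:
    """Cost when the meter counts the text that is synthesized."""
    return characters / 1000 * RATE_PER_1K_CHARS


def per_minute_cost(call_seconds: float) -> float:
    """Cost when the meter counts wall-clock time on the call."""
    return call_seconds / 60 * RATE_PER_MINUTE


def spoken_characters(call_seconds: float, agent_talk_share: float) -> int:
    """Characters the agent speaks if it talks for a given share of the call."""
    talk_minutes = call_seconds / 60 * agent_talk_share
    return round(talk_minutes * CHARS_PER_SPOKEN_MINUTE)


if __name__ == "__main__":
    call_seconds = 300  # a five-minute call
    for share in (1.0, 0.5, 0.25):
        chars = spoken_characters(call_seconds, share)
        print(f"talk share {share:.0%}: {chars} chars, "
              f"per-char ${per_character_cost(chars):.3f}, "
              f"per-min ${per_minute_cost(call_seconds):.3f}")
```

Under these assumed numbers the per-character bill shrinks as the agent talks less, while the per-minute bill stays fixed for the same call length. That is why wall-clock time is the more natural unit for a live call.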

How it works

Voice pricing shifted from per-character (batch TTS) to per-minute (agent calls) as real-time use cases dominated.

Evidence over time

[Timeline chart: 30 supporting data points, 1 counterexample]

Evidence

Company	Date	What happened
assemblyai	Jul 2026	Launched a standalone Voice Agent API at $1.50/hr ($0.075/min) — a per-minute/per-hour rate for a proprietary end-to-end voice stack built on its Realtime STT. A brand-new voice product priced per-minute from day one.
speechmatics	Jul 2026	Cut real-time enhanced STT (the live-voice-agent SKU) $0.56→$0.43/hr and added a multilingual Batch Melia 1 model at $0.129/hr — all on the per-hour audio meter; per-minute/per-hour remains the only billing unit across the Pro tier.
bland-ai	Jun 2024	Billed entirely on media-minutes; phone-agent model fits per-minute naturally
elevenlabs	May 2026	Cut the per-minute rate for Conversational AI (agent calls); retains per-character for Studio TTS. Both units coexist in billing.
cartesia	Feb 2026	Voice Agents GA at flat per-minute rate; prior API used credits/requests
deepgram	Jan 2025	Transcription and TTS both per-minute; Nova-2 ASR $0.0043/min, Aura TTS $0.0150/min
tavus	Jan 2025	Entire model is hybrid access fee + pay-as-you-go video minutes; per-minute is the only consumption unit
speechmatics	Jun 2025	Per-hour STT, per-character TTS — both units present; moving toward per-minute for real-time
murf-ai	Jun 2026	Murf API launched with per-character and per-minute lanes; Studio plans cap on minutes
rev-ai	Jan 2025	Pure usage per-minute; transcription billed in 15-second increments
krisp	Jun 2025	Call Center product bills on accent-minutes; per-agent seats plus minute consumption
synthesia	May 2026	Video-minute credits drive all plan tiers; minutes are the primary consumption signal
twelve-labs	Jun 2025	Video understanding billed per video-minute indexed; the minute is the primary query unit
wellsaid	Jun 2026	Annual download quotas expressed as minutes per plan tier; per-seat+minutes model
hedra	Dec 2025	Credits map to video/audio seconds; effectively per-minute billing abstracted through credits
fal-ai	Jun 2025	Audio/video models billed per second of output; effectively per-minute at scale
retell-ai	Jun 2026	Voice-agent minute itemized into Voice Infra ($0.055/min) + LLM + TTS + telephony; calls metered to the nearest second
vapi	Jun 2026	$0.05/min hosting fee with model and telephony passed through at cost — the minute is the platform's only unit
synthflow	Jun 2025	No-platform-fee PAYG billing the Voice Engine, LLM and telephony as three separate per-minute meters
livekit	Jun 2026	Realtime agent transport billed in participant-minutes plus bandwidth — the WebRTC layer under voice agents meters minutes too
daily	Jan 2026	Pipecat Cloud and transport billed per participant-minute with published automatic volume discounts ($0.004 falling ~63% to $0.0015 past 50M min/mo)
hume-ai	Jun 2026	EVI voice agent per-minute, falling by generation: ~$0.102/min (EVI 1) → ~$0.072 (EVI 2) → $0.04/min Business-tier overage
vapi	Jun 2026	Build tier: $0.05/min Vapi hosting plus at-cost passthrough of STT/LLM/TTS/telephony — pure per-minute developer API with a Scale annual-commit tier on top.
retell-ai	Feb 2024	Launched with true pay-as-you-go per-minute voice agents; no platform fee, no contract — the clearest developer-first per-minute billing in the voice-agent segment.
synthflow	Jun 2024	Subscription tiers bundle minutes (Starter 50 min, Pro 2,000 min) with per-minute overage ($0.12–$0.13/min) — subscription wrapping a per-minute consumption unit.
polyai	Jun 2026	Per-minute enterprise billing (sales-gated); third-party reports ~$150K annual minimum; media-minutes is the stated unit even at fully gated enterprise tier.
livekit	Jun 2026	Multi-dimension metering bundles agent-session minutes ($0.01/min overage) and WebRTC media minutes ($0.0004–$0.0005/min) as two separate per-minute meters — one for AI agent orchestration, one for real-time media.
hume-ai	Mar 2024	EVI 1 launched at $0.102/min; EVI 2 cut to ~$0.072/min (Sept 2024); EVI 3 overage $0.07/min, EVI 4 MINI $0.04/min — a per-minute voice model that has deflated ~60% in two years.
gladia	Jun 2026	STT/transcription billed per audio-minute; free 10 hours/month then pay-as-you-go — per-minute is the primary unit for the ASR segment.
playht	Mar 2023	Early plans were per-character; API pivoted toward per-request and agent voice as the voice-agent market matured, illustrating the per-character → per-minute migration path.
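Several rows above differ only in metering granularity: to the nearest second, in 15-second increments, or in whole minutes. A minimal sketch of how that granularity changes the bill for the same call follows. Rounding up to the next increment is an assumption (vendors may round differently), and the rate is a hypothetical placeholder.

```python
def billable_seconds(duration_seconds: int, increment_seconds: int) -> int:
    """Round a call duration up to a whole number of billing increments.

    Rounding up is an assumption; a vendor could round differently.
    """
    if duration_seconds < 0 or increment_seconds <= 0:
        raise ValueError("duration must be >= 0 and increment must be > 0")
    increments = -(-duration_seconds // increment_seconds)  # ceiling division
    return increments * increment_seconds


def call_cost(duration_seconds: int, rate_per_minute: float,
              increment_seconds: int = 1) -> float:
    """Cost of one call on a per-minute rate metered in fixed increments."""
    billed = billable_seconds(duration_seconds, increment_seconds)
    return billed / 60 * rate_per_minute


if __name__ == "__main__":
    rate = 0.05  # hypothetical USD per minute
    for seconds in (61, 125):
        print(f"{seconds}s call:",
              f"per-second ${call_cost(seconds, rate, 1):.4f},",
              f"15s increments ${call_cost(seconds, rate, 15):.4f},",
              f"whole minutes ${call_cost(seconds, rate, 60):.4f}")
```

The coarser the increment, the more a short call is rounded up, so granularity matters most for workloads made of many brief calls.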

Counterexamples

lmnt · — — Charges per character for TTS only — no per-minute lane. Serves batch text-to-speech, not agent calls.
wellsaid · — — Per-seat + annual quota model dilutes the pure per-minute signal; enterprise customers are quota-capped, not metered.
descript · Jun 2025 — Media hours billed at tier level, not granularly per-minute; subscription model with hour pools

Trivia

Wave 27 (June 2026) grew the media-minutes cohort from 15 to 26 corpus companies in one intake — and 23 of the 26 (88%) publish per-minute rates self-serve, making voice one of the most price-transparent AI categories despite its enterprise contact-center wing being fully gated.
Daily (verified 2026-06-09) publishes the steepest automatic volume curve in the voice cohort: its participant-minute rate falls ~63% — from $0.004 to $0.0015 — once a customer crosses 50M minutes/month, with no negotiation, after the company dropped subscription plans entirely in June 2022.
Hume's EVI shows the per-minute price deflating by model generation like tokens do: ~$0.102/min (EVI 1, 2024) → ~$0.072/min (EVI 2, ~30% cut) → $0.04/min Business-tier overage by 2026 — a 61% decline across three generations of the same voice-agent product.
AssemblyAI's Voice Agent API launched on 2026-07-06 at $0.075/min ($1.50/hr), close to the $0.05/min Vapi charges for its hosting fee. The ~$0.05–$0.08/min band has become the reference price for an end-to-end voice-agent minute even as the underlying STT beneath it kept falling (Speechmatics cut real-time enhanced by 23% the very same day). A new voice product now launches straight onto the per-minute meter rather than testing per-character or per-request.
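The percentage moves quoted in these items can be re-derived from the listed prices. The sketch below uses only figures that appear in this article; the helper names are illustrative.

```python
def per_minute(rate_per_hour: float) -> float:
    """Convert a per-hour audio rate to its per-minute equivalent."""
    return rate_per_hour / 60


def pct_decline(old: float, new: float) -> float:
    """Percentage drop from an old price to a new one."""
    return (old - new) / old * 100


if __name__ == "__main__":
    # Speechmatics real-time enhanced STT: $0.56/hr -> $0.43/hr
    print(f"Speechmatics cut: {pct_decline(0.56, 0.43):.1f}% "
          f"(${per_minute(0.43):.4f}/min after the cut)")
    # Daily participant-minute: $0.004 -> $0.0015 past 50M min/mo
    print(f"Daily volume curve: {pct_decline(0.004, 0.0015):.1f}%")
    # Hume EVI: ~$0.102/min -> ~$0.072/min -> $0.04/min overage
    print(f"Hume EVI 1 to EVI 2: {pct_decline(0.102, 0.072):.1f}%")
    print(f"Hume EVI 1 to $0.04/min overage: {pct_decline(0.102, 0.04):.1f}%")
```

These come out at roughly 23%, 62.5%, 29% and 61%, in line with the rounded figures quoted above.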

For buyers

Budget real-time voice workloads in minutes, not characters. For batch content generation, characters may still be the efficient unit (LMNT, ElevenLabs Studio TTS). For agent calls and real-time voice, per-minute is the standard: model your cost on expected call durations and call volumes, not script length.
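A minimal budgeting sketch along these lines sums the per-minute components of a voice-agent stack and multiplies by expected call minutes. The $0.05/min platform figure matches the hosting fee cited for Vapi in the evidence table. The passthrough rates, call volume and call length are hypothetical placeholders to be replaced with your own quotes and traffic estimates.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceAgentStack:
    """Per-minute components of one voice-agent minute, in USD."""
    components: dict[str, float] = field(default_factory=dict)

    def rate_per_minute(self) -> float:
        """All-in rate: the sum of every per-minute meter in the stack."""
        return sum(self.components.values())


def monthly_cost(stack: VoiceAgentStack, calls_per_month: int,
                 avg_call_seconds: float) -> float:
    """Expected monthly spend from call volume and average call duration."""
    minutes = calls_per_month * avg_call_seconds / 60
    return minutes * stack.rate_per_minute()


if __name__ == "__main__":
    stack = VoiceAgentStack({
        "platform": 0.05,   # hosting fee of the size cited for Vapi above
        "stt": 0.01,        # hypothetical passthrough
        "llm": 0.02,        # hypothetical passthrough
        "tts": 0.03,        # hypothetical passthrough
        "telephony": 0.01,  # hypothetical passthrough
    })
    calls, seconds = 10_000, 180  # hypothetical: 10k calls of 3 minutes each
    print(f"all-in rate: ${stack.rate_per_minute():.3f}/min")
    print(f"monthly cost: ${monthly_cost(stack, calls, seconds):,.2f}")
```

Keeping the components separate makes it easy to see which meter dominates when a vendor itemizes the minute, as several platforms in the table do.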

For vendors

If you are building a voice AI product, per-minute pricing aligns with the call-center and telephony mental model your buyers already use. If you serve both batch TTS and agent use cases, maintain both units (ElevenLabs' model): per-character for Studio, per-minute for Conversational.

Outlook — what to watch

As agent voice becomes the dominant voice AI use case, per-minute will further displace per-character. The holdout (LMNT, characters-only) is a batch-focused product. Watch for per-second granularity appearing in cost-sensitive high-volume deployments.

Bottom line

Voice AI billing has converged on media-minutes. Seventeen corpus companies use it as a primary unit, driven by the shift from batch TTS to real-time agent calls.

FAQ

How do voice AI APIs charge for usage?

Almost universally per-minute or per-hour of audio. Seventeen corpus companies bill on media-minutes; among dedicated voice vendors per-minute is near-universal, with LMNT (per character, batch TTS) the main exception.

Why per-minute instead of per-character?

Real-time agent calls — the dominant voice use case — are bounded by wall-clock time, not text volume. Per-minute aligns with telephony buyer mental models and the actual cost driver.

Does ElevenLabs charge per minute or per character?

Both. Conversational AI (agent calls) is priced per minute; Studio TTS (batch text-to-speech) is priced per character. The two billing units coexist for the two use cases.
