Media-Minute Pricing: Examples & Companies

What is it

Media-Minute Pricing is a billing unit where customers are charged per minute of audio or video processed — used by speech, voice, and video AI vendors.

A media minute is the duration unit of AI that hears, speaks, or generates moving pictures. Where text models meter tokens and infrastructure meters GPU-hours, speech and video products meter the length of the media — a minute of audio transcribed, a minute of speech synthesized, a minute of a WebRTC call carried, a minute of conversational video rendered. The reason is structural: audio and video have no natural token boundary the buyer can count in advance, but they always have a runtime. A 12-minute support call, a 30-second ad, a one-hour podcast — each carries an obvious, estimable duration that maps closely to the compute required to process it.

The unit is shared across products that look very different on the surface. Deepgram, Speechmatics, and Gladia bill speech-to-text by the minute or hour of input audio. Retell AI, Vapi, and Synthflow bill voice agents by the connected minute of a phone call. Real-time media platforms like Daily and LiveKit bill per participant-minute of a call carried. On the video side, Twelve Labs bills video understanding by the minute of source footage, while Tavus and Luma AI bill generated video by the minute or second of output. The same root word — “minute” — covers transcription, synthesis, agents, transport, and generation.

What makes the unit interesting is the spread. The cheapest media minute in this corpus, raw WebRTC media on LiveKit, costs about $0.0005; the most expensive, a full unbundled voice-agent minute on Retell AI, reaches $0.31 — over 600x higher for the same sixty seconds. Between them sit machine transcription (Gladia at $0.144/hr, roughly $0.0024/min), video indexing (Twelve Labs at $0.042/min), and healthcare-agent time (Hippocratic AI at ~$9/hour, or $0.15/min). The minute is one unit; the price is a function of what happens during that minute. See choosing the right usage metric for why duration is the natural fit here.

Figure: one 60-second minute, a >600× span set by the work inside it.

How it works

The core formula is simple: media cost equals the per-minute rate for the chosen model, multiplied by the minutes of audio or video processed (usually metered to the second and rounded up, often with a short minimum). The complexity lives in the dimensions wrapped around that minute — which task, which model, real-time versus batch, accuracy tier, whether the meter is per participant or per stream, and whether the vendor exposes the minute directly or hides it behind credits.

Dimension	What it controls	Example from this corpus
Task type	Transcription, synthesis, agents, transport, or video each get their own meter	Retell AI: Voice Infra per min, chat agents per message, TTS per min
Model / accuracy tier	Faster or more accurate models cost more per minute	Twelve Labs: Marengo index $0.042/min vs Pegasus analysis $0.0292/min
Real-time vs batch	Streaming and live agents carry a premium over pre-recorded	Daily: video $0.004/participant-min live; Gladia $0.144/hr batch STT
Unbundled stack	Voice agents compose infra + LLM + TTS + telephony per minute	Retell AI: $0.055/min infra + LLM + TTS ($0.015–$0.04) + telephony ($0.015)
Minute vs credit packaging	Whether the buyer sees minutes or a converted credit	Opus Clip: ~1 credit ≈ 1 minute of processed video
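The core formula described before the table (rate times billable minutes, metered to the second and rounded up, with an optional minimum) can be sketched in Python. This is a minimal illustration, not any vendor's billing logic: the function names, the one-second default increment, and the ten-second minimum in the usage lines are assumptions. Only the $0.042/min rate comes from the table.

```python
import math
from decimal import Decimal


def billable_minutes(duration_seconds: float,
                     increment_seconds: int = 1,
                     minimum_seconds: int = 0) -> Decimal:
    """Round a duration up to the billing increment and return it in minutes.

    increment_seconds and minimum_seconds are hypothetical knobs; real
    vendors publish their own rounding and minimum rules.
    """
    if duration_seconds < 0 or increment_seconds <= 0:
        raise ValueError("duration must be >= 0 and increment must be > 0")
    # Apply the minimum first, then round up to a whole number of increments.
    with_minimum = max(duration_seconds, minimum_seconds)
    increments = math.ceil(with_minimum / increment_seconds)
    return Decimal(increments * increment_seconds) / Decimal(60)


def media_cost(duration_seconds: float,
               rate_per_minute: Decimal,
               increment_seconds: int = 1,
               minimum_seconds: int = 0) -> Decimal:
    """Cost = per-minute rate x billable minutes."""
    minutes = billable_minutes(duration_seconds, increment_seconds, minimum_seconds)
    return minutes * rate_per_minute


# A 60-minute video indexed at $0.042/min (rate taken from the table).
index_cost = media_cost(60 * 60, Decimal("0.042"))

# A 61-second clip billed in whole-minute increments counts as 2 minutes.
rounded_up = billable_minutes(61, increment_seconds=60)

# A 4-second clip under a hypothetical 10-second minimum bills as 10 seconds.
short_clip = billable_minutes(4, minimum_seconds=10)
```

Using `Decimal` rather than floats keeps sub-cent rates such as $0.0005 exact, which matters once they are multiplied by millions of minutes.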

The display unit is frequently a presentation choice rather than the meter. Gladia and Sarvam AI quote per hour of audio ($0.144/hr and ₹30/hr respectively), while Twelve Labs re-expresses its $0.042/min index rate as $2.50/hour (60 × $0.042 = $2.52, so the hourly label is the same rate, rounded). Higher up the stack, Synthesia, Pika, Luma AI, and Opus Clip sell a credit pool that converts to minutes: the minute is the real value metric, but the buyer transacts in credits. Resemble AI goes finer still, billing per second of audio ($0.0005/sec TTS, or $0.03/min) rather than per minute.
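Because the same rate can be labelled per second, per minute, or per hour, a small converter makes quoted prices comparable. The sketch below is illustrative: `convert_rate` and `SECONDS_PER_UNIT` are made-up names, and the rates are the ones quoted in this section.

```python
from decimal import Decimal

SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600}


def convert_rate(rate: Decimal, from_unit: str, to_unit: str) -> Decimal:
    """Re-express a price per `from_unit` of media as a price per `to_unit`."""
    # Multiply before dividing so common rates stay exact in Decimal.
    return rate * Decimal(SECONDS_PER_UNIT[to_unit]) / Decimal(SECONDS_PER_UNIT[from_unit])


gladia_per_minute = convert_rate(Decimal("0.144"), "hour", "minute")      # quoted per hour
resemble_per_minute = convert_rate(Decimal("0.0005"), "second", "minute")  # quoted per second
twelve_labs_per_hour = convert_rate(Decimal("0.042"), "minute", "hour")    # quoted per minute
```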

Unit math: Carrying a 4-person, 30-minute video call on Daily is 4 × 30 = 120 participant-minutes; once the 10,000 free monthly minutes are used up, that is 120 × $0.004 = $0.48. Indexing a 60-minute video on Twelve Labs’ Marengo ($0.042/min) costs 60 × $0.042 = $2.52. A 1,000-minute/month outbound voice-agent campaign on Retell AI at ~$0.115/min composed cost is 1,000 × $0.115 = $115. Staffing an AI nurse on Hippocratic AI for 40 active hours (~$9/hr) is roughly $360, versus ~$39/hour (about $1,560 for the same 40 hours) for a human RN.
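The participant-minute arithmetic depends on how much of the free allowance is left when the call happens. The following sketch shows one way to model that, using Daily's published figures from this section. The helper names and the idea of tracking `used_this_month` are assumptions for illustration, not Daily's actual metering.

```python
from decimal import Decimal


def participant_minutes(participants: int, call_minutes: int) -> int:
    """Each participant's time on the call is metered separately."""
    return participants * call_minutes


def billable_after_allowance(new_minutes: int,
                             used_this_month: int,
                             free_allowance: int) -> int:
    """Minutes of new usage that fall outside the remaining free allowance."""
    remaining_free = max(free_allowance - used_this_month, 0)
    return max(new_minutes - remaining_free, 0)


RATE = Decimal("0.004")   # $ per participant-minute
FREE = 10_000             # free participant-minutes per month

call = participant_minutes(4, 30)

# Allowance already exhausted: every minute of the call is billed.
cost_exhausted = billable_after_allowance(call, 10_000, FREE) * RATE
# Start of the month: the whole call fits inside the free minutes.
cost_fresh = billable_after_allowance(call, 0, FREE) * RATE
# 50 free minutes left: only the remaining 70 minutes are billed.
cost_straddling = billable_after_allowance(call, 9_950, FREE) * RATE
```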

Because the meter tracks duration, the same lever — commitment and volume — discounts it across vendors. Daily auto-discounts the participant-minute from $0.004 down to $0.0015 (about 63% off) at 50M+ minutes/month; Gladia’s Growth tier prepays for commitment-based volume discounts; Resemble AI offers enterprise volume discounts up to 80% off its per-second rate. This per-minute discounting is the substance of the voice-API minute-billing trend — see also the introduction to usage-based pricing for the broader frame, and the ElevenLabs pricing calculator to model a media-minute bill directly.
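The size of a volume discount can be checked directly from the two published endpoints. The sketch below uses only Daily's list rate and its 50M+ floor rate from the paragraph above. Intermediate tiers are not given in the text, so the code does not model them: it brackets the bill between the floor-rate and list-rate totals, which holds as long as every minute is priced somewhere between those two rates. Function names are illustrative.

```python
from decimal import Decimal

LIST_RATE = Decimal("0.004")    # $ per participant-minute, undiscounted
FLOOR_RATE = Decimal("0.0015")  # $ per participant-minute at 50M+ minutes/month


def discount_fraction(list_rate: Decimal, discounted_rate: Decimal) -> Decimal:
    """Share of the list rate removed by the discount."""
    return (list_rate - discounted_rate) / list_rate


def bill_bounds(minutes: int,
                floor_rate: Decimal,
                list_rate: Decimal) -> tuple[Decimal, Decimal]:
    """Lower and upper bound on a bill when the tier schedule is unknown."""
    return minutes * floor_rate, minutes * list_rate


discount = discount_fraction(LIST_RATE, FLOOR_RATE)   # the "about 63% off" figure
low, high = bill_bounds(50_000_000, FLOOR_RATE, LIST_RATE)
```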

Companies using this

Forty companies in the corpus meter media minutes. They cluster into four groups: transcription and speech-to-text APIs (Deepgram, Speechmatics, Rev AI, Gladia, Sarvam AI, Otter.ai, Fireflies.ai); voice agents and real-time infrastructure (Retell AI, Vapi, Bland AI, Synthflow, PolyAI, Parloa, Daily, LiveKit, Hume AI, Hippocratic AI); text-to-speech and dubbing (ElevenLabs, Murf AI, Resemble AI, WellSaid Labs); and AI video generation and understanding (Synthesia, Tavus, Hedra, Twelve Labs, Luma AI, Pika, Opus Clip, VEED AI, InVideo AI, Kaiber, Wonder Dynamics, Creatify, Descript, Krisp, Kustomer, Fal).

Patterns observed

The minute is one unit, but the price encodes the work inside it. Twelve Labs makes it explicit within one product: Marengo indexing is $0.042/min while Pegasus analysis is $0.0292/min on the same footage. Hume AI ladders empathic voice from about $0.02/min to $0.072/min depending on model. Across the corpus the same sixty seconds spans over 600x (see the definition above); the duration is constant, and the per-minute rate is where product differentiation lives.
Voice agents have converged on the unbundled per-minute stack. Retell AI sets the template (infra + LLM + TTS + telephony, composing to $0.07–$0.31/min with “no platform fee” — the component rates are in the table above). Vapi mirrors it with $0.05/min hosting plus at-cost pass-through, and Synthflow advertises the same pay-only-for-the-composed-minute model. The transparency is itself a lever: the buyer can audit exactly which per-minute components they are paying for.
The metering base is not always the minute the buyer pictures. Daily meters per participant-minute, so a 4-way call bills four times a 1-way call of the same length, while Resemble AI bills per second and other vendors quote per hour (the display-unit games are detailed under “How it works”). Normalize to a single base — a stream-minute, a participant-minute, a second — before any cross-vendor comparison holds.
Video vendors hide the minute behind credits more often than audio vendors. Synthesia sells credits that convert to video minutes, Opus Clip sets roughly one credit per minute of processed video, and Pika, Luma AI, InVideo AI, VEED AI, and Leonardo.ai all bundle credit pools into subscription tiers that meter each generation. Pure transcription and infra APIs — Deepgram, Gladia, Daily, LiveKit — tend to quote the raw per-minute (or per-hour) rate without a credit layer. The further from a developer API and the closer to a creative tool, the more likely the minute is wrapped in prepaid credits.
Free minutes are the standard on-ramp, with enterprise holdouts. Daily gives 10,000 free participant-minutes/month, Gladia grants 10 free hours/month, Twelve Labs seeds a one-time 600 indexing minutes, and Retell AI starts every account with $10 in credits. The exceptions are sales-led enterprise voice: PolyAI and Hippocratic AI publish no self-serve tier at all, betting that a health system or contact center buying agent-minutes at scale is past the trial stage.
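The unbundled voice-agent stack in the patterns above can be modelled as a sum of per-minute components. This is a sketch under stated assumptions: the infra, TTS, and telephony rates are the Retell AI figures quoted in this document, while the $0.03/min LLM rate is a hypothetical value chosen so the total matches the ~$0.115/min composed cost used in the worked examples. The class and method names are made up.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class VoiceAgentStack:
    """Per-minute rates for each component of a composed voice-agent minute."""
    infra: Decimal
    llm: Decimal
    tts: Decimal
    telephony: Decimal

    def components(self) -> dict[str, Decimal]:
        return {"infra": self.infra, "llm": self.llm,
                "tts": self.tts, "telephony": self.telephony}

    def per_minute(self) -> Decimal:
        """Composed rate: the sum of every component's per-minute rate."""
        return sum(self.components().values(), Decimal(0))

    def invoice(self, minutes: int) -> dict[str, Decimal]:
        """One line per component plus a total, so the buyer can audit each part."""
        lines = {name: rate * minutes for name, rate in self.components().items()}
        lines["total"] = sum(lines.values(), Decimal(0))
        return lines


stack = VoiceAgentStack(
    infra=Decimal("0.055"),      # from the document
    llm=Decimal("0.030"),        # hypothetical, not a published rate
    tts=Decimal("0.015"),        # low end of the quoted TTS range
    telephony=Decimal("0.015"),  # from the document
)
campaign = stack.invoice(1_000)
```

Keeping the components as separate invoice lines mirrors the transparency point: a bundled quote can be compared only after it is split the same way.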

Counterexamples & variants

The most common variant is the vendor that generates speech but bills it by the character, not the minute. Deepgram’s Aura TTS is priced per 1,000 characters, Speechmatics TTS is per 1,000 characters, and Murf AI’s API is $0.01–$0.03 per 1,000 characters. These companies meter media minutes for transcription or agents but switch to per-character billing for synthesis, because the input to synthesis is text of known length, where the input to transcription is audio of unknown word count. The same vendor runs two units side by side, and only one of them is the minute.
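Per-character synthesis pricing can be put next to a per-minute meter only by assuming a speaking rate. The sketch below uses the $0.01 to $0.03 per 1,000 characters range quoted for Murf AI's API. The 900 characters per spoken minute figure is a rough illustrative assumption, not a vendor number, and the code assumes pro-rata billing with no rounding to whole thousands.

```python
from decimal import Decimal

# Hypothetical speaking rate used only to compare units; real speech varies.
ASSUMED_CHARS_PER_MINUTE = 900


def tts_cost(characters: int, rate_per_1k_chars: Decimal) -> Decimal:
    """Pro-rata cost of synthesizing `characters` of input text."""
    return Decimal(characters) / Decimal(1000) * rate_per_1k_chars


def per_minute_equivalent(rate_per_1k_chars: Decimal,
                          chars_per_minute: int = ASSUMED_CHARS_PER_MINUTE) -> Decimal:
    """Approximate cost of one spoken minute under the assumed speaking rate."""
    return tts_cost(chars_per_minute, rate_per_1k_chars)


# A 5,000-character script at the low and high end of the quoted range.
script_low = tts_cost(5_000, Decimal("0.01"))
script_high = tts_cost(5_000, Decimal("0.03"))

# The same range expressed per spoken minute, given the assumption above.
minute_low = per_minute_equivalent(Decimal("0.01"))
minute_high = per_minute_equivalent(Decimal("0.03"))
```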

PolyAI is the variant that proves the minute can be the meter without ever appearing on a price list. PolyAI publishes no public pricing and sells enterprise contact-center voice as a sales-led annual contract billed per minute (varying by STT and voice configuration) plus one-time voice-design fees. Parloa works the same way: its /pricing path returns a 404, with a sales-led enterprise deal underneath. In both cases the connected minute is almost certainly the underlying cost driver, but the buyer negotiates a contract rather than reading a per-minute rate card. Hippocratic AI is the same idea reframed as labor: ~$9 per agent-hour of active patient time, positioned as on-demand “AI staffing” you hire like a nursing shift rather than an API you meter.

WellSaid Labs sits at the opposite extreme: it produces per-minute media (AI voiceover) but bills largely by the seat rather than by minutes consumed. For a content team that generates voiceovers all day, a flat seat removes the per-minute anxiety entirely. Otter.ai and Fireflies.ai lean the same way — per-seat freemium subscriptions where transcription minutes are an allowance cap inside the plan (Otter meters a monthly transcription-minute limit per tier) rather than a metered line item. Descript and Krisp also lead with per-seat subscriptions and treat minutes as a bundled entitlement. In these cases media minutes exist in the taxonomy but are a usage grant, not the unit the buyer transacts in.

Finally, Kustomer and Fal show the minute as a secondary meter. Kustomer is a seat-priced CRM whose Voice channel is a pay-as-you-go add-on from $0.02/minute — the minute rides alongside seats and per-resolution AI charges, not as the headline. Fal is a generative-media GPU platform that bills most models per output (per image, per video) but exposes per-second video rates that resolve to a media-minute unit. Even LiveKit stacks the minute as one meter among several — agent-session minutes, WebRTC media minutes, inference credits, telephony, and bandwidth all bill in parallel. The minute appears, but as one line among several rather than the spine of the pricing.

What this means for buyers vs vendors

For buyers

Estimate your monthly minutes before you compare rates — your bill is dominated by volume, not by the headline number. A team running 100,000 agent-minutes/month sees a real difference between Retell AI at ~$0.115/min composed cost ($11,500) and a cheaper bundled quote, but a team doing 500 minutes/month will barely feel it. Match the meter to the task: transcription is metered per minute of input audio, synthesis is usually per character of input text, and voice-agent minutes are an unbundled stack — so a “voice AI” quote needs to be split into infra, LLM, TTS, and telephony before you can compare it. Check the metering base — Daily bills per participant-minute (a 4-way call costs 4x), Resemble AI bills per second, and Gladia quotes per hour, so normalize everything to a single unit before comparing. Watch for the credit layer: when Synthesia, Opus Clip, or Luma AI sells you credits, convert them back to minutes (Opus Clip is roughly one credit per minute of video) so you are comparing minutes to minutes. And if you generate media all day, price the seat-based variant — Otter.ai, Fireflies.ai, and WellSaid Labs may beat any per-minute meter for high-volume teams. Use free minutes to run a real pilot: Daily’s 10,000 free minutes and Gladia’s 10 free hours are enough to load-test before you commit. See choosing the right usage metric and the introduction to usage-based pricing for the framing.
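The buyer checks above reduce to three small calculations: a monthly bill at a given volume, the effective per-minute rate hidden inside a credit plan, and the break-even volume where a flat seat beats a meter. The sketch below is illustrative only. The $0.115/min and $0.03/min rates come from this document, while the $30 plan price, the 150-credit pool, and the $30 seat are hypothetical placeholders.

```python
from decimal import Decimal


def monthly_bill(minutes: int, rate_per_minute: Decimal) -> Decimal:
    """Metered bill for a month of usage at a flat per-minute rate."""
    return minutes * rate_per_minute


def effective_rate_per_minute(plan_price: Decimal,
                              credits: int,
                              minutes_per_credit: Decimal) -> Decimal:
    """Convert a credit plan back into a per-minute rate."""
    return plan_price / (credits * minutes_per_credit)


def break_even_minutes(flat_monthly_price: Decimal,
                       rate_per_minute: Decimal) -> Decimal:
    """Monthly minutes above which a flat price is cheaper than the meter."""
    return flat_monthly_price / rate_per_minute


# Volume dominates: the same composed rate at two very different scales.
large_team = monthly_bill(100_000, Decimal("0.115"))
small_team = monthly_bill(500, Decimal("0.115"))

# Hypothetical credit plan: $30 for 150 credits at roughly 1 credit per minute.
credit_rate = effective_rate_per_minute(Decimal("30"), 150, Decimal("1"))

# Hypothetical $30 seat versus a $0.03/min meter.
seat_break_even = break_even_minutes(Decimal("30"), Decimal("0.03"))
```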

For vendors

The media minute is the most intuitive meter you can offer a speech or video buyer — they already think in call length and video duration — but it is also the most directly comparable, so your per-minute rate sits next to every competitor’s. Differentiate inside the minute rather than on it: split accuracy or speed tiers the way Twelve Labs separates Marengo indexing from Pegasus analysis, ladder a model range like Hume AI’s $0.02–$0.072/min empathic voice, or expose the unbundled stack transparently like Retell AI and Vapi so buyers self-select the rate that matches their need. Decide deliberately whether to expose the minute or wrap it: a developer API wins on a transparent per-minute (or per-hour) card (Daily, Gladia), while a creative tool can escape rate-card comparison by selling credits that convert to minutes (Synthesia, Pika, Luma AI). Use a free-minute allowance as the on-ramp — Daily’s 10,000 free minutes/month is a low-friction trial — unless your buyer is a sales-led enterprise past the experimentation stage, as PolyAI and Hippocratic AI bet. Whatever you choose, you need per-second attribution of media duration to a customer and a job — and for voice agents, attribution of every component of the composed minute — which is a heavier metering pipeline than counting requests; see tracking and metering usage events and billing cycles and invoicing.
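Per-second attribution to a customer, a job, and a stack component could look something like the sketch below. It is one possible shape for such a pipeline, not a description of any vendor's system: the `UsageEvent` fields, the identifiers, and the event values are all hypothetical, and the rates reuse the component figures quoted earlier in this document.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    customer_id: str
    job_id: str
    component: str   # e.g. "infra", "tts", "telephony"
    seconds: int     # media duration attributed to this component


def aggregate_seconds(events: list[UsageEvent]) -> dict[tuple[str, str, str], int]:
    """Sum metered seconds per (customer, job, component)."""
    totals: dict[tuple[str, str, str], int] = defaultdict(int)
    for event in events:
        totals[(event.customer_id, event.job_id, event.component)] += event.seconds
    return dict(totals)


def price_by_customer(totals: dict[tuple[str, str, str], int],
                      rates_per_minute: dict[str, Decimal]) -> dict[str, Decimal]:
    """Convert aggregated seconds to minutes and apply each component's rate."""
    invoice: dict[str, Decimal] = defaultdict(Decimal)
    for (customer_id, _job_id, component), seconds in totals.items():
        invoice[customer_id] += Decimal(seconds) / Decimal(60) * rates_per_minute[component]
    return dict(invoice)


events = [
    UsageEvent("cust_a", "call_1", "infra", 90),
    UsageEvent("cust_a", "call_1", "tts", 90),
    UsageEvent("cust_a", "call_1", "telephony", 90),
    UsageEvent("cust_a", "call_1", "infra", 30),   # a later segment of the same call
    UsageEvent("cust_b", "call_2", "infra", 60),
]
rates = {"infra": Decimal("0.055"), "tts": Decimal("0.015"), "telephony": Decimal("0.015")}

totals = aggregate_seconds(events)
invoice = price_by_customer(totals, rates)
```

Aggregating in seconds and converting to minutes only at pricing time avoids compounding rounding across many short events.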

Company	Product	Pricing model	Billing units	Free tier	Verified
Autodesk (Flow Studio, formerly Wonder Dynamics)	AI VFX automation platform (Flow Studio)	subscription freemium hybrid	credits media-minutes seats	Yes	2026-06-16
Bland AI	AI phone call automation platform — inbound and outbound voice agents at scale	hybrid pure-usage subscription	api-calls credits media-minutes	Yes	2026-05-29
Creatify	AI ad-creative platform — turns a product URL into video and image ads	hybrid freemium	credits seats media-minutes	Yes	2026-06-30
Daily	Real-time voice and video WebRTC APIs (Video SDK + Pipecat Cloud)	pure-usage	media-minutes api-calls	Yes	2026-07-14
Deepgram	Usage-based speech-to-text, text-to-speech, and voice agent APIs	pure-usage freemium	media-minutes tokens credits	Yes	2026-05-31
Descript	AI-powered audio and video editing	hybrid freemium	seats credits media-minutes	Yes	2026-05-31
ElevenLabs	Voice AI platform across ElevenCreative, ElevenAgents, and ElevenAPI	subscription pure-usage hybrid	characters credits media-minutes	Yes	2026-06-30
Fal	Generative-media inference platform — serverless per-output model APIs plus dedicated GPU compute	pure-usage	gpu-hours requests media-minutes	No	2026-06-01
Fireflies.ai	AI meeting notetaker & conversation intelligence	freemium seat-based seat-plus-usage	seats credits media-minutes	Yes	2026-06-15
Gladia	Speech-to-text & audio intelligence API	pure-usage freemium commitment	media-minutes requests	Yes	2026-06-09
Groq	GroqCloud — LPU-based ultra-low-latency inference API for Llama, GPT-OSS, Qwen, Whisper transcription, and Orpheus text-to-speech	pure-usage hybrid commitment	tokens requests api-calls	Yes	2026-07-14
Hedra	AI video, avatar, image, and audio generation platform (Hedra Studio + API)	subscription freemium	credits media-minutes characters	Yes	2026-06-04
Hippocratic AI	Safety-focused healthcare LLM — patient-facing AI agents for non-diagnostic clinical tasks	pure-usage	media-minutes interactions	No	2026-06-10
Hume AI	Empathic Voice Interface (EVI) + Octave TTS + expression-measurement APIs	hybrid freemium	media-minutes characters api-calls	Yes	2026-06-30
InVideo AI	Prompt/text-to-video AI generation (invideo AI)	freemium subscription hybrid	credits media-minutes seats	Yes	2026-06-11
Kaiber	Kaiber — AI video & animation creation (Superstudio, Canvas, Motion, Flipbook)	freemium subscription hybrid	credits media-minutes seats	No	2026-06-11
Krisp	AI noise-cancellation, meeting transcription/notes, call-center voice AI, and a developer Voice AI SDK	seat-based	seats storage-gb media-minutes	Yes	2026-06-04
Kustomer	AI-first CRM and customer-service platform unifying omnichannel support, automation, and AI agents	hybrid seat-based outcome-based	seats resolutions media-minutes	No	2026-06-07
Leonardo.ai	Leonardo.Ai — generative AI image, video and design platform (Canva-owned)	freemium subscription seat-plus-usage	credits seats media-minutes	Yes	2026-06-11
LiveKit	Open-source real-time (WebRTC) communications, LiveKit Cloud & Agents framework	hybrid freemium pure-usage	media-minutes credits bandwidth-gb	Yes	2026-06-30
Luma AI	Dream Machine — text/image-to-video, image and audio generation (plus Genie 3D)	subscription freemium hybrid	credits media-minutes seats	Yes	2026-06-11
MiniMax	Foundation models, Hailuo video & per-token API	pure-usage freemium	tokens seats credits	Yes	2026-06-11
Murf AI	AI voice / text-to-speech platform (Murf Studio app + Murf API)	subscription pure-usage freemium	media-minutes seats credits	Yes	2026-06-01
Opus Clip	OpusClip — AI long-form-to-short video repurposing and clip generation	freemium subscription hybrid	credits media-minutes seats	Yes	2026-06-11
Otter.ai	AI meeting transcription, notes & assistant	freemium subscription seat-based	seats media-minutes	Yes	2026-06-15
Parloa	Enterprise AI Agent Management Platform (AMP) for contact-center voice and chat automation	pure-usage	media-minutes resolutions	No	2026-06-07
Pika	Pika — AI text-to-video and image-to-video generation	freemium subscription hybrid	credits media-minutes seats	Yes	2026-06-11
PolyAI	Enterprise voice AI assistants for contact centers	hybrid commitment	media-minutes	No	2026-06-09
Reka AI	Natively multimodal models (Spark, Edge, Flash, Core) + Research & Vision APIs	pure-usage freemium	tokens api-calls requests	Yes	2026-06-11
Resemble AI	AI deepfake detection & watermarking + voice generation APIs	pure-usage	credits media-minutes seats	No	2026-07-14
Retell AI	Conversational voice-agent API platform	pure-usage hybrid	media-minutes messages seats	No	2026-07-14
Rev AI	Pay-as-you-go speech-to-text, transcription, and audio-intelligence APIs	pure-usage freemium	media-minutes credits api-calls	Yes	2026-06-04
Sarvam AI	Sovereign Indic LLM, speech & translation APIs	pure-usage freemium	tokens characters media-minutes	Yes	2026-06-11
Speechmatics	Speech-to-text and text-to-speech APIs with per-hour usage pricing	pure-usage freemium	media-minutes characters	Yes	2026-07-06
Synthesia	Enterprise AI video generation	subscription freemium	credits media-minutes seats	Yes	2026-05-31
Synthflow AI	No-code AI voice-agent builder	hybrid	media-minutes seats	No	2026-06-24
Tavus	Conversational Video Interface (CVI) API for real-time AI humans / avatars, plus PALs consumer AI companions	hybrid freemium	media-minutes	Yes	2026-06-24
Twelve Labs	Video understanding foundation models (Marengo for search/embeddings, Pegasus for analysis) delivered as a usage-metered API	pure-usage freemium commitment	media-minutes tokens requests	Yes	2026-06-02
Vapi	Voice AI infrastructure for developers	pure-usage hybrid	media-minutes messages seats	No	2026-06-09
VEED AI	VEED — online video editor with AI generation tools	subscription seat-based freemium	seats credits media-minutes	Yes	2026-06-11
WellSaid Labs	AI text-to-speech voiceover studio with 280+ voices for content teams	seat-based freemium	seats media-minutes	Yes	2026-06-24

FAQ

What is media-minute pricing?

Media-minute pricing is a billing unit where customers are charged per minute of audio or video processed. It is the native meter for speech-to-text, text-to-speech, voice agents, real-time video, and AI video generation, because the duration of the media maps directly to the compute cost of generating or transcribing it.

How much does it cost to transcribe a minute of audio?

Machine transcription is cheap and varies by model. In this corpus Gladia's pay-as-you-go STT is $0.144/hour (about $0.0024/min). For comparison with other machine-processed media, Twelve Labs indexes video at $0.042/min, and Daily's raw WebRTC audio starts at $0.004/participant-minute. Human transcription is far more expensive: Rev AI lists it around $1.99/min through the same API.

Why do speech and video vendors bill per minute instead of per token?

Audio and video have no natural token boundary, but they do have a duration. A minute of speech or footage is a stable, intuitive unit that buyers can estimate from call logs or video length, and it tracks the underlying compute closely. Vendors like Twelve Labs, Tavus, and Daily meter video by the minute for the same reason transcription vendors meter audio by the minute.

How is a voice-agent minute priced?

Modern voice-agent vendors bill an unbundled per-minute stack. Retell AI charges $0.055/min for Voice Infra plus your chosen LLM, TTS ($0.015–$0.04/min), and telephony ($0.015/min), reaching $0.07–$0.31/min. Vapi charges $0.05/min hosting with at-cost pass-through of provider costs, and Synthflow advertises no platform fee on top of the composed rate.

Do per-minute vendors offer free minutes?

Most do. Daily includes 10,000 free participant-minutes per month, Gladia gives 10 free hours per month, Twelve Labs grants a one-time 600 free indexing minutes, and Retell AI seeds accounts with $10 in free credits. Enterprise voice vendors like PolyAI and Hippocratic AI are exceptions — they are sales-led with no free self-serve tier.

Which companies use media-minute pricing?

In this corpus 40 companies meter media minutes, including transcription APIs (Deepgram, Speechmatics, Rev AI, Gladia), voice agents (Retell AI, Vapi, Bland AI, Synthflow, PolyAI), real-time media infrastructure (LiveKit, Daily), text-to-speech and dubbing (ElevenLabs, Murf AI, Resemble AI, WellSaid Labs), and AI video (Synthesia, Tavus, Twelve Labs, Luma AI, Pika, Opus Clip, VEED).
