
AssemblyAI pricing

assemblyai.com

Quick summary
Product: Speech-to-Text & Audio AI APIs
Industry: Technology
Commits: None
AI Summary
  • AssemblyAI operates a pure usage-based pricing model billed per hour of audio: async transcription at $0.15/hr (Universal-2) or $0.21/hr (Universal-3 Pro), with no seat fees, no monthly minimums, and a free tier of up to 185 hours to test before committing.
  • Async transcription has two live models — Universal-2 ($0.15/hr) and the more accurate Universal-3 Pro ($0.21/hr); real-time streaming is published at $0.15/hr (Universal-Streaming), $0.30/hr (Whisper-Streaming), and $0.45/hr (Universal-3 Pro Streaming).
  • Speech Understanding add-ons — including speaker identification, entity detection, translation, and PII redaction — each layer an additional per-hour fee on top of the base transcription rate.
  • The LLM Gateway lets developers run frontier LLMs (OpenAI, Anthropic, Google) against a transcript, billed per million input and output tokens separately from transcription — the productized evolution of LeMUR.
  • Enterprise customers get volume discounts, dedicated support, and custom contract pricing; the self-serve path is designed for developers who can start on the free tier and scale without a sales call.
  • AssemblyAI raised $50M in its Series C (January 2024) from Accel and Insight Partners, bringing total funding to ~$143M, and processes audio for hundreds of enterprise customers including dozens of Fortune 500 companies.
Pricing summary
AssemblyAI 2026 — Usage-based Speech AI pricing
No monthly minimums: async from $0.15/hr, streaming from $0.15/hr, add-ons per hour, LLM Gateway per token; Enterprise custom
| Plan | Price | Best for |
| --- | --- | --- |
| Free Tier | Free | Developers evaluating the API |
| Universal-3 Pro | $0.21/hr of audio | Teams needing the highest multilingual accuracy |
| Enterprise | Custom | Fortune 500 and high-volume platforms |
| Real-Time Streaming | From $0.15/hr | Live captions, voice agents, call centers |
| Speech Understanding Add-ons | Per feature, per hour | Developers needing structured audio analysis |
Async transcription billed per hour: Universal-2 $0.15/hr, Universal-3 Pro $0.21/hr. Streaming from $0.15/hr. Speech Understanding features add incremental per-hour fees. LLM Gateway billed per input/output token. Enterprise pricing requires a sales conversation for volume commitments.

About

AssemblyAI is a San Francisco-based AI company founded in 2017 that builds Speech AI APIs for developers and enterprises. The company’s core product is a suite of APIs that convert audio and video to text and extract structured intelligence from the resulting transcripts. Unlike general-purpose AI platforms, AssemblyAI is purpose-built for audio: every model, feature, and pricing dimension is designed around the economics of processing spoken language.

AssemblyAI’s customer base spans startups and Fortune 500 enterprises — the company has reported processing audio for hundreds of enterprise customers, including dozens in the Fortune 500. Customers include companies building products across call center analytics, meeting transcription, media captioning, voice agent platforms, and content intelligence.

The company raised a $50M Series C in January 2024 led by Accel with participation from Insight Partners, bringing total disclosed funding to approximately $143M. That round followed the November 2023 launch of Universal-1 — the company’s highest-accuracy English transcription model at the time — and positioned AssemblyAI as the leading independent speech AI infrastructure provider for developers, competing with Deepgram, Google Cloud Speech-to-Text, AWS Transcribe, and Microsoft Azure Speech.

AssemblyAI’s product suite now covers three layers: core transcription (async and real-time), Audio Intelligence (structured analysis of the transcript, now sold as Speech Understanding add-ons), and LeMUR (an LLM reasoning layer applied directly to audio context, since productized as the LLM Gateway). Each layer is priced separately and composably, letting developers pay only for the capabilities they use.


Pricing summary: pure usage-based billing, no monthly seat fees

AssemblyAI runs a pure usage-based pricing model. There are no subscription tiers, no seat fees, and no monthly minimums for self-serve customers. You pay for what you process: async transcription is billed per hour at $0.15/hr (Universal-2) or $0.21/hr (Universal-3 Pro), with incremental per-hour add-on fees for each Speech Understanding feature you enable, and separate per-token billing for the LLM Gateway.

This model mirrors how usage-based pricing works in cloud infrastructure — the bill expands proportionally with consumption, making it friendly for startups with variable audio volumes and potentially expensive for teams that underestimate usage. Unlike the flat-rate subscription models used by many AI tools (see the AI pricing shift away from per-user licenses), AssemblyAI’s pricing scales linearly with every hour of audio processed.

What makes this different: Most speech API vendors charge a single flat rate that hides model quality trade-offs. AssemblyAI separates model accuracy (Universal-2 vs. Universal-3 Pro) from feature add-ons (speaker ID, entities, translation, PII redaction), giving developers granular control over the cost/capability trade-off. Stacking several Speech Understanding features can meaningfully raise the effective per-hour rate — a cost dynamic that is not obvious from the headline pricing.
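The stacking effect is easiest to see in a small cost model. The sketch below assumes the per-hour list prices quoted in this profile and uses its own feature keys (`speaker_identification`, `entity_detection`, and so on) purely for illustration; they are hypothetical labels, not AssemblyAI API parameter names, and LLM Gateway tokens are left out.

```python
from decimal import Decimal

# Per-hour list prices quoted in this profile (May 2026). The dictionary keys
# are hypothetical labels for illustration, not AssemblyAI API parameters.
BASE_RATES = {
    "universal-2": Decimal("0.15"),
    "universal-3-pro": Decimal("0.21"),
}
ADDON_RATES = {
    "speaker_identification": Decimal("0.02"),
    "translation": Decimal("0.06"),
    "entity_detection": Decimal("0.08"),
    "pii_text_redaction": Decimal("0.08"),
    "speaker_diarization": Decimal("0.00"),  # listed as included in the base rate
}


def effective_hourly_rate(model: str, features: list[str]) -> Decimal:
    """Base model rate plus one per-hour fee for every enabled add-on."""
    return BASE_RATES[model] + sum((ADDON_RATES[f] for f in features), Decimal("0"))


def monthly_cost(model: str, features: list[str], hours: int) -> Decimal:
    """Linear usage billing: the effective hourly rate times hours of audio."""
    return effective_hourly_rate(model, features) * hours


# Archetype A mix: Universal-2 with speaker identification and entity detection.
mix_a = ["speaker_identification", "entity_detection"]
print(effective_hourly_rate("universal-2", mix_a), monthly_cost("universal-2", mix_a, 100))
```

With the Archetype A mix the effective rate is $0.25/hr, two thirds above the $0.15/hr headline, and the Archetype B mix on Universal-3 Pro reaches $0.43/hr, roughly double its base rate.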


Pricing by product

Core Transcription (Async / Pre-recorded)

| Model | Price per hour | Key mechanics |
| --- | --- | --- |
| Universal-3 Pro | $0.21 | Most accurate model; leads on multilingual WER, entities, rare words; languages: English, Spanish, German, French, Italian, Portuguese |
| Universal-2 | $0.15 | Excellent accuracy at a lower price; supports 99 languages; trained on 12.5M+ hours |

SLAM-1 is deprecated — AssemblyAI directs customers to migrate to Universal-3 Pro.

Real-Time Streaming

| Model | Price per hour | Use case |
| --- | --- | --- |
| Universal-Streaming (English / Multilingual) | $0.15 | Cost-effective real-time captions and voice agents |
| Whisper-Streaming | $0.30 | Whisper-based streaming option |
| Universal-3 Pro Streaming | $0.45 | Highest-accuracy real-time transcription |

Streaming is billed by WebSocket session duration (idle time counts). Concurrency scales automatically at no additional fee.
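Because idle time counts, the billable quantity is the span between opening and closing the socket, not the seconds that contain speech. A minimal sketch, assuming the streaming list prices above; `session_cost` is a hypothetical helper and the model keys are illustrative labels, not API identifiers.

```python
from datetime import datetime, timedelta
from decimal import Decimal

# Streaming list prices per hour, as quoted in this profile.
STREAMING_RATES = {
    "universal-streaming": Decimal("0.15"),
    "whisper-streaming": Decimal("0.30"),
    "universal-3-pro-streaming": Decimal("0.45"),
}


def session_cost(model: str, opened: datetime, closed: datetime) -> Decimal:
    """Bill the whole WebSocket session, including silent or idle stretches."""
    if closed < opened:
        raise ValueError("session cannot close before it opens")
    seconds = Decimal(str((closed - opened).total_seconds()))
    return STREAMING_RATES[model] * seconds / 3600


start = datetime(2026, 5, 1, 9, 0)
# A voice-agent session left open for 30 minutes is billed for all 30,
# even if the caller spoke for only 10 of them.
cost = session_cost("universal-streaming", start, start + timedelta(minutes=30))
```

The practical lever this exposes is closing sessions promptly when a call or meeting goes quiet, since silence on an open socket is billed at the same rate as speech.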

Speech Understanding Add-ons (per-hour on top of base)

| Feature | Add-on price | Notes |
| --- | --- | --- |
| Speaker identification | +$0.02/hr | Replaces “Speaker A/B” labels with real names or roles |
| Translation | +$0.06/hr | Translates the transcript into a target language |
| Entity detection | +$0.08/hr | Named entity recognition (people, orgs, places, dates) |
| PII text redaction | +$0.08/hr | Redacts sensitive info from transcript text |
| Speaker diarization | Included | Detects “who said what” for pre-recorded and streaming audio |

LLM Gateway (LLM-over-audio layer)

| Dimension | Pricing | Notes |
| --- | --- | --- |
| Input tokens | Per 1M tokens | Transcript + prompt tokens fed into the selected LLM |
| Output tokens | Per 1M tokens | LLM-generated answer tokens |
| Models | OpenAI, Anthropic, Google | E.g. GPT-5.1 at $1.25 in / $10.00 out per 1M tokens; rates vary by model |
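Since input and output tokens are priced separately, the cost of one LLM Gateway call is a two-term sum. A minimal sketch using the example GPT-5.1 rates in the table; the token counts are hypothetical, and real counts depend on the chosen model's tokenizer.

```python
from decimal import Decimal

TOKENS_PER_UNIT = 1_000_000  # LLM Gateway rates are quoted per 1M tokens


def gateway_cost(input_tokens: int, output_tokens: int,
                 input_rate: Decimal, output_rate: Decimal) -> Decimal:
    """Price input (transcript + prompt) and output tokens separately."""
    return (input_tokens * input_rate + output_tokens * output_rate) / TOKENS_PER_UNIT


# Hypothetical request: 20,000 tokens of transcript plus prompt, and a
# 500-token summary, at the example GPT-5.1 rates listed above.
summary_cost = gateway_cost(20_000, 500, Decimal("1.25"), Decimal("10.00"))
```

In this example input tokens account for five sixths of the cost, so the transcript length, not the answer length, dominates the bill for summarization-style prompts; the total is charged on top of transcription.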

Sales motions across products: Self-serve / PLG for all tiers via API key; Enterprise sales-led for volume contracts and invoice billing. Speech Understanding and the LLM Gateway are fully self-serve — no sales call required to enable.


Hidden costs: what surprises buyers when the bill arrives

Archetype A: Developer building a meeting transcription app

A developer processing 100 hours/month of meeting recordings on Universal-2, adding speaker identification and entity detection:

| Line item | Per-hour cost | Monthly (100 hrs) |
| --- | --- | --- |
| Base async transcription (Universal-2) | ~$0.15 | ~$15.00 |
| Speaker identification add-on | ~$0.02 | ~$2.00 |
| Entity detection add-on | ~$0.08 | ~$8.00 |
| Estimated total | ~$0.25/hr | ~$25/mo |

The base headline rate of $0.15/hr is only the starting point. Common add-ons raise the effective rate meaningfully. Developers often build against the base rate and are surprised when the enriched transcript bill arrives. (Choosing Universal-3 Pro at $0.21/hr instead of Universal-2 adds roughly 40% to the base line item.)

Archetype B: Call center analytics platform at scale

A B2B SaaS company processing 2,000 hours/month of customer support calls on Universal-3 Pro with PII redaction, entity detection, and translation:

| Line item | Per-hour cost | Monthly (2,000 hrs) |
| --- | --- | --- |
| Base async transcription (Universal-3 Pro) | ~$0.21 | ~$420 |
| PII text redaction | ~$0.08 | ~$160 |
| Entity detection | ~$0.08 | ~$160 |
| Translation | ~$0.06 | ~$120 |
| LLM Gateway summarization (per-token, varies by model) | Varies | Varies |
| Estimated total (excluding LLM Gateway) | ~$0.43/hr | ~$860/mo |

At higher volumes, the self-serve rate becomes worth an enterprise conversation. AssemblyAI’s sales team engages companies at high monthly volumes to offer volume pricing.



Pricing evolution: how AssemblyAI’s pricing has changed since 2021

Cadence

| Quarter | Price changes | Product / SKU additions | Notes |
| --- | --- | --- | --- |
| 2021 Q2 | 1 | 1 | Public API launched with pay-per-second billing; Series A raised |
| 2022 Q4 | 0 | 2 | Series B raised; Audio Intelligence add-ons (sentiment, entity, IAB) launched |
| 2023 Q2 | 0 | 1 | LeMUR launched in beta; token-based LLM-over-audio pricing introduced |
| 2023 Q4 | 0 | 1 | Universal-1 launched; same pricing tier, higher-accuracy model |
| 2024 Q1 | 0 | 1 | Series C raised; Universal-2 released with improved benchmarks |
| 2025 Q1 | 0 | 2 | Universal-3-Pro (async) and U3-RT-Pro (real-time streaming) launched |
| 2026 Q2 | 0 | 0 | Current pricing stable as of May 2026 research |

Tracked range: 2021 Q2–2026 Q2. Quarters not listed above were verified stable (0 price changes, 0 SKU additions). Historical pricing pre-2021 was invite-only and not publicly documented.

Notable changes

  • 2021 Q2 — Public pay-per-second billing launched; first developer self-serve access. Pricing set at $0.00025/second for standard transcription.
  • 2022 Q4 — Audio Intelligence feature suite launched: sentiment analysis, entity detection, IAB topic classification, and content safety added as incremental per-second fees on top of base transcription.
  • 2023 Q2 — LeMUR announced at beta: the first commercially-available LLM-over-audio API. Token-based pricing (input + output) introduced as a third billing dimension separate from transcription and add-ons.
  • 2023 Q4 — Universal-1 released as the new highest-accuracy English STT model. Positioned as the same pricing tier as prior models but with significantly lower word error rate.
  • 2024 Q1 — Universal-2 released following $50M Series C. Universal-2 achieved further accuracy gains over Universal-1 on standard English benchmarks. No pricing increase; same per-second rate.
  • 2025 Q1 — Universal-3-Pro (async) and U3-RT-Pro (real-time streaming) launched. Streaming priced at a premium to async to reflect lower-latency infrastructure costs.
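The 2021 rate was quoted per second while current rates are quoted per hour, so converting between the two puts them on one scale. A worked conversion using only figures cited in this profile:

```latex
\begin{aligned}
0.00025\ \tfrac{\$}{\text{s}} \times 3600\ \tfrac{\text{s}}{\text{hr}} &= 0.90\ \tfrac{\$}{\text{hr}} \quad (= 0.015\ \tfrac{\$}{\text{min}}) \\
0.15\ \tfrac{\$}{\text{hr}} \div 3600\ \tfrac{\text{s}}{\text{hr}} &\approx 0.0000417\ \tfrac{\$}{\text{s}} \quad (= 0.0025\ \tfrac{\$}{\text{min}})
\end{aligned}
```

On this arithmetic, today's Universal-2 list price of $0.15/hr is one sixth of the 2021 standard rate of $0.90/hr.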

What’s unique: differentiators in AssemblyAI’s pricing approach

1. Per-second granularity with no rounding penalty. AssemblyAI bills at the per-second level: at the historical $0.00025/second rate, a 90-second clip costs exactly $0.0225, not the $0.03 it would cost if rounded up to two full minutes. This is technically obvious but commercially significant: early speech API providers (and even current cloud incumbents) round up to the nearest 15 seconds or full minute. For developers processing large volumes of short clips, such as voicemails, social posts, and support snippets, per-second billing can reduce costs by 30–50% versus per-minute rounding.
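The rounding penalty can be reproduced directly. A minimal sketch, assuming the historical $0.00025/second rate and a hypothetical competitor that charges the equivalent $0.015/min but bills every partial minute as a full one:

```python
from decimal import Decimal

PER_SECOND = Decimal("0.00025")   # historical 2021 standard rate cited above
PER_MINUTE = PER_SECOND * 60      # the same price expressed per minute: $0.015


def per_second_cost(seconds: int) -> Decimal:
    """Bill exactly the seconds of audio processed."""
    return PER_SECOND * seconds


def per_minute_rounded_cost(seconds: int) -> Decimal:
    """Hypothetical competitor: any partial minute is billed as a full minute."""
    minutes = (seconds + 59) // 60  # integer ceiling division
    return PER_MINUTE * minutes


clip = 90
print(per_second_cost(clip), per_minute_rounded_cost(clip))
```

For a 30-second clip the rounded bill is exactly double, which is the ratio behind the 1 million-clip comparison ($7,500 versus $15,000) in the strategic-wins section; for whole-minute clips the two methods agree.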

2. Composable Audio Intelligence: pay for features, not tiers. Unlike SaaS tools that gate feature sets behind plan tiers, AssemblyAI lets developers enable any combination of Audio Intelligence features per-request. Sentiment analysis on one request, speaker diarization + entity detection on another — each billed independently. This composable feature billing allows developers to precisely control costs and avoids paying for analysis they don’t need. It mirrors how AWS charges for individual cloud services rather than bundled “plans.”
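At the API level, composability means the feature mix travels with each request. The sketch below only builds (and does not send) a request for AssemblyAI's v2 transcript endpoint. The endpoint URL, the `authorization` header, and flag names such as `speaker_labels` and `entity_detection` reflect the public v2 REST API as commonly documented, but treat them as assumptions and check the current API reference; `build_transcript_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed endpoint for AssemblyAI's v2 async transcription API.
TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key: str, audio_url: str,
                             **features: bool) -> urllib.request.Request:
    """Build a POST request whose body enables only the features set to True.

    Each request carries its own feature mix, so one call can add entity
    detection while the next sends plain transcription.
    """
    body = {"audio_url": audio_url}
    body.update({name: True for name, enabled in features.items() if enabled})
    return urllib.request.Request(
        TRANSCRIPT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


# Two requests with different feature mixes (placeholder key and audio URLs).
plain = build_transcript_request("YOUR_API_KEY", "https://example.com/call-1.mp3")
enriched = build_transcript_request(
    "YOUR_API_KEY", "https://example.com/call-2.mp3",
    speaker_labels=True, entity_detection=True, sentiment_analysis=False,
)
```

Sending either request (for example with `urllib.request.urlopen`) would then bill only the features present in that request's body.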

3. LeMUR: the first token-billed audio LLM API. When LeMUR launched in 2023, it introduced a novel billing layer to the speech category: token-based LLM pricing applied to audio context. Developers could ask natural-language questions about a transcript — “summarize the action items from this meeting” — and pay per token for the answer. This created a new cost dimension that no other speech API offered, and it mirrors the outcome-based pricing trend where customers pay for derived value (the answer) rather than raw processing (the transcript).

4. Free playground before payment commitment. AssemblyAI’s dashboard playground lets developers test every model — including LeMUR — with real audio before entering any payment details. In a category where competitors often require API key purchase to begin testing, this friction-free evaluation path is a meaningful PLG differentiator. It reflects the shift toward product-led growth in AI infrastructure where developer trust is won at the keyboard before the wallet.

5. Model accuracy as a pricing anchor, not a pricing gate. AssemblyAI positioned successive model improvements (Universal-1 → Universal-2) at the same pricing tier rather than charging premium rates for higher-accuracy models. This is the opposite of the tiered-model pricing strategy used by OpenAI (GPT-4 costs more than GPT-3.5) or Anthropic (Opus costs more than Haiku). AssemblyAI’s approach bets that accuracy leadership drives adoption volume, and volume drives enterprise upsell: the premium is captured at the contract level, not the per-unit level. Universal-3 Pro is a partial exception: at $0.21/hr it carries a 40% premium over Universal-2’s $0.15/hr.


Strengths & weaknesses

| Strengths | Weaknesses |
| --- | --- |
| Per-second billing with no rounding eliminates the penalty for short-clip processing | Multiple billing dimensions (transcription + per-feature add-ons + LeMUR tokens) make total cost hard to predict without a calculator |
| Free playground with no payment details required: the lowest-friction evaluation in the category | No published pricing for Audio Intelligence add-on rates; requires account creation to see full pricing |
| Composable Audio Intelligence: enable exactly the features you need per request | Real-time streaming priced at a premium; no published per-second rate makes streaming cost estimation opaque |
| Model improvements (Universal-1 → Universal-2) at the same price tier | Enterprise pricing entirely opaque; sales process required for any volume discount |
| LeMUR uniquely enables LLM reasoning on audio without building a custom pipeline | No spend caps or budget alerts on self-serve accounts; surprise bills possible for high-volume batch jobs |
| Genuine accuracy leadership on English benchmarks vs. Whisper, Deepgram, Google STT | Pricing for non-English languages not prominently documented; accuracy benchmarks primarily cover English |

Billing UX: developer experience with AssemblyAI’s billing controls

  • Self-serve account creation — API key and billing setup handled entirely in the AssemblyAI dashboard. No sales call required to begin processing audio at any volume.
  • Usage dashboard — The developer dashboard shows API usage history, request counts, and cost breakdown by feature. Granular enough to identify which Audio Intelligence add-ons are driving costs.
  • No spend caps on self-serve — AssemblyAI does not currently offer configurable spending limits for self-serve accounts. A batch job that processes unexpectedly large audio volumes will bill without notification. This is a known friction point for cost-sensitive developers.
  • Playground — no payment required — The in-dashboard playground allows testing of all models, Audio Intelligence features, and LeMUR with real audio before any billing is set up. This is the most developer-friendly evaluation UX in the category.
  • Payment methods — Credit card for self-serve accounts. Enterprise accounts add invoice billing, purchase orders, and custom contract payment terms.
  • Billing granularity — Bills are itemized by API call type (transcription, Audio Intelligence feature, LeMUR). The bill shows the exact seconds processed and features enabled per request.
  • API key management — Multiple API keys can be created per account for separating environments (dev, staging, production) with independent usage tracking.
  • Enterprise account management — Enterprise customers receive a dedicated account manager, custom usage reporting, and quarterly business reviews with usage forecasting support.

Strategic wins: where AssemblyAI’s pricing decisions have paid off

1. Per-second billing made AssemblyAI the default choice for short-clip use cases

By charging at the per-second level rather than rounding to the minute, AssemblyAI structurally won the economics for developers processing short audio: voicemails (20–60 seconds), social media clips (15–60 seconds), podcast excerpts (30–90 seconds). At the historical $0.00025/second rate, a company processing 1 million 30-second clips per month pays AssemblyAI $7,500 versus $15,000 at a competitor charging the equivalent $0.015/min with per-minute rounding: the same audio at 2× the cost. This per-second value metric is the single pricing decision that most clearly explains AssemblyAI’s developer adoption curve in media and social application categories.

2. Composable Audio Intelligence created a flywheel of feature adoption without tier lock-in

By pricing Audio Intelligence as per-request add-ons rather than plan tiers, AssemblyAI gave developers the freedom to start with base transcription and incrementally adopt higher-value features as product needs evolved. A developer who starts with basic transcription (historically $0.015/min, now $0.15/hr on Universal-2) naturally discovers speaker diarization when their users ask “who said what?” and enables it at little or no marginal cost, since diarization is currently included in the base rate. This composable model drives organic feature expansion that is less likely when features are gated behind fixed tiers requiring a plan upgrade. The result: higher feature adoption rates than a tier-gated model would produce, and higher average revenue per customer over time.

3. LeMUR differentiated AssemblyAI beyond the “just transcription” category

The 2023 launch of LeMUR moved AssemblyAI from being a transcription API vendor to being an audio intelligence platform — a category with significantly higher defensibility and pricing power. Before LeMUR, the main competitive variables were accuracy and price per minute. After LeMUR, AssemblyAI offered a capability that no other speech API could replicate: LLM-quality reasoning applied directly to audio content, without requiring a customer to build their own transcript → LLM pipeline. This outcome-based value layer — “what does this meeting mean?” rather than “give me the words” — justified enterprise conversations that raw transcription pricing alone could not support.

4. Accuracy leadership at parity pricing created switching cost without raising rates

AssemblyAI’s model release cadence (Universal-1 → Universal-2) at unchanged per-second pricing created a powerful retention mechanism: customers who switched to AssemblyAI for Universal-1’s accuracy gains had little reason to leave when Universal-2 shipped at the same rate. This is the inverse of the pricing strategy used by most AI model companies, where new model generations come with price increases. By keeping rates flat while improving accuracy, AssemblyAI accumulates a technical switching cost (the customer’s application is tuned to AssemblyAI’s output format, API behavior, and accuracy characteristics) without imposing a financial switching cost that might prompt re-evaluation. Universal-3 Pro departs from the pattern: it lists at $0.21/hr, a 40% premium over Universal-2’s $0.15/hr, though Universal-2 remains available at the lower rate.


Areas to improve: gaps and friction in AssemblyAI’s pricing approach

1. Audio Intelligence add-on pricing is not publicly documented without an account

AssemblyAI’s pricing page prominently shows the $0.00025/second base transcription rate but does not publish per-feature pricing for Audio Intelligence add-ons or LeMUR without logging in and accessing the documentation. Developers trying to build a total cost model before signing up — a normal procurement step for any enterprise buyer — cannot do so without creating an account. This transparency gap is a meaningful friction point for enterprise sales cycles. Publishing a full pricing table (transcription + every add-on rate + LeMUR input/output token prices) on the public pricing page would reduce pre-sales friction and let developers self-qualify their budget fit before a sales call. Compare this approach against Perplexity AI’s fully public API pricing, which publishes all token and request rates transparently.

2. No spend caps create bill shock risk for self-serve developers

AssemblyAI’s self-serve accounts do not support configurable spending limits or threshold alerts. A developer who accidentally submits a batch of 10,000 long audio files will receive the bill without any real-time warning. AWS, Google Cloud, and Azure all offer budget alerts and spending caps as standard account management features — AssemblyAI’s absence of these controls is a category gap that creates anxiety for cost-sensitive development teams. Adding a simple “alert me when my monthly bill exceeds $X” setting would reduce developer anxiety and likely increase API adoption from teams that are currently cautious about unexpected charges.
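Until such a setting exists, a team can approximate one on the client side by estimating each job's cost before submitting it. A minimal sketch, assuming the caller knows each file's duration and the per-hour rate it will be billed at; `BudgetGuard` is hypothetical and is not an AssemblyAI feature.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class BudgetGuard:
    """Client-side monthly spend cap with a one-time alert threshold."""
    monthly_cap: Decimal
    alert_fraction: Decimal = Decimal("0.8")
    spent: Decimal = Decimal("0")
    alert_sent: bool = False

    def reserve(self, audio_seconds: int, hourly_rate: Decimal) -> bool:
        """Return True and record the spend if the job fits under the cap."""
        cost = hourly_rate * audio_seconds / 3600
        if self.spent + cost > self.monthly_cap:
            return False  # refuse to submit rather than overshoot the cap
        self.spent += cost
        if not self.alert_sent and self.spent >= self.monthly_cap * self.alert_fraction:
            self.alert_sent = True
            print(f"Budget alert: ${self.spent:.2f} of ${self.monthly_cap:.2f} used")
        return True


# A $10 cap on Universal-2 ($0.15/hr): each 6-hour file costs $0.90.
guard = BudgetGuard(monthly_cap=Decimal("10"))
results = [guard.reserve(6 * 3600, Decimal("0.15")) for _ in range(12)]
```

Reserving budget before submission, rather than reconciling after the invoice, turns a surprise bill into a refused job that a developer can inspect.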

3. Streaming pricing opacity creates hesitation for real-time use cases

Real-time streaming (U3-RT-Pro) is priced at a premium over async, but the specific rate premium is not prominently documented. Developers evaluating whether to build a real-time captioning product versus a post-call analytics product cannot easily model the cost difference. Streaming use cases — live captions, voice agents, real-time call monitoring — are among the highest-growth segments in the audio AI market. Clearer, published streaming pricing would likely accelerate adoption in these segments rather than causing developers to default to cheaper async alternatives. See billing cycles and metering for usage-based APIs for why transparent streaming cost documentation is critical to developer confidence.


Key takeaways

  1. Per-second billing is a competitive moat in short-clip categories. Against per-minute-rounded competitors at the same underlying rate, AssemblyAI’s per-second granularity halves the cost of a 30-second clip and saves even more on shorter ones (a 15-second clip costs a quarter as much). For any AI API, the choice of billing unit (per second, per minute, per request, per token) is a strategic decision that shapes which use cases become economically viable on your platform.

  2. Composable feature pricing beats tier gating for developer adoption. By making Audio Intelligence add-ons opt-in per request rather than bundled into fixed tiers, AssemblyAI drives organic feature adoption as developer products mature. Developers discover features at the point of need, not at the point of plan selection — which is an earlier and lower-intent moment in the product lifecycle.

  3. Model accuracy improvements at flat pricing create powerful retention without visible lock-in. Releasing Universal-2 and Universal-3-Pro at the same per-second rate as Universal-1 builds technical switching cost (tuned prompts, output parsing, latency expectations) without the financial switching cost that prompts re-evaluation. This is a sustainable retention mechanic that subscription-based tools with static feature sets cannot easily replicate.

  4. A free playground with no payment commitment is the highest-leverage PLG investment for API products. AssemblyAI’s no-card-required playground gives every curious developer a zero-friction path to experience the product quality. In a category with strong alternatives (Google STT, AWS Transcribe, Deepgram), developer experience before purchase is often the decisive factor.

  5. Adding LLM reasoning as a billing layer elevated the pricing conversation from commodity to platform. LeMUR transformed AssemblyAI from a transcription API (priced per minute of audio) to an audio intelligence platform (priced for the value of insights derived from audio). This layered value architecture — raw processing + structured analysis + LLM reasoning — is a model for how audio AI, and AI APIs broadly, will expand pricing power as capabilities mature.


UBP implications

  1. Multi-dimensional usage billing (transcription + features + tokens) is the emerging standard for audio AI. AssemblyAI’s three-layer billing model — base per-second transcription, per-feature Audio Intelligence add-ons, and per-token LeMUR — represents the most granular usage billing in the speech API category. As audio AI capabilities compound (transcription → analysis → reasoning), each new capability layer will carry its own billing dimension. Teams building usage aggregation systems for audio AI products need to account for multi-dimensional metering from the start, not just per-minute billing.

  2. Transparent public pricing accelerates developer self-qualification and shortens sales cycles. AssemblyAI’s partially-opaque add-on pricing forces enterprise buyers into sales conversations earlier than necessary. The UBP lesson: for developer-targeted products, full public pricing transparency reduces friction faster than a concierge sales motion. Developers who can self-model their total cost are more likely to sign up without a sales call — and developers who sign up without a sales call convert to paying customers faster and at lower CAC.

  3. Accuracy improvements at flat pricing are a usage-based growth strategy in disguise. When AssemblyAI releases Universal-3-Pro at the same rate as Universal-2, existing customers don’t churn — they simply produce better outputs at the same cost, which makes their products better, which drives more audio volume through AssemblyAI’s infrastructure. Higher accuracy → better customer products → more usage volume → more revenue at unchanged per-unit price. This usage-led expansion mechanic is a UBP growth pattern that pure subscription models cannot replicate.
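To make multi-dimensional metering concrete, here is a minimal sketch of an aggregator that sums raw usage events per customer and per dimension, then prices each dimension in its own unit. The dimension names, the `UsageEvent` type, and the price table are hypothetical; the rates reuse figures quoted in this profile.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    customer: str
    dimension: str
    quantity: int


# dimension -> (list price, number of units that price covers)
PRICES = {
    "transcription_seconds": (Decimal("0.15"), 3600),      # $0.15 per hour
    "entity_detection_seconds": (Decimal("0.08"), 3600),   # $0.08 per hour add-on
    "llm_input_tokens": (Decimal("1.25"), 1_000_000),      # per 1M input tokens
    "llm_output_tokens": (Decimal("10.00"), 1_000_000),    # per 1M output tokens
}


def aggregate(events: list[UsageEvent]) -> dict[str, dict[str, int]]:
    """Sum raw quantities per customer and dimension before any pricing."""
    totals = defaultdict(lambda: defaultdict(int))
    for event in events:
        totals[event.customer][event.dimension] += event.quantity
    return totals


def invoice_lines(dimension_totals: dict[str, int]) -> dict[str, Decimal]:
    """Price each dimension in its own unit; one invoice line per dimension."""
    lines = {}
    for dim, qty in dimension_totals.items():
        rate, unit = PRICES[dim]
        lines[dim] = rate * qty / unit
    return lines


events = [
    UsageEvent("acme", "transcription_seconds", 1800),
    UsageEvent("acme", "transcription_seconds", 1800),
    UsageEvent("acme", "entity_detection_seconds", 3600),
    UsageEvent("acme", "llm_input_tokens", 20_000),
    UsageEvent("acme", "llm_output_tokens", 500),
]
lines = invoice_lines(aggregate(events)["acme"])
total = sum(lines.values(), Decimal("0"))
```

Keeping quantities raw until invoice time means a new capability layer becomes one more entry in the price table rather than a change to the aggregation code.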



Bottom line

AssemblyAI has built the most developer-friendly speech AI pricing structure in the market: pay-per-second with no rounding, composable Audio Intelligence add-ons, and a free playground that requires no payment commitment. Its model release cadence — Universal-1, Universal-2, Universal-3-Pro — at flat per-second rates is a quiet retention machine that builds technical switching cost without triggering financial re-evaluation. The key gaps — opaque add-on pricing, no spend caps, streaming rate ambiguity — are fixable and do not undermine the core economic model. With $143M raised, Fortune 500 enterprise penetration, and LeMUR as a differentiated platform layer, AssemblyAI is the clear default choice for developers building audio intelligence into production products.


Pricing timeline: major events

Each milestone below corresponds to a public pricing change, product launch, or material adjustment.

Current Pricing (May 2026)

Current pricing: async transcription at $0.15/hr (Universal-2) and $0.21/hr (Universal-3 Pro); streaming at $0.15/hr (Universal-Streaming), $0.30/hr (Whisper-Streaming), and $0.45/hr (Universal-3 Pro Streaming). Speech Understanding add-ons billed per hour; LLM Gateway billed per million tokens. Free tier of up to 185 pre-recorded hours.


Universal-3 Pro and Streaming Models Released

Universal-3 Pro released as the most accurate async model; Universal-3 Pro Streaming joined Universal-Streaming and Whisper-Streaming for real-time use. Pricing moved to a per-hour structure across models.

Series C ($50M) — Universal-2 Released

Series C funding ($50M) led by Accel, bringing total funding to ~$143M. Universal-2 released, achieving further accuracy gains. Company reports processing audio for hundreds of enterprise customers including dozens of Fortune 500 companies.

Universal-1 Model Released

Universal-1 launched as AssemblyAI's flagship model for highest-accuracy English transcription, with accuracy benchmarks beating Whisper large-v3 and Google STT v2.

LeMUR Beta — LLM-over-Audio Layer

LeMUR launched in beta — the first LLM-over-audio layer in the speech API category. LeMUR adds token-based billing on top of transcription costs.

Series B ($72M) — Audio Intelligence Add-ons

Series B funding ($72M) led by Insight Partners. Usage-based API pricing at $0.00025/second confirmed for standard transcription. Audio Intelligence add-ons (sentiment, entities, IAB topics) launched as incremental per-second fees.

Series A ($28M) — Public API Launch

Series A funding ($28M) from Insight Partners. Public API access expanded; pay-per-second billing model launched publicly.

AssemblyAI Founded

AssemblyAI founded in San Francisco as a speech-to-text API startup. Early pricing was invite-only for select beta customers.

Trivia
  • AssemblyAI bills transcription by the hour of audio processed (Universal-2 at $0.15/hr and the more accurate Universal-3 Pro at $0.21/hr), with no minimum commitment, upfront fee, or contract on the pay-as-you-go plan.
  • AssemblyAI's LLM Gateway lets developers call frontier models (OpenAI, Anthropic, Google) directly against a transcript, billed per million input and output tokens; it is the evolution of what AssemblyAI first shipped as LeMUR, its 'LLM-over-audio' layer.
  • AssemblyAI raised $50M in its Series C in January 2024, bringing total funding to approximately $143M. The round was led by Accel, with participation from Insight Partners, and came just two months after Universal-1 launched as the company's flagship accuracy benchmark.

Questions & answers

How much does AssemblyAI cost per hour of audio?
AssemblyAI bills async transcription by the hour: Universal-2 is $0.15/hr and the more accurate Universal-3 Pro is $0.21/hr. Real-time streaming ranges from $0.15/hr (Universal-Streaming) to $0.45/hr (Universal-3 Pro Streaming). Speech Understanding add-ons each carry an additional per-hour fee on top of the base rate.
Does AssemblyAI have a free tier?
Yes. You can create an account and start transcribing immediately with no credit card required. The free tier includes up to 185 hours of pre-recorded transcription and up to 333 hours of streaming. There is no monthly minimum — beyond the free tier you only pay for what you process.
What is the LLM Gateway and how is it priced?
The LLM Gateway is AssemblyAI's layer that lets developers run frontier LLMs from OpenAI, Anthropic, and Google directly against a transcript — summarization, Q&A, custom prompts. It is billed per million input and output tokens at each model's published rate, separately from transcription costs. It is the productized evolution of LeMUR.
What is the difference between AssemblyAI's Universal-2 and Universal-3 Pro models?
Universal-2 is the lower-cost async model at $0.15/hr, supporting 99 languages with excellent accuracy. Universal-3 Pro is the most accurate model at $0.21/hr, leading on multilingual word error rate. For real-time use, AssemblyAI offers Universal-Streaming, Whisper-Streaming, and Universal-3 Pro Streaming. (The older SLAM-1 model is deprecated.)
Does AssemblyAI offer enterprise pricing?
Yes. Enterprise customers receive custom volume-discount pricing, dedicated support, and invoice billing, and AssemblyAI is available via the AWS Marketplace. Self-serve customers pay standard per-hour rates with no sales contact required.
How do Speech Understanding add-ons affect AssemblyAI's pricing?
Each Speech Understanding feature — speaker identification, entity detection, translation, PII redaction, and more — adds an incremental per-hour fee on top of base transcription. Enabling multiple features stacks those per-hour fees, so a heavily-featured transcript costs more than the base transcription rate.