About
Speechmatics is a speech AI company based in Cambridge, UK, founded in 2006 as Cantab Research by speech-recognition researcher Dr. Tony Robinson. It sells automatic speech recognition (ASR, or speech-to-text) and text-to-speech (TTS) as developer-facing APIs. It raised a $62M Series B in June 2022 led by Susquehanna Growth Equity (with AlbionVC and IQ Capital), employs 100–250 staff, and reported ~£11.3M revenue in 2021. Its models cover 56+ languages for transcription (with 69 language pairs for AI translation) and emphasise accuracy across accents and dialects; the company markets its language and accent coverage as reaching “over 4 billion people”. The product line spans batch and real-time speech-to-text (powered by its Ursa generation of GPU-scaled models, launched March 2023), the Flow voice-agent API (2024), a low-latency text-to-speech engine, and a set of speech-to-text “bolt-ons” (translation, summaries, chapters, sentiment, topics).
The buyer is primarily a developer or product team building voice products — contact-center analytics, media captioning, medical and legal transcription, note-taking assistants, and real-time voice agents. Speechmatics positions itself directly against Deepgram and AssemblyAI (it publishes head-to-head comparison pages for both) on the axis of transcription accuracy and language breadth.
Pricing is split across three tiers — a free tier for exploration, a self-serve pay-as-you-go Pro tier, and a sales-led Enterprise tier with custom volume discounts and on-premises/private-cloud deployment. A Startup Program offers up to $50,000 in usage credits to early-stage founders, capped at roughly 20 startups per cohort.
Pricing summary : How Speechmatics meters speech-to-text and text-to-speech by usage
Speechmatics uses a pay-as-you-go usage model billed across three independent usage dimensions, with a free monthly allowance applied first and a custom Enterprise tier on top:
- Speech-to-text (per hour): On the Pro tier, batch and real-time standard accuracy are $0.24/hr, batch enhanced accuracy is $0.40/hr, and real-time enhanced accuracy is $0.56/hr. Usage is metered to the second and billed at the per-hour rate. (The page also exposes a “Minutes” unit toggle that re-expresses the same rates per minute.)
- Text-to-speech (per character): $0.011 per 1,000 characters on Pro, metered separately from speech-to-text.
- Speech-to-text bolt-ons (per hour): Translation $0.65/hr, Summaries $0.12/hr, Chapters $0.40/hr, Sentiment $0.12/hr, Topics $0.20/hr — added on top of the underlying transcription rate.
- Free allowance: Every account gets 2,400 free STT minutes/month (1,200 real-time + 1,200 batch) and 1 million free TTS characters (~20 hrs) before any charges begin.
Volume discounts apply automatically: Pro usage above 500 hours/month for a given speech-to-text type is discounted 20%, with further discounts from 24,000 hours/year. Enterprise pricing is entirely custom (“volume discounts available”). Enabling Model Training (sharing anonymized data) earns a 33% usage discount.
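As a rough illustration of per-second metering at a per-hour rate, the sketch below converts a job's duration in seconds into a charge. The rate table comes from the figures above; the function names and the round-to-the-cent step are assumptions for illustration, not Speechmatics' documented invoicing logic.

```python
from decimal import Decimal, ROUND_HALF_UP

# Pro per-hour rates quoted in this article (USD).
PRO_RATES_PER_HOUR = {
    "batch_standard": Decimal("0.24"),
    "batch_enhanced": Decimal("0.40"),
    "realtime_standard": Decimal("0.24"),
    "realtime_enhanced": Decimal("0.56"),
}

SECONDS_PER_HOUR = 3600


def metered_cost(seconds: int, sku: str) -> Decimal:
    """Cost of `seconds` of audio, metered to the second at the per-hour rate."""
    rate = PRO_RATES_PER_HOUR[sku]
    cost = rate * Decimal(seconds) / SECONDS_PER_HOUR
    # Assumption: amounts are rounded to the cent; the article does not say
    # how invoice amounts are actually rounded.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def per_minute(sku: str) -> Decimal:
    """Re-express a per-hour rate per minute, like the page's Minutes toggle."""
    return PRO_RATES_PER_HOUR[sku] / 60
```

Under these assumptions, a 90-minute batch standard job (5,400 seconds) costs $0.36, and the $0.24/hr rate reads as $0.004 per minute.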
What makes this different: Speechmatics prices a pure-usage API by the hour of audio rather than by tokens or API calls, separates “standard” and “enhanced” accuracy into distinct per-hour SKUs, and folds a generous free allowance into both the Free and Pro tiers so paid customers keep the same monthly freebie. It is a textbook freemium pricing model wrapped around a self-serve sales motion, with sales-led Enterprise as the only gated layer.
Pricing by product
Speech-to-Text (per-hour usage rates, Pro tier)
| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Free | $0 | 2,400 STT minutes/mo (1,200 real-time + 1,200 batch); 2 concurrent real-time sessions | No credit card; legacy page copy still advertises “8 hrs free/month” |
| Pro | from $0.24 / hr | Same free monthly allowance, then metered per hour; 50 concurrent real-time sessions; 10 file jobs/sec | Pay-as-you-go, no commitment; billed to the second |
| Enterprise | Custom | Unlimited scale, no rate limits, custom models, multi-region cloud | Sales-led; volume discounts; on-prem / on-device |
Speech-to-Text — Pro per-hour accuracy rates
| SKU | Price | Notes |
|---|---|---|
| Batch standard accuracy | $0.24 / hr | Cost-control / turnaround model |
| Batch enhanced accuracy | $0.40 / hr | Best-in-class accuracy |
| Real-time standard accuracy | $0.24 / hr | Standard model (no turnaround benefit in real-time) |
| Real-time enhanced accuracy | $0.56 / hr | Highest-accuracy real-time |
| Volume discount | 20% off | Automatic above 500 hr/month per STT type |
Speech-to-Text bolt-ons (Pro, per hour)
| Bolt-on | Price |
|---|---|
| Translation | $0.65 / hr |
| Summaries | $0.12 / hr |
| Chapters | $0.40 / hr |
| Sentiment | $0.12 / hr |
| Topics | $0.20 / hr |
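Because each bolt-on is charged on top of the underlying transcription rate, the effective per-hour price is a simple sum. A minimal sketch using the rates listed above (the dictionary keys and the `effective_rate` helper are hypothetical names chosen for this example):

```python
from decimal import Decimal

# Pro per-hour STT rates and bolt-on rates from the tables above (USD).
BASE_RATES = {
    "batch_standard": Decimal("0.24"),
    "batch_enhanced": Decimal("0.40"),
    "realtime_standard": Decimal("0.24"),
    "realtime_enhanced": Decimal("0.56"),
}
BOLT_ON_RATES = {
    "translation": Decimal("0.65"),
    "summaries": Decimal("0.12"),
    "chapters": Decimal("0.40"),
    "sentiment": Decimal("0.12"),
    "topics": Decimal("0.20"),
}


def effective_rate(base_sku: str, bolt_ons=()) -> Decimal:
    """Per-hour price of a base SKU with the chosen bolt-ons stacked on top."""
    extras = sum((BOLT_ON_RATES[name] for name in bolt_ons), Decimal("0"))
    return BASE_RATES[base_sku] + extras


def multiple_of_base(base_sku: str, bolt_ons=()) -> Decimal:
    """How many times the headline rate the stacked price works out to."""
    return effective_rate(base_sku, bolt_ons) / BASE_RATES[base_sku]
```

For example, batch standard with Translation comes to $0.89/hr, about 3.7× the $0.24 headline rate, and enabling all five bolt-ons brings it to $1.73/hr.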
Text-to-Speech (per-character usage)
| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Free | $0 | 1M characters/mo (~20 hrs); low-latency | English (more languages coming) |
| Pro | $0.011 / 1k chars | Same 1M free characters, then metered | Metered separately from speech-to-text |
| Enterprise | Custom | On-prem TTS, custom voice development | Sales-led, quoted |
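The per-character meter can be sketched the same way. This example assumes characters beyond the free 1M are billed pro rata at $0.011 per 1,000 (the article does not say whether partial thousands are rounded up); `tts_cost` is a hypothetical helper name.

```python
from decimal import Decimal

FREE_TTS_CHARS = 1_000_000          # monthly free allowance on Free and Pro
TTS_RATE_PER_1K = Decimal("0.011")  # Pro rate per 1,000 characters (USD)


def tts_cost(characters: int) -> Decimal:
    """Monthly text-to-speech charge after the free allowance."""
    billable = max(0, characters - FREE_TTS_CHARS)
    # Assumption: partial thousands are billed pro rata, not rounded up.
    return TTS_RATE_PER_1K * billable / 1000
```

A month with 5M synthesised characters would cost $44.00 on these assumptions, while anything at or below 1M characters costs nothing.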
Sales motions across products: PLG / self-serve for the Free and Pro tiers (sign up and pay online, no commitment); sales-led for Enterprise (custom volume discounts, on-prem deployment, dedicated CSM).
Hidden costs : What real speech-to-text bills look like at volume
The headline $0.24/hr makes Speechmatics look almost free, but the real bill depends on which accuracy SKU you pick, which bolt-ons you switch on, and whether you cleared the 2,400-minute free allowance. Two archetypes show how the per-hour rates compound.
Archetype 1 — a voice-agent startup running real-time enhanced STT + TTS. A team running 800 hours/month of real-time enhanced transcription (for a live voice agent), layering Translation and Summaries bolt-ons, and synthesising ~5M characters of TTS replies:
| Line item | Monthly cost |
|---|---|
| Real-time enhanced STT: 800 hrs × $0.56 | $448.00 |
| Less free allowance: 20 hrs real-time (1,200 min) | −$11.20 |
| Volume discount: 20% off the 300 hrs over 500/mo | −$33.60 |
| Translation bolt-on: 800 hrs × $0.65 | $520.00 |
| Summaries bolt-on: 800 hrs × $0.12 | $96.00 |
| Text-to-speech: (5M − 1M free) chars × $0.011/1k | $44.00 |
| Total | $1,063.20 |
Net transcription ($403.20 after the free allowance and volume discount) is under 40% of the bill, and the Translation bolt-on alone ($520) costs more than the underlying transcription, because each capability is a full per-hour rate stacked on top. This is the speech-AI version of the metered-add-on trap we cover in the hidden costs of usage-based pricing: the advertised base rate is a fraction of the realised bill once required capabilities are switched on.
Archetype 2 — a media-captioning team running batch enhanced. A post-production team transcribing 2,000 hours/month of pre-recorded media at enhanced accuracy, with Chapters and the free allowance applied:
| Line item | Monthly cost |
|---|---|
| Batch enhanced STT: 2,000 hrs × $0.40 | $800.00 |
| Less free allowance: 20 hrs batch (1,200 min) | −$8.00 |
| Volume discount: 20% off the 1,500 hrs over 500/mo | −$120.00 |
| Chapters bolt-on: 2,000 hrs × $0.40 | $800.00 |
| Total | $1,472.00 |
The automatic 20% volume discount only applies to the hours above 500/month and only to the speech-to-text line (not the Chapters bolt-on), so a high-volume captioning workload still pays full freight on every add-on. Buyers modelling cost need to count each enabled capability as its own meter — see our primer on choosing the right value metric for why bundling vs unbundling these matters.
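The line items in the captioning example can be reproduced with a short estimator. This is a sketch under the same reading as the table: the free allowance is valued at the SKU's own rate, the 500-hour threshold is counted on gross hours, and bolt-ons receive neither the allowance nor the discount. The function and constant names are hypothetical.

```python
from decimal import Decimal

VOLUME_THRESHOLD_HOURS = 500       # per STT type, per month
VOLUME_DISCOUNT = Decimal("0.20")  # applies only to hours above the threshold
FREE_HOURS_PER_MODE = 20           # 1,200 minutes each for batch and real-time


def monthly_stt_bill(hours, stt_rate, bolt_on_rates=()):
    """Estimate one month's bill for a single STT type plus its bolt-ons."""
    hours = Decimal(hours)
    gross = stt_rate * hours
    # Free allowance is valued at the SKU's own per-hour rate.
    free_credit = stt_rate * min(hours, Decimal(FREE_HOURS_PER_MODE))
    # Assumption: the threshold is counted on gross hours, as in the table.
    discounted_hours = max(Decimal(0), hours - VOLUME_THRESHOLD_HOURS)
    volume_credit = stt_rate * discounted_hours * VOLUME_DISCOUNT
    # Bolt-ons are charged on every hour, with no allowance or discount.
    bolt_ons = sum((rate * hours for rate in bolt_on_rates), Decimal(0))
    return gross - free_credit - volume_credit + bolt_ons


# Archetype 2: 2,000 hrs of batch enhanced ($0.40) with Chapters ($0.40).
archetype_2 = monthly_stt_bill(2000, Decimal("0.40"), [Decimal("0.40")])
print(f"${archetype_2:,.2f}")
```

Run as written, this reproduces the $1,472.00 total above. Swapping in a different hour count or bolt-on list shows how quickly the add-on lines overtake the discounted transcription line.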
Want to estimate your own Speechmatics bill? Use the Speechmatics pricing calculator to model your monthly cost based on hours of audio and characters of speech.
Pricing evolution : From per-hour transcription toward a full speech-AI platform
Speechmatics has cut its speech-to-text prices in two distinct waves while steadily turning a single batch-transcription SKU into a multi-product speech-AI platform. The direction of travel is unusual: where most AI vendors raised prices as models improved, Speechmatics pushed per-hour rates down by roughly 4–5× between late 2022 and mid-2026 as GPU-scaled Ursa models lowered its own inference cost.
Cadence
| Quarter | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2022 Q4 | 0 | 0 | Baseline snapshot: Free (4 hrs/mo) / On Demand / Enterprise; batch-only at $1.25/hr Standard, $1.90/hr Enhanced; 48 languages. |
| 2023 Q1 | 1 | 1 | 2023-03: Real-Time Transcription added as a priced SKU ($1.65/hr Standard, $2.15/hr Enhanced), alongside the Ursa model launch and Translation. |
| 2023 Q2–Q3 | 1 | 2 | Major cut by 2023-07: Batch Standard $1.25→$0.80, RT Enhanced $2.15→$1.35; “Lite Mode” batch added at $0.30/hr; speech bolt-ons split out; free tier doubled to 8 hrs/mo. |
| 2024 Q1 | 0 | 0 | 2024-03: prices stable; language coverage up to 50; 10 concurrent real-time sessions on PAYG. |
| 2026 Q2 | 1 | 1 | 2026-06-04: tiers renamed Free / Pro; second deep cut (Batch Std $0.80→$0.24, RT Enh $1.35→$0.56); Lite Mode retired; Text-to-Speech launched at $0.011/1k chars; free allowance expanded to 2,400 min/mo. |
Tracked range: 2022-10 – 2026-06. Quarters not listed (2023 Q4, 2024 Q2–Q4, 2025 Q1–2026 Q1) showed no priced changes in the sampled Wayback snapshots.
Notable changes
- 2022-10 — Earliest archived structure: Free / On Demand / Enterprise, batch-only at $1.25–$1.90/hr, 48 languages (pricing page, Wayback 2022-10-13).
- 2023-03 — Real-time transcription priced separately ($1.65/$2.15 per hr); coincides with the Ursa launch, which Speechmatics claimed beat OpenAI Whisper by ~25% and Microsoft by ~22% on accuracy.
- 2023 mid-year — First deep price cut and unbundling: standard batch dropped to $0.80/hr, a $0.30/hr “Lite Mode” appeared, and Translation/Summaries/Sentiment/Topics/Chapters became per-hour bolt-ons (verified between the 2023-03 and 2023-07 Wayback snapshots).
- 2024 — Flow voice-agent API launched; pricing page otherwise stable through the 2024-03 snapshot.
- 2026-06 — Second deep cut to $0.24/$0.40/$0.56 STT, retirement of Lite Mode, launch of per-character Text-to-Speech, and a 5× expansion of the free allowance to 2,400 minutes/month.
The two-wave price decline in detail
Across the tracked range, real-time enhanced transcription fell from $2.15/hr (2023-03) → $1.35/hr (2023-07) → $0.56/hr (2026-06) — a ~3.8× reduction — and batch standard fell from $1.25/hr → $0.80/hr → $0.24/hr, roughly 5.2×. Both cuts tracked Speechmatics’ own published engineering work on moving inference to GPUs and shrinking cost-per-hour (its GPU-optimization writeup drew an 87-point Hacker News thread on 2025-05-21). Rather than capture that margin, Speechmatics passed most of it to customers as lower per-hour rates and a far larger free allowance — a deliberate land-grab against Deepgram and AssemblyAI, both of which now sit in the same $0.15–$0.45/hr band.
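The reduction multiples quoted above follow directly from the snapshot rates. A small sketch that recomputes them, along with the size of each individual cut (the `RATE_HISTORY` structure and helper names are illustrative):

```python
# Per-hour rates (USD) at each snapshot cited in this section.
RATE_HISTORY = {
    "realtime_enhanced": [("2023-03", 2.15), ("2023-07", 1.35), ("2026-06", 0.56)],
    "batch_standard": [("2022-10", 1.25), ("2023-07", 0.80), ("2026-06", 0.24)],
}


def reduction_multiple(history):
    """How many times cheaper the latest rate is than the earliest."""
    return history[0][1] / history[-1][1]


def step_cuts(history):
    """Percentage cut between each pair of consecutive snapshots."""
    rates = [rate for _, rate in history]
    return [(1 - new / old) * 100 for old, new in zip(rates, rates[1:])]


for sku, history in RATE_HISTORY.items():
    cuts = ", ".join(f"{cut:.0f}%" for cut in step_cuts(history))
    print(f"{sku}: {reduction_multiple(history):.1f}x overall, cuts of {cuts}")
```

The per-step figures show that the second wave was the deeper one: the 2023 cut took roughly 36–37% off both SKUs, while the 2026 cut took a further 59% off real-time enhanced and 70% off batch standard.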
What’s unique : Per-hour accuracy SKUs and a data-for-discount trade
Per-hour-of-audio metering with distinct accuracy SKUs. Speechmatics splits “standard” and “enhanced” accuracy into separate per-hour line items, letting buyers trade cost against accuracy per job rather than locking into one model. This makes model quality itself a priced dimension — a $0.24/hr vs $0.40/hr batch choice — which is rare among AI APIs that usually meter only volume. It maps cleanly to how transcription buyers already budget (hours of audio, not tokens), a value-metric fit we explore in why per-token pricing confuses buyers.
A free allowance that survives into the paid tier and grew tenfold over time. Both Free and Pro accounts get the same 2,400 STT minutes + 1M TTS characters every month, so paying customers keep the freebie rather than losing it the moment a card is added. That allowance grew from 4 hrs/mo (2022) → 8 hrs/mo (2023) → 40 hrs/mo (2026), tracking the same generosity curve as the price cuts.
A data-for-discount trade instead of a cash discount. Enabling “Model Training” (letting Speechmatics use your anonymised audio to improve its models) applies a 33% usage discount that can be toggled off at any time. This is a non-cash lever — the buyer pays in data rather than dollars — and it directly funds the cost-curve improvements that let Speechmatics keep cutting prices. It’s one of the cleaner examples of pricing as a two-sided value exchange in the corpus.
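To put a number on the trade, the sketch below applies the 33% discount to a list rate and derives what the buyer is effectively paid per hour of shared audio. It assumes the discount is a flat multiplier on the usage rate; how it combines with the automatic volume discount is not stated in the source, so that interaction is left out. The helper names are hypothetical.

```python
from decimal import Decimal

MODEL_TRAINING_DISCOUNT = Decimal("0.33")


def effective_rate(list_rate: Decimal, model_training: bool) -> Decimal:
    """Per-hour rate after the optional Model Training discount."""
    if not model_training:
        return list_rate
    # Assumption: the discount is a flat multiplier on the usage rate.
    return list_rate * (1 - MODEL_TRAINING_DISCOUNT)


def data_value_per_hour(list_rate: Decimal) -> Decimal:
    """Discount forgone by the vendor for one hour of opted-in audio."""
    return list_rate - effective_rate(list_rate, model_training=True)
```

On these assumptions batch standard falls from $0.24 to about $0.16/hr, which implies an hour of opted-in audio is worth roughly $0.08 to Speechmatics at that SKU and about $0.18 at the real-time enhanced rate.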
Counter-cyclical price cuts as a competitive weapon. While most AI vendors raised prices as capability improved, Speechmatics drove per-hour rates down ~4–5× from 2022 to 2026, passing GPU-efficiency gains to customers to undercut Deepgram and AssemblyAI on a published, public rate card. Transparent public per-hour pricing in a market where AWS, Google, and many enterprise ASR vendors quote opaquely is itself a differentiator.
Automatic, no-negotiation volume discounting. Pro usage above 500 hours/month for a given speech-to-text type is discounted 20% with zero action required, and the example bill splits base-rate and discounted hours within the same month. This gives self-serve customers an enterprise-style tapered curve without a sales call — a usage-tier mechanic most APIs reserve for negotiated contracts.
Strengths & weaknesses
| Strengths | Weaknesses |
|---|---|
| Transparent public per-hour rates for every STT SKU and bolt-on | Enterprise pricing fully opaque (“Custom” with no indicative band) |
| Generous free allowance (40 hrs/mo) carried into the paid Pro tier | TTS limited to English at launch (more languages “coming”) |
| Accuracy priced as its own axis (standard vs enhanced) | Bolt-ons stack as full per-hour rates — a real bill is far above base |
| Counter-cyclical price cuts (~4–5×) keep it competitive vs Deepgram | Inconsistent free-allowance wording (2,400 min / 1,200+1,200 / “8 hrs”) |
| Automatic 20% volume discount with no sales call | Pro hard-capped at 6,000 hrs/mo, forcing a sales handoff at scale |
| Data-for-discount (33% Model Training) lever beyond cash discounts | Per-hour STT vs per-character TTS units don’t compare cleanly |
Billing UX : Self-serve usage controls, unit toggles and automatic volume discounts
- Hours ↔ Minutes unit toggle — the pricing page lets you re-express every speech-to-text rate per hour or per minute without changing the underlying price.
- Model Training toggle (“Enable for 33% discount”) — opting into anonymized model training applies a 33% discount to usage; it can be turned off at any time for future usage.
- Automatic volume discounting — Pro usage above 500 hours/month for a given speech-to-text type is discounted 20% with no action required; example billing splits base-rate and discounted hours within the same month.
- Per-second metering, monthly invoicing — Pro customers are billed on the 1st of each month for the prior month’s usage, costed to the second at the per-hour rate.
- Free-then-card upgrade path — accounts use the free allowance with no card on file; reaching the limit simply prompts adding a credit card in account settings to continue.
- Pro usage cap — Pro tier usage is capped at 6,000 hours/month; beyond that customers are directed to sales for Enterprise terms.
Strategic wins : Decisions that strengthen the model
1. Transparent per-hour rates lower the barrier to evaluation
Publishing exact per-hour speech-to-text rates — including every bolt-on — lets developers self-qualify before talking to sales, which suits a developer-led buying motion in a market (AWS Transcribe, Google STT, many enterprise ASR vendors) full of opaque, quote-only pricing. Transparency is itself the wedge: a buyer can read the rate card, run the math, and start a free trial in one session. This is the product-led growth motion the rest of the AI-infra market is converging on.
2. Pricing the cost curve down instead of up
Most AI vendors treat capability gains as pricing power and raise rates; Speechmatics did the opposite, cutting STT ~4–5× from 2022 to 2026 as GPU-scaled Ursa models lowered its own inference cost. Passing efficiency to customers — rather than banking it as margin — turned a premium ASR vendor into a price-competitive one against Deepgram and AssemblyAI without abandoning its accuracy story. It’s a textbook case of letting unit economics drive the price floor rather than the ceiling.
3. A generous free allowance that survives into the paid tier
Carrying the same 2,400 free STT minutes and 1M TTS characters into Pro removes the usual “free runs out the moment you pay” friction, and the allowance has grown 10× since 2022. A paying customer never feels punished for upgrading, which lowers the psychological cost of the first paid invoice — the conversion-friction problem we unpack in designing free tiers that convert.
4. A Startup Program that seeds the top of the funnel
Up to $50,000 in usage credits for early-stage founders (typically <$10M raised) converts pre-revenue teams into Speechmatics-native architectures before they can afford Enterprise. Capping cohorts at ~20 startups keeps the credit liability bounded while planting switching costs early — the same land-and-expand logic behind cloud-credit programs.
5. A non-cash discount lever that funds future price cuts
The 33% “Model Training” discount lets price-sensitive buyers pay in anonymised data instead of dollars, and that data feeds the model-improvement flywheel that makes the next price cut affordable. It converts a cost (R&D data acquisition) into a customer-facing incentive — a rare two-sided lever most pricing teams overlook when they default to flat percentage discounts.
Areas to improve : Gaps and proposed fixes
1. Reconcile the free-allowance wording
The page states “2,400 minutes” on the cards, “1,200 + 1,200 minutes” in the comparison table, and “8 hrs free” in legacy FAQ copy carried over from the older 8-hr allowance — proposed fix: state one consistent free figure (40 hrs / 2,400 min) across cards, comparison table, and FAQ so buyers don’t distrust the headline number. Inconsistent allowance language is one of the pricing-page trust killers we flag most often.
2. Surface the true cost of stacked bolt-ons
A buyer reading “$0.24/hr” has no way to know that adding Translation ($0.65/hr) more than triples the bill. Proposed fix: show a worked example — or a live estimate — of base STT + selected bolt-ons, the same way our hidden-cost analysis recommends, so the realised per-hour cost is visible before commitment rather than discovered on the first invoice.
3. Surface Enterprise price anchors
Enterprise is “Custom” everywhere with no indicative band; proposed fix: publish a starting volume-discount example (e.g. “from $0.18/hr at 24,000 hrs/yr”) so buyers can self-qualify before booking a demo, instead of forcing every large buyer into a sales conversation to learn whether the economics even work.
4. Expose the per-character TTS rate in the same unit toggle
The Hours/Minutes toggle covers speech-to-text but text-to-speech stays per-1k-characters; proposed fix: add a per-hour-equivalent estimate for TTS so the two products compare cleanly and a voice-agent buyer can reason about combined STT+TTS cost in one mental unit.
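A per-hour equivalent can be approximated from the page's own figures. The sketch assumes the “1M characters ≈ 20 hrs” approximation holds, which implies about 50,000 characters per hour of synthesised speech; real speaking rates vary by voice and content, so treat the result as indicative only.

```python
from decimal import Decimal

TTS_RATE_PER_1K = Decimal("0.011")  # Pro rate per 1,000 characters (USD)
ALLOWANCE_CHARS = 1_000_000         # free monthly characters
ALLOWANCE_HOURS = 20                # the page's "~20 hrs" approximation


def chars_per_hour() -> Decimal:
    """Characters per hour of speech implied by the free-allowance wording."""
    return Decimal(ALLOWANCE_CHARS) / ALLOWANCE_HOURS


def tts_rate_per_hour() -> Decimal:
    """Approximate per-hour-of-speech price for text-to-speech."""
    return TTS_RATE_PER_1K * chars_per_hour() / 1000
```

This gives roughly $0.55 per hour of synthesised speech, which sits next to the $0.56/hr real-time enhanced transcription rate and lets a voice-agent buyer reason about both meters in one unit.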
Key takeaways
- Match the value metric to how buyers already budget. Pricing by hour of audio rather than tokens or API calls matches how transcription buyers already think about cost, removing a translation step before purchase. Pick the unit your customer measures their own work in.
- You can pass a cost curve down as a moat. Speechmatics cut prices ~4–5× as its inference got cheaper, using efficiency gains to undercut rivals rather than bank margin — a viable strategy when your unit cost is falling faster than the market’s willingness to pay drops.
- Make capability a priced axis, not just volume. Splitting standard vs enhanced accuracy into separate per-hour SKUs lets the same product serve cost-sensitive and quality-sensitive buyers without a discount negotiation.
- A free tier that survives into paid removes upgrade friction. Carrying the identical monthly allowance into the paid plan means the first invoice never feels like a penalty — a small mechanic that materially lowers conversion anxiety.
- Non-cash discounts can fund your own roadmap. The 33% Model-Training trade buys the data that improves the models that justify the next price cut — discounts don’t have to be pure margin giveaways.
UBP implications
- Accuracy can be a priced dimension. Splitting standard vs enhanced accuracy into separate per-hour SKUs shows model quality itself can be a billable usage axis, not just a feature gate. This widens the addressable market without adding tiers.
- Bolt-on-per-meter packaging maximises revenue but raises bill-shock risk. Charging each capability (translation, summaries, chapters) as its own full per-hour rate captures more revenue per workload, but means the realised bill can be 2–3× the advertised base — vendors must surface stacked costs or risk churn.
- Falling unit costs let usage-based vendors compete on price aggressively. When inference cost drops faster than perceived value, a UBP vendor can cut per-unit rates and expand free allowances to grab share, something a flat-subscription competitor can’t match without restructuring its whole model.
Sources
- Speechmatics pricing page (accessed 2026-06-04)
- Speechmatics Startup Program (accessed 2026-06-04)
- Speechmatics speak-to-sales (Enterprise) (accessed 2026-06-04)
- Speechmatics docs & changelog (accessed 2026-06-04)
- Speechmatics pricing page, Wayback 2022-10-13 (historical) (accessed 2026-06-04)
- Speechmatics pricing page, Wayback 2024-03-05 (historical) (accessed 2026-06-04)
Bottom line
Speechmatics sells speech AI the way its buyers consume it — by the hour of audio and the character of speech — with a free allowance generous enough to evaluate seriously and per-hour SKUs that price accuracy as its own dimension. Enterprise remains a custom black box, but the self-serve Pro tier is unusually transparent for the ASR market.
Want to compare Speechmatics against other usage-based AI pricing? Browse the pricing blueprint.
Pricing timeline : Major events on a vertical axis
Each milestone below corresponds to a public pricing change, product launch, or material adjustment, listed newest first.
Pro repricing, second deep cut, and TTS launch
Current snapshot: tiers renamed Free / Pro / Enterprise. Speech-to-text cut again — Batch Standard $0.80→$0.24, Real-time Enhanced $1.35→$0.56, Lite Mode retired (standard now $0.24). Free allowance expanded to 2,400 min/mo (40 hrs: 1,200 real-time + 1,200 batch) plus 1M TTS characters. Text-to-speech launched at $0.011/1k characters. Automatic 20% volume discount over 500 hr/mo; 33% discount for enabling Model Training; 56+ languages.
PAYG pricing stable; language coverage grows to 50
Wayback snapshot (2024-03-05): Free / Pay As You Go / Enterprise unchanged on price (Lite $0.30, Batch Std $0.80, Batch Enh $1.04, RT Std $1.04, RT Enh $1.35); language count up to 50, 10 concurrent real-time sessions / 10 batch jobs/sec on PAYG.
Major price cut + Lite Mode + capability bolt-ons
By the 2023-07-29 snapshot the per-hour rates had been cut sharply (Batch Standard $1.25→$0.80, Real-Time Enhanced $2.15→$1.35) and 'Lite Mode' batch transcription appeared at $0.30/hr. Speech bolt-ons (Translation $0.65, Summaries/Sentiment $0.12, Topics $0.20, Chapters $0.40 per hr) became separately billed; free tier doubled to 8 hrs/mo (4 hrs batch + 4 hrs real-time). Restructure landed between the 2023-03 and 2023-07 snapshots, alongside Ursa GPU-scaled models.
Real-time transcription added as a priced SKU
Wayback snapshot (2023-03-07): On-demand now splits Batch ($1.25/hr Standard, $1.90/hr Enhanced) from Real-Time ($1.65/hr Standard, $2.15/hr Enhanced). Coincides with the March-2023 Ursa model launch and Translation/auto-language-ID release. Still 48 languages, no TTS.
Three-tier on-demand pricing ($1.25–$1.90/hr)
Wayback snapshot (2022-10-13): Free (4 hrs/mo) / On Demand / Enterprise (min 200 hrs/mo). Batch speech-to-text only — no real-time split, no TTS, no bolt-ons. On Demand priced at $1.25/hr Standard, $1.90/hr Enhanced; 48 languages.
- Speechmatics meters Pro speech-to-text by the second, but quotes prices per hour; billing is rounded to the second based on the per-hour rate.
- Enabling 'Model Training' (letting Speechmatics use your anonymized data) earns a 33% usage discount, a data-for-credit trade rather than a cash discount.
- The free tier hands every account 2,400 free minutes per month split across real-time and batch speech-to-text, plus 1 million free text-to-speech characters.
Questions & answers
- Is Speechmatics free to use?
- Yes. The free tier requires no credit card and includes 2,400 speech-to-text minutes per month (1,200 real-time and 1,200 batch) plus 1 million text-to-speech characters (~20 hours).
- How much does Speechmatics speech-to-text cost?
- On the pay-as-you-go Pro tier, batch and real-time standard accuracy are $0.24/hr, batch enhanced accuracy is $0.40/hr, and real-time enhanced accuracy is $0.56/hr. Usage is billed to the second based on the per-hour rate.
- Does Speechmatics offer volume discounts?
- Yes. Pro usage above 500 hours/month for a given speech-to-text type is automatically discounted 20%, with additional discounts available from 24,000 hours/year. Enterprise pricing is custom.
- How is Speechmatics text-to-speech priced?
- Text-to-speech is metered separately at $0.011 per 1,000 characters on the Pro tier, after the free 1 million characters per month.
- What is the Speechmatics Startup Program?
- It grants early-stage founders (typically under $10M raised) up to $50,000 in usage credits across real-time, batch, and text-to-speech APIs, plus onboarding and engineering support.