What is pure usage-based pricing?
Pure usage-based pricing means the customer pays only for what they consume: no base fee, no seat fee, no platform charge. The bill starts at zero and scales linearly with actual consumption, or sub-linearly where volume discounts apply. Every incremental unit costs the same as the first until a volume tier unlocks a lower rate.
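As a rough illustration of "starts at zero, scales with consumption", the sketch below computes a bill from a rate card with volume tiers. The tier boundaries and prices are hypothetical, and it assumes graduated tiers (each unit is charged at the rate of the tier it falls into); some vendors instead reprice all units once a tier is reached.

```python
from decimal import Decimal

# Hypothetical rate card: (upper bound of the tier in units, price per unit in USD).
# None marks the open-ended top tier. None of these figures come from a real vendor.
TIERS = [
    (1_000_000, Decimal("0.000010")),
    (10_000_000, Decimal("0.000008")),
    (None, Decimal("0.000006")),
]


def usage_bill(units: int, tiers=TIERS) -> Decimal:
    """Graduated pure-usage bill: no base fee, each unit charged at its tier's rate."""
    if units < 0:
        raise ValueError("units must be non-negative")
    total = Decimal("0")
    lower = 0  # units already billed by earlier tiers
    for upper, rate in tiers:
        if units <= lower:
            break
        top = units if upper is None else min(units, upper)
        total += (top - lower) * rate
        lower = top
    return total
```

With this rate card, zero usage bills $0, 500,000 units bill $5, and 2,000,000 units bill $18 (the first million at the base rate, the second million at the discounted rate).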
Fifty-six of 158 corpus companies (35%) use pure-usage as their primary pricing model, making it the third most common structure after the freemium tag (54%) and hybrid (41%). The defining characteristic: no fixed cost component means no revenue floor for the vendor, and no minimum cost for the buyer.
Pure usage-based pricing is also called: pay-as-you-go (PAYG), consumption-based pricing, metered billing, per-unit pricing.
Who uses it and why
Pure-usage pricing dominates in two segments:
Developer-facing APIs — LLM token APIs (Anthropic, OpenAI, Google, Mistral, Groq, DeepSeek), embedding APIs (Cohere, Voyage AI, Jina AI), audio/voice APIs (Deepgram, Rev.ai, ElevenLabs PAYG), image generation APIs (Fal.ai, Replicate), and search APIs (Tavily, Exa, You.com, Linkup). The developer-as-buyer does not want a minimum; they want to pay only when they ship production traffic.
Infrastructure and compute — GPU clouds (Vast.ai, RunPod, Modal), serverless (Upstash, Turbopuffer), and browser automation (Browserbase, Apify) bill purely on consumption. No GPU time means no charge.
The pattern: pure usage tracks the developer buyer’s procurement habits. 77% of pure-usage vendors sell primarily to individual developers or engineering teams with credit-card-first, no-PO onboarding.
The free tier is almost universal
88% of pure-usage corpus companies offer a free tier — the highest free-tier rate of any pricing model category. The free tier is the onboarding mechanism: a $5-$10 credit or 200K-1M free tokens lets developers integrate and test without a payment commitment. Typical free-tier structures:
- Permanent free allotment per month (Anthropic: free Claude.ai; Groq: free with rate limits)
- One-time signup credits (Fireworks: $1 free; Fal.ai: $10 free)
- Monthly free quota or credit that refreshes (Tavily: 1,000 free searches/month; Exa: 1,000 free searches/month; Modal: $30/month free)
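The free-tier structures above affect the invoice in different ways: a refreshing quota reduces the billable units every month, while a one-time credit is drawn down until it is gone. A minimal sketch, assuming a hypothetical price of 1 cent per search and amounts held as integer cents; the function names are illustrative and do not reflect any vendor's billing API.

```python
# Hypothetical unit price, chosen only to keep the arithmetic easy to follow.
PRICE_CENTS_PER_SEARCH = 1


def billable_units(used: int, monthly_free_quota: int) -> int:
    """Units left to charge after a free quota that refreshes every month."""
    return max(0, used - monthly_free_quota)


def apply_signup_credit(charge_cents: int, credit_cents: int) -> tuple[int, int]:
    """Spend a one-time credit against a charge.

    Returns (amount_due_cents, credit_remaining_cents).
    """
    applied = min(charge_cents, credit_cents)
    return charge_cents - applied, credit_cents - applied


# Month 1: 1,500 searches against a 1,000-search monthly quota and a $10 signup credit.
charge = billable_units(1_500, 1_000) * PRICE_CENTS_PER_SEARCH
due, credit_left = apply_signup_credit(charge, 1_000)
```

In this example the first month's charge is 500 cents, the amount due is 0, and 500 cents of credit carry over. The quota resets the following month; the credit does not.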
Billing units by segment
| Segment | Primary Unit | Examples |
|---|---|---|
| LLM APIs | tokens (per 1M) | OpenAI, Anthropic, Mistral, Groq |
| Embedding APIs | tokens (per 1M) | Cohere, Voyage, Jina |
| Audio APIs | media-minutes | Deepgram, Rev.ai, Speechmatics |
| Image APIs | per-image / requests | Fal.ai, Replicate, Ideogram API |
| GPU compute | gpu-hours / per-second | Modal, Vast.ai, RunPod |
| Vector/search | per-query / requests | Turbopuffer, Exa, Linkup |
Structural discounts within pure-usage
Pure-usage does not mean one flat rate. The most common discounts:
Volume tiers: a lower per-unit rate at higher monthly consumption. Most inference APIs begin offering tiered rates once monthly spend reaches roughly $1k-$10k.
Batch processing (~50% off) — asynchronous workloads earn roughly half the synchronous rate across Anthropic, OpenAI, Google, Fireworks, and Mistral.
Cached-input discounts (50-80% off) — for LLM APIs that support prompt caching, a discounted rate applies when input tokens match a cached prefix. Available at Anthropic (75% off), OpenAI (50% off), Google (75% off), DeepSeek (74% off), Groq, Fireworks, Together, Baseten.
Prepaid credits — paying in advance at a discounted rate. Deepgram’s Growth plan, Fireworks prepaid tiers, and Jina AI’s Standard/Premium bundles all reward upfront commitment with a lower effective rate.
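To see how the cached-input and batch discounts change an LLM bill, the sketch below prices one workload under different settings. The per-token rates are hypothetical; the 75% cache discount and 50% batch discount echo the figures quoted above. It assumes the batch discount multiplies the whole request cost, including cached input, which varies by vendor and should be checked against the vendor's rate card.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def request_cost(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    *,
    input_rate: Decimal,      # USD per 1M input tokens (hypothetical)
    output_rate: Decimal,     # USD per 1M output tokens (hypothetical)
    cache_discount: Decimal,  # e.g. Decimal("0.75") for 75% off cached input
    batch_discount: Decimal = Decimal("0"),  # e.g. Decimal("0.50") for batch jobs
) -> Decimal:
    """Cost in USD of a workload, with cached-input and batch discounts applied."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * input_rate
        + cached_tokens * input_rate * (1 - cache_discount)
        + output_tokens * output_rate
    ) / PER_MILLION
    # Assumption: the batch discount stacks on top of the cache discount.
    return cost * (1 - batch_discount)


rates = dict(input_rate=Decimal("3.00"), output_rate=Decimal("15.00"))
no_discount = request_cost(1_000_000, 0, 100_000, cache_discount=Decimal("0"), **rates)
with_cache = request_cost(1_000_000, 800_000, 100_000, cache_discount=Decimal("0.75"), **rates)
cache_and_batch = request_cost(
    1_000_000, 800_000, 100_000,
    cache_discount=Decimal("0.75"), batch_discount=Decimal("0.50"), **rates,
)
```

Under these assumed rates the same workload costs $4.50 with no discounts, $2.70 when 80% of the input hits the cache, and $1.35 when it also runs as a batch job.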
When pure usage transitions to hybrid
Several corpus companies started as pure-usage and added a base fee (hybrid), or re-added tier structure on top of PAYG. Exa dropped subscription tiers for pure PAYG in 2024, then re-introduced per-endpoint pricing cards in 2026. ElevenLabs shifted toward PAYG in 2025 while maintaining subscription plans. The end state for most is hybrid (a small platform or seat fee plus metered usage), not permanent pure PAYG.
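The difference between the two models shows up most clearly at low and high usage. The sketch below compares invoices under a pure-usage plan and a hybrid plan. All prices are hypothetical, and it assumes the hybrid plan pairs its base fee with a lower metered rate, which is common but not universal.

```python
from decimal import Decimal


def pure_invoice(units: int, rate: Decimal) -> Decimal:
    """Pure usage: every dollar on the invoice is tied to a unit of consumption."""
    return units * rate


def hybrid_invoice(units: int, base_fee: Decimal, rate: Decimal) -> Decimal:
    """Hybrid: a fixed base fee (the vendor's revenue floor) plus metered usage."""
    return base_fee + units * rate


def break_even_units(base_fee: Decimal, pure_rate: Decimal, hybrid_rate: Decimal) -> Decimal:
    """Usage level at which both plans cost the same (needs hybrid_rate < pure_rate)."""
    if hybrid_rate >= pure_rate:
        raise ValueError("hybrid plan never becomes cheaper at these rates")
    return base_fee / (pure_rate - hybrid_rate)


# Hypothetical plans: $0.010 per unit PAYG versus $50/month plus $0.008 per unit.
PURE_RATE = Decimal("0.010")
BASE_FEE = Decimal("50")
HYBRID_RATE = Decimal("0.008")
```

With these figures an idle month costs $0 on the pure plan and $50 on the hybrid plan, the two plans cross at 25,000 units, and above that point the hybrid plan is cheaper for the buyer.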
What to watch
Token prices in pure-usage APIs continue to fall with each model generation, so budget assumptions should be revisited at least twice a year. DeepSeek’s entry at $0.27 per 1M tokens for V3 reset expectations for “frontier-class” pricing; OpenAI’s GPT-5 family ($2.50-$5 per 1M tokens) continues the generational deflation pattern.
Pure-usage APIs also have no default spend cap — a misconfigured agent or loop can generate a large bill before the buyer notices. Look for vendors that offer configurable spend limits, usage alerts, or pre-funded credit wallets with no-overage behavior (Manus, Modal, ElevenLabs PAYG).
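Where a vendor offers no built-in cap, a buyer can approximate the "pre-funded wallet with no overage" behavior on the client side. The class below is a hypothetical sketch of that idea, not any vendor's SDK: it refuses a charge that would exceed the remaining balance and exposes a low-balance flag that could drive a usage alert.

```python
class SpendCapExceeded(RuntimeError):
    """Raised when a charge would push spend past the pre-funded balance."""


class PrepaidWallet:
    """Client-side spend guard; all amounts are integer cents."""

    def __init__(self, balance_cents: int, alert_threshold_cents: int = 0):
        self.balance_cents = balance_cents
        self.alert_threshold_cents = alert_threshold_cents

    @property
    def low_balance(self) -> bool:
        """True once the balance is at or below the alert threshold."""
        return self.balance_cents <= self.alert_threshold_cents

    def charge(self, cost_cents: int) -> int:
        """Deduct a charge and return the new balance; never allow overage."""
        if cost_cents < 0:
            raise ValueError("cost_cents must be non-negative")
        if cost_cents > self.balance_cents:
            # No-overage behavior: reject the call instead of billing past the cap.
            raise SpendCapExceeded(
                f"charge of {cost_cents} exceeds remaining balance {self.balance_cents}"
            )
        self.balance_cents -= cost_cents
        return self.balance_cents
```

An agent loop would call `charge()` with the estimated cost before each API request and stop when `SpendCapExceeded` is raised, so a runaway loop halts at the wallet balance rather than at the end of the billing cycle.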
| Company | Product | Pricing model | Billing units | Free tier | Verified |
|---|---|---|---|---|---|
| Anthropic | Claude API (token-based) + Claude.ai consumer subscriptions (Free/Pro/Team/Enterprise) | freemium, subscription, seat-based, +1 more | tokens, seats, api-calls | Yes | 2026-05-29 |
| Anyscale | Managed Ray platform for distributed AI training, inference, and batch processing (RayTurbo, Anyscale Compute Units) | pure-usage, commitment, hybrid | gpu-hours, cpu-hours, credits | Yes | 2026-05-29 |
| AssemblyAI | Speech-to-Text & Audio AI APIs | pure-usage | api-calls, tokens | Yes | 2026-05-29 |
| Baseten | ML inference infrastructure: dedicated GPU deployments, Model APIs, and Truss framework | pure-usage, hybrid, commitment | gpu-hours, tokens, requests | Yes | 2026-05-29 |
| Bland AI | AI phone call automation platform: inbound and outbound voice agents at scale | hybrid, pure-usage, subscription | api-calls, credits, media-minutes | Yes | 2026-05-29 |
| Bright Data | Web data platform: proxy networks, scraping APIs, a managed scraping browser, SERP and unlocker APIs, ready-made datasets, and eCommerce insights | pure-usage, hybrid, commitment, +1 more | bandwidth-gb, requests, records, +1 more | Yes | 2026-06-04 |
| Browserbase | Browser-agent infrastructure: headless browser sessions, web Search/Fetch APIs, agent identity, runtime, and a model gateway behind one API key | freemium, hybrid, pure-usage | browser-hours, api-calls, requests, +2 more | Yes | 2026-06-02 |
| Cartesia | Real-time voice AI platform (Sonic TTS, voice cloning, voice agents) | freemium, subscription, hybrid, +1 more | credits, requests, api-calls, +1 more | Yes | 2026-05-29 |
| Cerebras | Wafer-scale AI inference cloud and WSE hardware systems | pure-usage, subscription, commitment | tokens, api-calls, gpu-hours | Yes | 2026-05-30 |
| Cohere | Command, Embed, Rerank APIs | pure-usage | tokens, api-calls, requests | Yes | 2026-05-29 |
| Deepgram | Usage-based speech-to-text, text-to-speech, and voice agent APIs | pure-usage, freemium | media-minutes, tokens, credits, +1 more | Yes | 2026-05-31 |
| DeepInfra | Serverless inference cloud: per-token LLM/embedding APIs, per-image and per-minute media models, per-hour on-demand GPU containers, and reserved DeepCluster GPU clusters | pure-usage, commitment | tokens, gpu-hours, requests, +1 more | No | 2026-06-02 |
| DeepSeek | DeepSeek API (V4-Flash + V4-Pro models, 1M context) with token-based pricing and aggressive cache discounts | freemium, pure-usage | tokens, api-calls | Yes | 2026-06-05 |
| ElevenLabs | Voice AI platform across ElevenCreative, ElevenAgents, and ElevenAPI | subscription, pure-usage, hybrid | characters, credits, media-minutes, +1 more | Yes | 2026-05-28 |
| Exa | AI web search API for agents: search, contents, deep research, and monitoring endpoints billed per request | pure-usage, freemium | requests, credits, api-calls, +1 more | Yes | 2026-06-01 |
| Fal | Generative-media inference platform: serverless per-output model APIs plus dedicated GPU compute | pure-usage | gpu-hours, requests, media-minutes | No | 2026-06-01 |
| Fireworks AI | Generative AI inference platform: serverless per-token, on-demand GPU, fine-tuning, batch API | pure-usage, hybrid, commitment | tokens, gpu-hours, requests | Yes | 2026-05-30 |
| Freepik | AI creative suite: image, video, audio generation plus a 200M+ stock library | subscription, hybrid, pure-usage, +1 more | seats, credits, api-calls | Yes | 2026-06-05 |
| Google | Gemini API & AI Studio | pure-usage, freemium | tokens, requests, api-calls | Yes | 2026-05-29 |
| Groq | GroqCloud: LPU-based ultra-low-latency inference API for Llama, GPT-OSS, Qwen, Whisper, and Mixtral | pure-usage, hybrid, commitment | tokens, requests, api-calls | Yes | 2026-05-29 |
| Jina AI | Search Foundation API (Embeddings, Reranker, Reader, DeepSearch, Classifier) | pure-usage, freemium | tokens, requests, api-calls | Yes | 2026-06-03 |
| Lightning AI | Cloud GPU/CPU Studio compute platform for building, training, and serving AI models, billed by the second with a credit pool | hybrid, freemium, pure-usage | gpu-hours, cpu-hours, credits, +3 more | Yes | 2026-06-02 |
| Linkup | Web search API for AI agents: Search, Fetch, and async Research endpoints with grounded, structured results | pure-usage, freemium | requests, credits, api-calls | Yes | 2026-06-04 |
| Make | Visual, no-code automation (iPaaS) platform connecting 3,000+ apps and AI agents | pure-usage, freemium | credits, tokens | Yes | 2026-06-02 |
| Mercor | AI talent marketplace + enterprise data partnerships for frontier AI labs | pure-usage | tasks | No | 2026-06-08 |
| Metronome | Usage-based billing and metering infrastructure platform | pure-usage | events, transactions | Yes | 2026-06-03 |
| micro1 | Human-data engine, RL environments, and agent evaluation for frontier AI labs | pure-usage | tasks | No | 2026-06-08 |
| Mistral AI | Open and commercial LLM APIs | pure-usage, freemium | tokens, seats, api-calls, +2 more | Yes | 2026-05-31 |
| Modal | Serverless compute and GPU platform: per-second billing for Python functions, batch jobs, and model serving | pure-usage, freemium, subscription, +1 more | gpu-hours, cpu-hours, gb-hours, +2 more | Yes | 2026-05-29 |
| Murf AI | AI voice / text-to-speech platform (Murf Studio app + Murf API) | subscription, pure-usage, freemium | media-minutes, seats, credits | Yes | 2026-06-01 |
| Novita AI | Pay-as-you-go AI cloud: 200+ model inference APIs, on-demand GPUs, and per-second agent sandboxes under one API | pure-usage, freemium | tokens, gpu-hours, cpu-hours, +2 more | Yes | 2026-06-02 |
| OpenAI | ChatGPT consumer subscriptions + GPT-5.x API with token-based usage billing | freemium, subscription, seat-based, +1 more | tokens, seats, api-calls, +1 more | Yes | 2026-05-30 |
| OpenPipe | OpenPipe fine-tuning and hosted inference platform (small specialized models / RL for agents) | pure-usage | tokens, cpu-hours | Yes | 2026-06-04 |
| Oxylabs | Web data collection: residential, datacenter, ISP & mobile proxies plus Web Scraper API and Web Unblocker | hybrid, pure-usage, freemium | bandwidth-gb, ips, records, +1 more | Yes | 2026-06-04 |
| Parloa | Enterprise AI Agent Management Platform (AMP) for contact-center voice and chat automation | pure-usage | media-minutes, resolutions | No | 2026-06-07 |
| Patronus AI | LLM and AI agent evaluation, monitoring, and guardrail platform | freemium, pure-usage | api-calls, credits | Yes | 2026-06-04 |
| Perplexity AI | AI-native answer engine with citations and multi-model search | freemium, subscription, seat-based, +1 more | seats, tokens, requests, +1 more | Yes | 2026-05-29 |
| PhotoRoom | AI image-editing app and per-image Image Editing / Remove Background API for e-commerce product visuals | subscription, pure-usage, freemium | api-calls, credits, seats | Yes | 2026-06-05 |
| Replicate | Cloud platform for running, fine-tuning, and deploying AI models via REST API | pure-usage, hybrid, commitment | gpu-hours, tokens, requests | Yes | 2026-05-30 |
| Rev AI | Pay-as-you-go speech-to-text, transcription, and audio-intelligence APIs | pure-usage, freemium | media-minutes, credits, api-calls | Yes | 2026-06-04 |
| RunPod | GPU cloud marketplace: Secure Cloud and Community Cloud Pods, Serverless endpoints, and persistent storage | pure-usage, hybrid, commitment | gpu-hours, storage-gb | No | 2026-05-30 |
| Rytr | AI writing assistant for short-form marketing copy and content | freemium, subscription, pure-usage | characters, credits | Yes | 2026-06-07 |
| ScraperAPI | Web scraping API that handles proxies, browsers, and CAPTCHAs behind a single endpoint | subscription, pure-usage | credits, requests, api-calls | No | 2026-06-04 |
| SerpApi | Real-time search-results API (Google, Bing, and other engines) | subscription, pure-usage | api-calls, requests | Yes | 2026-06-04 |
| Speechmatics | Speech-to-text and text-to-speech APIs with per-hour usage pricing | pure-usage, freemium | media-minutes, characters | Yes | 2026-06-04 |
| Tavily | Tavily Search API | pure-usage, freemium | credits, api-calls, requests | Yes | 2026-06-03 |
| Togai | Usage-based metering and billing infrastructure platform | pure-usage | events, transactions | Yes | 2026-06-03 |
| Together AI | AI Acceleration Cloud: serverless inference, dedicated endpoints, GPU clusters, Code Sandbox, fine-tuning | pure-usage, hybrid, commitment | tokens, gpu-hours, cpu-hours, +1 more | Yes | 2026-05-29 |
| turbopuffer | Serverless vector and full-text search database on object storage | pure-usage, commitment | storage-gb, vectors-indexed, gb-hours, +1 more | No | 2026-06-04 |
| Twelve Labs | Video understanding foundation models (Marengo for search/embeddings, Pegasus for analysis) delivered as a usage-metered API | pure-usage, freemium, commitment | media-minutes, tokens, requests | Yes | 2026-06-02 |
| Upstash | Upstash (Redis, Vector, QStash, Search, Workflow) | pure-usage, freemium, hybrid | requests, api-calls, vectors-indexed, +3 more | Yes | 2026-06-03 |
| Vast.ai | GPU rental marketplace: on-demand, interruptible (spot), and reserved cloud GPUs plus autoscaling serverless inference | pure-usage, commitment | gpu-hours, storage-gb, bandwidth-gb | No | 2026-06-02 |
| Voyage AI | Embedding and reranker models (text, code, multimodal) for retrieval and RAG | pure-usage, freemium | tokens, storage-gb | Yes | 2026-06-04 |
| You.com | Web search, contents, research, and finance-research APIs for AI systems | pure-usage, freemium | api-calls, requests, pages-rendered | Yes | 2026-06-01 |
| Zapier | Workflow-automation (iPaaS) platform connecting 9,000+ apps, with separately-metered AI Agents and Chatbots add-ons | pure-usage, freemium | tasks | Yes | 2026-06-02 |
| ZenRows | Universal Scraper API, Scraping Browser, and Residential Proxies | hybrid, subscription, pure-usage | requests, api-calls, bandwidth-gb, +2 more | Yes | 2026-06-04 |
FAQ
What is pure usage-based pricing?
Pure usage-based pricing means the customer pays only for what they consume — no base fee, no seat, no platform charge. The bill starts at zero and scales with actual consumption. It's common for developer-facing APIs and infrastructure where buyers want zero minimum cost.
How is pure-usage different from hybrid pricing?
Hybrid pricing combines a fixed component (a platform fee, seat charge, or minimum) with a metered usage layer. Pure usage has no fixed component — every dollar on the invoice is tied to a unit of consumption. In practice, many pure-usage vendors add a seat or platform fee over time, drifting toward hybrid.
Do pure-usage APIs have spend caps?
Not by default. A misconfigured loop or agent can generate a large bill before the buyer notices. Look for vendors that offer configurable spend limits, usage alerts, or pre-funded credit wallets with no-overage behavior — Modal, ElevenLabs PAYG, and Manus all offer some form of guardrail.
Are token prices in pure-usage APIs rising or falling?
Falling. Every major frontier-model API has cut per-token prices at least once per model generation. DeepSeek's V3 at $0.27/1M reset expectations; OpenAI's GPT-5 family continues the deflation trend. Pure-usage token cost assumptions should be revisited at least twice a year.
Related pricing models
- Hybrid Pricing Model: A pricing model that combines a fixed recurring fee with variable usage-based charges, both meaningful to the bill.
- Seat Plus Usage Pricing: A subset of hybrid pricing where a per-user seat fee is combined with usage-based charges that typically dominate the bill at scale.
- Outcome-Based Pricing: A pricing model where the customer is charged per business outcome (a resolved support ticket, a converted lead, a closed sale) rather than per unit of input.
- Freemium Pricing: A pricing model that combines a permanently free tier with paid upgrade plans, used to drive product-led growth and self-serve acquisition.
- Subscription Pricing: A pricing model that charges a flat recurring fee, monthly or annual, with no usage component meaningful to the bill.
- Committed-Use Pricing: A pricing model where the customer commits to a minimum spend over a period (typically annual) in exchange for a discounted rate.
- Seat-Based Pricing: A pricing model where the primary billing dimension is the number of named users, regardless of their consumption.