Pure Usage Pricing: Examples & Companies

56 companies in the corpus · Updated full analysis
Definition

Pure Usage Pricing is a pricing model where the customer pays only for what they consume, with no fixed recurring fee beyond a possible minimum.

Also known as: Pay-As-You-Go Pricing, PAYG Pricing, Consumption Pricing

What is pure usage-based pricing?

Pure usage-based pricing means the customer pays only for what they consume — no base fee, no seat, no platform charge. The bill starts at zero and scales linearly (or with volume discounts) with actual consumption. Every incremental unit of usage costs the same rate as the first, until a volume tier unlocks a lower rate.
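A minimal sketch of how such a bill could be computed, assuming graduated tiers in which the lower rate applies only to units above each threshold. Some vendors instead re-rate all units once a tier is reached. The tier boundaries, rates, and the `usage_bill` name are hypothetical and not taken from any vendor's price list.

```python
from decimal import Decimal

# Hypothetical graduated tiers: (upper bound in units, price per unit).
# None marks the open-ended final tier. All numbers are illustrative.
TIERS = [
    (1_000_000, Decimal("0.000010")),
    (10_000_000, Decimal("0.000008")),
    (None, Decimal("0.000006")),
]


def usage_bill(units: int, tiers=TIERS) -> Decimal:
    """Return the charge for `units` under graduated (marginal) tiers."""
    if units < 0:
        raise ValueError("units must be non-negative")
    total = Decimal("0")
    lower = 0  # units already priced by earlier tiers
    for upper, rate in tiers:
        if units <= lower:
            break
        # Price only the slice of usage that falls inside this tier.
        cap = units if upper is None else min(units, upper)
        total += (cap - lower) * rate
        lower = cap
    return total
```

With these assumed tiers, zero usage produces a zero bill, and 2,000,000 units are charged as 1,000,000 at the first rate plus 1,000,000 at the second.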

Fifty-six of 158 corpus companies (35%) use pure-usage as their primary pricing model, making it the third most common structure after the freemium tag (54%) and hybrid (41%). The defining characteristic: no fixed cost component means no revenue floor for the vendor, and no minimum cost for the buyer.

Pure usage-based pricing is also called: pay-as-you-go (PAYG), consumption-based pricing, metered billing, per-unit pricing.

Who uses it and why

Pure-usage pricing dominates in two segments:

Developer-facing APIs — LLM token APIs (Anthropic, OpenAI, Google, Mistral, Groq, DeepSeek), embedding APIs (Cohere, Voyage AI, Jina AI), audio/voice APIs (Deepgram, Rev.ai, ElevenLabs PAYG), image generation APIs (Fal.ai, Replicate), and search APIs (Tavily, Exa, You.com, Linkup). The developer-as-buyer does not want a minimum; they want to pay only when they ship production traffic.

Infrastructure and compute — GPU clouds (Vast.ai, RunPod, Modal), serverless (Upstash, Turbopuffer), and browser automation (Browserbase, Apify) bill purely on consumption. No GPU time means no charge.

The pattern: pure usage tracks the developer buyer’s procurement habits. 77% of pure-usage vendors sell primarily to individual developers or engineering teams with credit-card-first, no-PO onboarding.

The free tier is almost universal

88% of pure-usage corpus companies offer a free tier — the highest free-tier rate of any pricing model category. The free tier is the onboarding mechanism: a $5-$10 credit or 200K-1M free tokens lets developers integrate and test without a payment commitment. Typical free-tier structures:

  • Permanent free allotment per month (Anthropic: free Claude.ai; Groq: free with rate limits)
  • One-time signup credits (Fireworks: $1 free; Fal.ai: $10 free)
  • Monthly free quota or credit that refreshes (Tavily: 1,000 free searches/month; Exa: 1,000 free searches/month; Modal: $30/month free)
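The difference between a refreshing quota and a one-time credit can be sketched as two small functions. This is an illustration under stated assumptions: the function names and the per-search rate in the usage note are hypothetical, and real vendors may apply credits or quotas in a different order.

```python
from decimal import Decimal


def charge_with_monthly_quota(units_used: int, free_units: int,
                              rate: Decimal) -> Decimal:
    """Bill only the units above a free quota that resets every month."""
    billable = max(0, units_used - free_units)
    return billable * rate


def charge_with_signup_credit(usage_cost: Decimal,
                              credit_balance: Decimal) -> tuple[Decimal, Decimal]:
    """Draw down a one-time credit; return (amount due, remaining credit)."""
    applied = min(usage_cost, credit_balance)
    return usage_cost - applied, credit_balance - applied
```

For example, with a 1,000-search monthly quota and an assumed rate of $0.008 per search, 1,500 searches would bill 500 searches. A $10 signup credit against $12.50 of usage would leave $2.50 due and no credit remaining.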

Billing units by segment

Segment | Primary unit | Examples
--- | --- | ---
LLM APIs | tokens (per 1M) | OpenAI, Anthropic, Mistral, Groq
Embedding APIs | tokens (per 1M) | Cohere, Voyage, Jina
Audio APIs | media-minutes | Deepgram, Rev.ai, Speechmatics
Image APIs | per-image / requests | Fal.ai, Replicate, Ideogram API
GPU compute | gpu-hours / per-second | Modal, Vast.ai, RunPod
Vector/search | per-query / requests | Turbopuffer, Exa, Linkup

Structural discounts within pure-usage

Pure-usage does not mean one flat rate. The most common discounts:

Volume tiers — a lower per-unit rate at higher monthly consumption. Most inference APIs offer tiered rates starting around $1k-$10k/month.

Batch processing (~50% off) — asynchronous workloads earn roughly half the synchronous rate across Anthropic, OpenAI, Google, Fireworks, and Mistral.

Cached-input discounts (50-80% off) — for LLM APIs that support prompt caching, a discounted rate applies when input tokens match a cached prefix. Available at Anthropic (75% off), OpenAI (50% off), Google (75% off), DeepSeek (74% off), Groq, Fireworks, Together, Baseten.
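The arithmetic behind the batch and cached-input discounts can be sketched as follows. The per-million rates in the usage note are hypothetical, and the sketch assumes the batch discount multiplies the whole request cost on top of the cache discount. Whether the two discounts stack, and how cache writes are priced, varies by vendor and is not modeled here.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                 input_rate_per_m: Decimal, output_rate_per_m: Decimal,
                 cache_discount: Decimal,
                 batch_discount: Decimal = Decimal("0")) -> Decimal:
    """Cost of one request; `cached_tokens` is the part of the input
    that matched a cached prefix and is billed at a discounted rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = (input_tokens - cached_tokens) * input_rate_per_m
    cached = cached_tokens * input_rate_per_m * (1 - cache_discount)
    output = output_tokens * output_rate_per_m
    subtotal = (uncached + cached + output) / PER_MILLION
    # Assumption: the batch discount applies to the full subtotal.
    return subtotal * (1 - batch_discount)
```

Assuming $3 per 1M input tokens and $15 per 1M output tokens, a request with 100,000 input tokens (80,000 of them cached at 75% off) and 10,000 output tokens would cost $0.27 instead of $0.45, and $0.135 if a 50% batch discount also applied.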

Prepaid credits — paying in advance at a discounted rate. Deepgram’s Growth plan, Fireworks prepaid tiers, and Jina AI’s Standard/Premium bundles all reward upfront commitment with a lower effective rate.

When pure usage transitions to hybrid

Several corpus companies started as pure-usage and added a base fee (hybrid), or re-added tier structure on top of PAYG. Exa dropped subscription tiers for pure PAYG in 2024, then re-introduced per-endpoint pricing cards in 2026. ElevenLabs shifted toward PAYG in 2025 while maintaining subscription plans. The end state for most is hybrid (a small platform or seat fee plus metered usage), not permanent pure PAYG.
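The structural difference between the two models is small enough to show in a few lines. This is a sketch with hypothetical numbers; the rate and base fee are illustrative and do not describe any company in the corpus.

```python
from decimal import Decimal

RATE = Decimal("0.002")    # hypothetical price per unit
BASE_FEE = Decimal("50")   # hypothetical monthly platform fee


def pure_usage_invoice(units: int, rate: Decimal = RATE) -> Decimal:
    """Every dollar is tied to consumption; zero usage means a zero bill."""
    return units * rate


def hybrid_invoice(units: int, rate: Decimal = RATE,
                   base_fee: Decimal = BASE_FEE) -> Decimal:
    """A fixed fee is owed regardless of usage, plus the metered layer."""
    return base_fee + units * rate
```

At zero usage the pure-usage invoice is $0 while the hybrid invoice is the $50 base fee, which is the revenue floor that pure usage lacks.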

What to watch

Token prices in pure-usage APIs continue to fall with each model generation. Budget assumptions should be revisited at least twice a year. DeepSeek’s entry at $0.27/1M for V3 reset expectations for “frontier-class” pricing; OpenAI’s GPT-5 family ($2.50-$5/1M) continues the generational deflation pattern.

Pure-usage APIs also have no default spend cap — a misconfigured agent or loop can generate a large bill before the buyer notices. Look for vendors that offer configurable spend limits, usage alerts, or pre-funded credit wallets with no-overage behavior (Manus, Modal, ElevenLabs PAYG).
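Where a vendor offers no cap, a buyer can approximate one on the client side. The sketch below is a hypothetical in-process guard (the `SpendGuard` and `BudgetExceeded` names are invented for illustration). It tracks estimated cost before each call, refuses calls that would exceed a hard cap, and signals once when an alert threshold is crossed. It does not replace vendor-side limits, since it only sees the traffic that passes through it.

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the hard cap."""


class SpendGuard:
    def __init__(self, hard_cap, alert_at=Decimal("0.8")):
        self.hard_cap = Decimal(hard_cap)
        self.alert_threshold = self.hard_cap * Decimal(alert_at)
        self.spent = Decimal("0")
        self.alerted = False

    def authorize(self, estimated_cost) -> bool:
        """Record the cost if it fits under the cap.

        Returns True exactly once, when the alert threshold is first
        reached, so the caller can notify someone.
        """
        cost = Decimal(estimated_cost)
        if self.spent + cost > self.hard_cap:
            raise BudgetExceeded(
                f"call of {cost} would exceed cap {self.hard_cap}"
            )
        self.spent += cost
        if not self.alerted and self.spent >= self.alert_threshold:
            self.alerted = True
            return True
        return False
```

An agent loop would call `authorize` with an estimated cost before each API request and stop when `BudgetExceeded` is raised, which mimics the no-overage behavior of a pre-funded wallet.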

Company | Product | Pricing model | Billing units | Free tier | Verified
--- | --- | --- | --- | --- | ---
Anthropic | Claude API (token-based) + Claude.ai consumer subscriptions (Free/Pro/Team/Enterprise) | freemium, subscription, seat-based, +1 | tokens, seats, api-calls | Yes | 2026-05-29
Anyscale | Managed Ray platform for distributed AI training, inference, and batch processing (RayTurbo, Anyscale Compute Units) | pure-usage, commitment, hybrid | gpu-hours, cpu-hours, credits | Yes | 2026-05-29
AssemblyAI | Speech-to-Text & Audio AI APIs | pure-usage | api-calls, tokens | Yes | 2026-05-29
Baseten | ML inference infrastructure: dedicated GPU deployments, Model APIs, and Truss framework | pure-usage, hybrid, commitment | gpu-hours, tokens, requests | Yes | 2026-05-29
Bland AI | AI phone call automation platform: inbound and outbound voice agents at scale | hybrid, pure-usage, subscription | api-calls, credits, media-minutes | Yes | 2026-05-29
Bright Data | Web data platform: proxy networks, scraping APIs, a managed scraping browser, SERP and unlocker APIs, ready-made datasets, and eCommerce insights | pure-usage, hybrid, commitment, +1 | bandwidth-gb, requests, records, +1 | Yes | 2026-06-04
Browserbase | Browser-agent infrastructure: headless browser sessions, web Search/Fetch APIs, agent identity, runtime, and a model gateway behind one API key | freemium, hybrid, pure-usage | browser-hours, api-calls, requests, +2 | Yes | 2026-06-02
Cartesia | Real-time voice AI platform (Sonic TTS, voice cloning, voice agents) | freemium, subscription, hybrid, +1 | credits, requests, api-calls, +1 | Yes | 2026-05-29
Cerebras | Wafer-scale AI inference cloud and WSE hardware systems | pure-usage, subscription, commitment | tokens, api-calls, gpu-hours | Yes | 2026-05-30
Cohere | Command, Embed, Rerank APIs | pure-usage | tokens, api-calls, requests | Yes | 2026-05-29
Deepgram | Usage-based speech-to-text, text-to-speech, and voice agent APIs | pure-usage, freemium | media-minutes, tokens, credits, +1 | Yes | 2026-05-31
DeepInfra | Serverless inference cloud: per-token LLM/embedding APIs, per-image and per-minute media models, per-hour on-demand GPU containers, and reserved DeepCluster GPU clusters | pure-usage, commitment | tokens, gpu-hours, requests, +1 | No | 2026-06-02
DeepSeek | DeepSeek API (V4-Flash + V4-Pro models, 1M context) with token-based pricing and aggressive cache discounts | freemium, pure-usage | tokens, api-calls | Yes | 2026-06-05
ElevenLabs | Voice AI platform across ElevenCreative, ElevenAgents, and ElevenAPI | subscription, pure-usage, hybrid | characters, credits, media-minutes, +1 | Yes | 2026-05-28
Exa | AI web search API for agents: search, contents, deep research, and monitoring endpoints billed per request | pure-usage, freemium | requests, credits, api-calls, +1 | Yes | 2026-06-01
Fal | Generative-media inference platform: serverless per-output model APIs plus dedicated GPU compute | pure-usage | gpu-hours, requests, media-minutes | No | 2026-06-01
Fireworks AI | Generative AI inference platform: serverless per-token, on-demand GPU, fine-tuning, batch API | pure-usage, hybrid, commitment | tokens, gpu-hours, requests | Yes | 2026-05-30
Freepik | AI creative suite: image, video, audio generation plus a 200M+ stock library | subscription, hybrid, pure-usage, +1 | seats, credits, api-calls | Yes | 2026-06-05
Google | Gemini API & AI Studio | pure-usage, freemium | tokens, requests, api-calls | Yes | 2026-05-29
Groq | GroqCloud: LPU-based ultra-low-latency inference API for Llama, GPT-OSS, Qwen, Whisper, and Mixtral | pure-usage, hybrid, commitment | tokens, requests, api-calls | Yes | 2026-05-29
Jina AI | Search Foundation API (Embeddings, Reranker, Reader, DeepSearch, Classifier) | pure-usage, freemium | tokens, requests, api-calls | Yes | 2026-06-03
Lightning AI | Cloud GPU/CPU Studio compute platform for building, training, and serving AI models, billed by the second with a credit pool. | hybrid, freemium, pure-usage | gpu-hours, cpu-hours, credits, +3 | Yes | 2026-06-02
Linkup | Web search API for AI agents: Search, Fetch, and async Research endpoints with grounded, structured results | pure-usage, freemium | requests, credits, api-calls | Yes | 2026-06-04
Make | Visual, no-code automation (iPaaS) platform connecting 3,000+ apps and AI agents | pure-usage, freemium | credits, tokens | Yes | 2026-06-02
Mercor | AI talent marketplace + enterprise data partnerships for frontier AI labs | pure-usage | tasks | No | 2026-06-08
Metronome | Usage-based billing and metering infrastructure platform | pure-usage | events, transactions | Yes | 2026-06-03
micro1 | Human-data engine, RL environments, and agent evaluation for frontier AI labs | pure-usage | tasks | No | 2026-06-08
Mistral AI | Open and commercial LLM APIs | pure-usage, freemium | tokens, seats, api-calls, +2 | Yes | 2026-05-31
Modal | Serverless compute and GPU platform: per-second billing for Python functions, batch jobs, and model serving | pure-usage, freemium, subscription, +1 | gpu-hours, cpu-hours, gb-hours, +2 | Yes | 2026-05-29
Murf AI | AI voice / text-to-speech platform (Murf Studio app + Murf API) | subscription, pure-usage, freemium | media-minutes, seats, credits | Yes | 2026-06-01
Novita AI | Pay-as-you-go AI cloud: 200+ model inference APIs, on-demand GPUs, and per-second agent sandboxes under one API | pure-usage, freemium | tokens, gpu-hours, cpu-hours, +2 | Yes | 2026-06-02
OpenAI | ChatGPT consumer subscriptions + GPT-5.x API with token-based usage billing | freemium, subscription, seat-based, +1 | tokens, seats, api-calls, +1 | Yes | 2026-05-30
OpenPipe | OpenPipe fine-tuning and hosted inference platform (small specialized models / RL for agents) | pure-usage | tokens, cpu-hours | Yes | 2026-06-04
Oxylabs | Web data collection: residential, datacenter, ISP & mobile proxies plus Web Scraper API and Web Unblocker | hybrid, pure-usage, freemium | bandwidth-gb, ips, records, +1 | Yes | 2026-06-04
Parloa | Enterprise AI Agent Management Platform (AMP) for contact-center voice and chat automation | pure-usage | media-minutes, resolutions | No | 2026-06-07
Patronus AI | LLM and AI agent evaluation, monitoring, and guardrail platform | freemium, pure-usage | api-calls, credits | Yes | 2026-06-04
Perplexity AI | AI-native answer engine with citations and multi-model search | freemium, subscription, seat-based, +1 | seats, tokens, requests, +1 | Yes | 2026-05-29
PhotoRoom | AI image-editing app and per-image Image Editing / Remove Background API for e-commerce product visuals | subscription, pure-usage, freemium | api-calls, credits, seats | Yes | 2026-06-05
Replicate | Cloud platform for running, fine-tuning, and deploying AI models via REST API | pure-usage, hybrid, commitment | gpu-hours, tokens, requests | Yes | 2026-05-30
Rev AI | Pay-as-you-go speech-to-text, transcription, and audio-intelligence APIs | pure-usage, freemium | media-minutes, credits, api-calls | Yes | 2026-06-04
RunPod | GPU cloud marketplace: Secure Cloud and Community Cloud Pods, Serverless endpoints, and persistent storage | pure-usage, hybrid, commitment | gpu-hours, storage-gb | No | 2026-05-30
Rytr | AI writing assistant for short-form marketing copy and content | freemium, subscription, pure-usage | characters, credits | Yes | 2026-06-07
ScraperAPI | Web scraping API that handles proxies, browsers, and CAPTCHAs behind a single endpoint | subscription, pure-usage | credits, requests, api-calls | No | 2026-06-04
SerpApi | Real-time search-results API (Google, Bing, and other engines) | subscription, pure-usage | api-calls, requests | Yes | 2026-06-04
Speechmatics | Speech-to-text and text-to-speech APIs with per-hour usage pricing | pure-usage, freemium | media-minutes, characters | Yes | 2026-06-04
Tavily | Tavily Search API | pure-usage, freemium | credits, api-calls, requests | Yes | 2026-06-03
Togai | Usage-based metering and billing infrastructure platform | pure-usage | events, transactions | Yes | 2026-06-03
Together AI | AI Acceleration Cloud: serverless inference, dedicated endpoints, GPU clusters, Code Sandbox, fine-tuning | pure-usage, hybrid, commitment | tokens, gpu-hours, cpu-hours, +1 | Yes | 2026-05-29
turbopuffer | Serverless vector and full-text search database on object storage | pure-usage, commitment | storage-gb, vectors-indexed, gb-hours, +1 | No | 2026-06-04
Twelve Labs | Video understanding foundation models (Marengo for search/embeddings, Pegasus for analysis) delivered as a usage-metered API | pure-usage, freemium, commitment | media-minutes, tokens, requests | Yes | 2026-06-02
Upstash | Upstash (Redis, Vector, QStash, Search, Workflow) | pure-usage, freemium, hybrid | requests, api-calls, vectors-indexed, +3 | Yes | 2026-06-03
Vast.ai | GPU rental marketplace: on-demand, interruptible (spot), and reserved cloud GPUs plus autoscaling serverless inference | pure-usage, commitment | gpu-hours, storage-gb, bandwidth-gb | No | 2026-06-02
Voyage AI | Embedding and reranker models (text, code, multimodal) for retrieval and RAG | pure-usage, freemium | tokens, storage-gb | Yes | 2026-06-04
You.com | Web search, contents, research, and finance-research APIs for AI systems | pure-usage, freemium | api-calls, requests, pages-rendered | Yes | 2026-06-01
Zapier | Workflow-automation (iPaaS) platform connecting 9,000+ apps, with separately-metered AI Agents and Chatbots add-ons | pure-usage, freemium | tasks | Yes | 2026-06-02
ZenRows | Universal Scraper API, Scraping Browser, and Residential Proxies | hybrid, subscription, pure-usage | requests, api-calls, bandwidth-gb, +2 | Yes | 2026-06-04

FAQ

What is pure usage-based pricing?

Pure usage-based pricing means the customer pays only for what they consume — no base fee, no seat, no platform charge. The bill starts at zero and scales with actual consumption. It's common for developer-facing APIs and infrastructure where buyers want zero minimum cost.

How is pure-usage different from hybrid pricing?

Hybrid pricing combines a fixed component (a platform fee, seat charge, or minimum) with a metered usage layer. Pure usage has no fixed component — every dollar on the invoice is tied to a unit of consumption. In practice, many pure-usage vendors add a seat or platform fee over time, drifting toward hybrid.

Do pure-usage APIs have spend caps?

Not by default. A misconfigured loop or agent can generate a large bill before the buyer notices. Look for vendors that offer configurable spend limits, usage alerts, or pre-funded credit wallets with no-overage behavior — Modal, ElevenLabs PAYG, and Manus all offer some form of guardrail.

Are token prices in pure-usage APIs rising or falling?

Falling. Every major frontier-model API has cut per-token prices at least once per model generation. DeepSeek's V3 at $0.27/1M reset expectations; OpenAI's GPT-5 family continues the deflation trend. Pure-usage token cost assumptions should be revisited at least twice a year.
