Pure Usage Pricing: Examples & Companies

What is it

Pure Usage Pricing is a pricing model where the customer pays only for what they consume, with no fixed recurring fee beyond a possible minimum. The bill starts at zero and scales linearly — or with volume discounts — as consumption rises. Every incremental unit is billed at a rate, and there is no seat charge, platform fee, or subscription floor doing the revenue heavy-lifting.

It is the no-commitment model, and it dominates the developer-facing inference and infrastructure segments. DeepSeek bills its API from $0.0028 per 1M tokens on a V4-Flash cache hit; Cohere charges $2.50 input / $10 output per 1M tokens for Command A, $0.12 per 1M for Embed v4, and $2 per 1,000 queries for Rerank; AssemblyAI meters transcription at $0.15/hr for Universal-2 with no minimum, upfront fee, or contract on its pay-as-you-go plan. In each case the developer-as-buyer wants exactly one thing: to pay nothing until they ship production traffic.

The model spans two structurally different clusters. On the API side sit LLM and embedding endpoints, speech and audio APIs, image generators, and web-search APIs. On the infrastructure side sit GPU clouds and serverless-compute platforms — RunPod rents pods by the GPU-hour, Modal meters serverless work by the GPU-second. What unites them is that idle capacity generates no charge: no tokens processed, no GPU-seconds consumed, no bill.

The trade-off is symmetrical and well understood. The vendor gives up a revenue floor — a slow month is a slow invoice — and the buyer gives up cost predictability, since a busy month or a misconfigured agent can multiply the bill. Most of the discipline in this category is about managing that two-sided uncertainty: prepaid credits and committed tiers for the vendor, spend caps and alerts for the buyer.

Four vendors · four units · one $0 base fee

How it works

Pure usage pricing has one moving part: a meter multiplied by a rate. The design decisions are which unit to meter, how steep the volume curve is, and what structural discounts sit on top. The billing unit follows the workload — tokens for text models, minutes for audio, GPU-hours or GPU-seconds for compute, requests for search and scraping.

Segment	Primary unit	Rate examples (corpus)
LLM / embedding APIs	tokens (per 1M)	DeepSeek from $0.0028/1M (cache hit); Cohere Command A $2.50 / $10; Embed v4 $0.12/1M
Audio / speech APIs	media-minutes / hours	AssemblyAI $0.15/hr (Universal-2), $0.21/hr (Universal-3 Pro); Deepgram per-minute PAYG
GPU compute (pods)	GPU-hours	RunPod H100 $2.89/hr, RTX 4090 $0.69/hr
GPU compute (serverless)	GPU-seconds	Modal H100 $0.001097/s, A100 $0.000694/s, T4 $0.000164/s
Web search / data	requests	Exa Search $7/1k requests (≈$0.007/call); Tavily $0.008/credit
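
Because each segment quotes a different unit, it can help to normalize rates before reading them side by side. The sketch below converts two rates from the table into coarser or finer units; the function names are illustrative and not part of any vendor's tooling.

```python
SECONDS_PER_HOUR = 3600


def per_second_to_per_hour(rate_per_second: float) -> float:
    """Convert a per-GPU-second rate into the equivalent per-GPU-hour rate."""
    return rate_per_second * SECONDS_PER_HOUR


def per_thousand_to_per_call(rate_per_thousand: float) -> float:
    """Convert a price quoted per 1,000 requests into a per-request price."""
    return rate_per_thousand / 1000


# Rates taken from the table above.
modal_h100_hourly = per_second_to_per_hour(0.001097)
exa_per_call = per_thousand_to_per_call(7.0)

print(f"Modal H100: ${modal_h100_hourly:.4f} per GPU-hour")
print(f"Exa Search: ${exa_per_call:.3f} per request")
```

A normalized figure is only a starting point: a serverless GPU-second and a rented pod GPU-hour are different products, so the converted rates are not a like-for-like price comparison.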

The math is deliberately legible. A worked example on Cohere’s Rerank API: reranking 500,000 queries in a month at $2 per 1,000 queries costs 500,000 ÷ 1,000 × $2 = $1,000, with no seat or platform fee added. A worked example on Modal: running an H100 for 40 hours of serverless inference at $0.001097 per GPU-second costs 40 × 3,600 × $0.001097 ≈ $158, billed per second so a job that finishes early stops the meter.

Unit math: Total bill = Σ (units_consumed × unit_rate) − discounts. There is no + base_fee term — that absence is what makes the model “pure.”
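
The unit math can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's billing code: the `LineItem` class and `total_bill` function are hypothetical names, and the two line items reuse the Cohere Rerank and Modal H100 figures from the worked examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    description: str
    units_consumed: float
    unit_rate: float  # dollars per single unit

    @property
    def amount(self) -> float:
        return self.units_consumed * self.unit_rate


def total_bill(items: list[LineItem], discounts: float = 0.0) -> float:
    """Sum of units x rate, minus discounts. There is deliberately no base fee."""
    return sum(item.amount for item in items) - discounts


# $2 per 1,000 queries, expressed as a per-query rate.
rerank = LineItem("Cohere Rerank queries", 500_000, 2.00 / 1_000)
# 40 hours of H100 time, metered in GPU-seconds.
modal_h100 = LineItem("Modal H100 GPU-seconds", 40 * 3_600, 0.001097)

print(f"Rerank: ${total_bill([rerank]):,.2f}")
print(f"Modal:  ${total_bill([modal_h100]):,.2f}")
print(f"Idle:   ${total_bill([]):,.2f}")
```

The last call is the point of the model: with no line items, the bill is zero.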

Rates are rarely a single flat number. Four structural discounts recur across the corpus. Volume tiers drop the per-unit rate above a monthly threshold. Batch / async processing typically earns roughly half the synchronous rate. Cached-input discounts apply when input tokens match a cached prefix — DeepSeek’s $0.0028/1M cache-hit rate is an order of magnitude below its cache-miss rate. Prepaid credits reward paying in advance: Deepgram’s Growth plan offers up to 20% savings via prepaid annual credits ($4K+/year).
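
A rough sketch of how three of these discounts change the arithmetic follows. Every number in it is a placeholder: the tier boundary, the per-unit rates, and the cache rates are hypothetical and are not any vendor's published prices. It also assumes graduated tiers (each rate applies only to the units inside its tier); some vendors instead reprice all units once a threshold is crossed.

```python
def tiered_cost(units: float, tiers: list[tuple[float, float]]) -> float:
    """Graduated volume pricing.

    tiers is a list of (upper_bound, unit_rate) pairs in ascending order,
    with float("inf") as the final upper bound.
    """
    cost, lower = 0.0, 0.0
    for upper, rate in tiers:
        if units <= lower:
            break
        cost += (min(units, upper) - lower) * rate
        lower = upper
    return cost


def blended_input_rate(miss_rate: float, hit_rate: float, hit_fraction: float) -> float:
    """Effective input rate when a fraction of tokens is served from cache."""
    return hit_fraction * hit_rate + (1 - hit_fraction) * miss_rate


# Hypothetical ladder: first 1M units at $0.002, everything above at $0.0015.
HYPOTHETICAL_TIERS = [(1_000_000, 0.002), (float("inf"), 0.0015)]
BATCH_MULTIPLIER = 0.5  # "roughly half the synchronous rate"

sync_cost = tiered_cost(1_500_000, HYPOTHETICAL_TIERS)
batch_cost = sync_cost * BATCH_MULTIPLIER

# Hypothetical rates per 1M tokens: $1.00 on a cache miss, $0.10 on a hit.
effective_rate = blended_input_rate(miss_rate=1.00, hit_rate=0.10, hit_fraction=0.8)

print(f"Synchronous: ${sync_cost:,.2f}  Batch: ${batch_cost:,.2f}")
print(f"Blended input rate: ${effective_rate:.2f} per 1M tokens")
```

The blended rate shows why cache-hit ratio matters as much as the headline price: at an 80% hit rate, the effective input price in this example is well under a third of the cache-miss rate.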

Companies using this

113 companies in the corpus price primarily on pure usage — the largest single pricing-model cohort here, concentrated in developer APIs (DeepSeek, Cohere, Tavily) and GPU / serverless infrastructure (RunPod, Modal, Vast.ai). The table below lists every one, with its billing units, free-tier status, and verification date.

Patterns observed

Three patterns hold consistently across the 113 companies on this page.

A free tier is nearly universal, and it is the acquisition funnel. Pure-usage vendors give developers a way to integrate at zero cost, then let real traffic convert them. The shapes vary: Deepgram grants a $200 credit with no card required; Modal bundles $30 of free credits into its $0 Starter tier; Tavily refreshes 1,000 free search credits every month. Because the meter only becomes valuable at production volume, giving away the low end costs the vendor little.
The billing unit tracks the underlying cost driver, not the customer’s mental model. AssemblyAI bills the hour of audio it processes, RunPod bills the GPU-hour it rents, Cohere bills the token it generates. This is the opposite of outcome-based pricing: the meter is chosen so the vendor’s margin is protected regardless of whether the customer got value from any given call.
Prices deflate, and vendors bake in discount ladders rather than one flat rate. Cache-hit discounts, batch rates, and volume tiers are standard equipment — Cohere alone lists distinct Command, Embed, and Rerank rates, each with its own volume curve. Every serious inference vendor ships some version of this laddering; the flat “one rate for everyone” model is now the exception, not the rule.

A fourth, softer pattern: pure-usage vendors sell almost entirely self-serve. The overwhelming majority onboard with a credit card and no purchase order, which is why the model and the developer buyer are so tightly coupled — the pricing structure is itself a go-to-market choice.

Counterexamples & variants

The cleanest counterexample is the drift toward hybrid. Pure usage is often a starting point, not an endpoint. Exa launched as Metaphor Systems selling flat $100 and $250/month subscription tiers in early 2024, then scrapped them within months for pure pay-as-you-go credits. In its April 2026 endpoint-card redesign it re-introduced a per-endpoint pricing structure and raised its base Search rate from $5 to $7 per 1,000 requests. The trajectory of many vendors here runs from pure usage to a small platform or seat fee plus metered usage (that is, hybrid pricing) once they start selling to teams and enterprises.

Modal is the instructive variant: it is tagged pure-usage because its economics are per-second GPU, CPU, and memory, but it layers flat plan fees ($0 Starter, $250 Team) on top. That flat fee is small relative to the metered spend for any real workload, so the model reads as pure usage in practice — but it is technically a hybrid, and the line between the two categories is genuinely fuzzy at the edges. ElevenLabs sits on the other side of the same line: it appears in the pure-usage cohort for its PAYG API surface but is primarily hybrid, pairing subscription plans with usage metering.

The other genuine variant is the sales-gated pure-usage vendor. Metronome, itself a usage-billing infrastructure platform now part of Stripe, prices usage-based but publishes no public rates beyond a free Starter tier — the Custom plan is sales-quoted. This inverts the category’s usual transparency: pure usage normally means published per-unit prices a developer can read off a page, but billing-infra vendors and enterprise-tier compute providers frequently hide the number behind a conversation. The model is still consumption-based; the legibility that usually comes with it is not.

Finally, the “minimum” clause in the definition matters. A handful of vendors run pure-usage rates but enforce a monthly minimum spend or a minimum prepaid credit purchase. That minimum is a soft floor — it does not make the model hybrid, but it does mean the bill does not literally start at zero, which is worth checking in procurement.
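
The effect of a minimum is easy to see in code. This is a minimal sketch with a hypothetical $500 monthly minimum; the function name and the figures are illustrative only.

```python
def invoice_with_minimum(metered_usage: float, monthly_minimum: float = 0.0) -> float:
    """Bill the metered amount, but never less than the contractual minimum."""
    return max(metered_usage, monthly_minimum)


HYPOTHETICAL_MINIMUM = 500.0

quiet_month = invoice_with_minimum(120.0, HYPOTHETICAL_MINIMUM)
busy_month = invoice_with_minimum(1_200.0, HYPOTHETICAL_MINIMUM)
no_minimum = invoice_with_minimum(0.0)

print(quiet_month, busy_month, no_minimum)
```

Above the minimum the bill behaves exactly like pure usage; below it, the minimum is what gets invoiced, which is why a quiet month no longer costs zero.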

What this means for buyers vs vendors

For buyers

Pure usage is the friendliest model to start with and the easiest to lose control of. The zero floor means you can pilot at near-zero cost, so favor it for early-stage and spiky workloads where you can’t yet forecast volume. Before you scale, ask three questions: is there a spend cap, alert, or hard-stop credit wallet (Deepgram, Modal, and prepaid-wallet vendors offer these); what discounts unlock at your volume (cache-hit, batch, and volume-tier rates can cut the effective price by 50-90%); and how stable is the rate — pure-usage vendors reprice more often than seat plans, so read the pricing page’s change history. See our guide to usage invoicing and billing cycles for how these bills are actually computed, and model your own numbers in the pricing calculators.
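
To make the first question concrete, here is a minimal sketch of a hard-stop prepaid wallet with a low-balance alert. The `PrepaidWallet` class is hypothetical and does not mirror any vendor's API; the $0.007 per-request rate is borrowed from the Exa Search row earlier on this page, and the balance and alert threshold are made up.

```python
class PrepaidWallet:
    """Prepaid credit balance that refuses charges it cannot cover."""

    def __init__(self, balance: float, alert_threshold: float) -> None:
        self.balance = balance
        self.alert_threshold = alert_threshold
        self.alerts: list[str] = []

    def charge(self, units: float, unit_rate: float) -> bool:
        """Deduct units x rate; return False (hard stop) if funds are short."""
        cost = units * unit_rate
        if cost > self.balance:
            return False  # no overage: the call is rejected instead
        self.balance -= cost
        if self.balance <= self.alert_threshold:
            self.alerts.append(f"Low balance: ${self.balance:.2f} remaining")
        return True


RATE_PER_REQUEST = 0.007  # $7 per 1,000 requests

wallet = PrepaidWallet(balance=10.00, alert_threshold=2.00)
first = wallet.charge(1_000, RATE_PER_REQUEST)   # accepted, balance near $3.00
second = wallet.charge(500, RATE_PER_REQUEST)    # rejected, would cost $3.50
third = wallet.charge(200, RATE_PER_REQUEST)     # accepted, triggers the alert

print(first, second, third)
print(wallet.alerts)
```

A production version would track money with `decimal.Decimal` or integer micro-dollars rather than floats, and would decide whether a rejected call should fail loudly or queue for retry; the sketch only shows the control flow of stopping at zero.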

For vendors

Pure usage fits when your buyer is a developer, your cost is genuinely variable, and your unit of consumption is measurable and cheap to meter. It removes friction from acquisition — a free tier plus per-unit rates is the lowest-commitment ask you can make — but it hands you revenue that swings with customer traffic and offers no floor. The standard mitigations are all visible in this corpus: prepaid credits and committed-use tiers to pull revenue forward, and a paid team/enterprise tier that adds a flat platform fee once accounts are large enough to tolerate it, drifting you toward hybrid pricing. Start from our introduction to usage-based pricing before committing to a meter — the unit you pick is nearly impossible to change once customers have built against it.

Company	Product	Pricing model	Billing units	Free tier	Verified
01.AI	Yi open-weight models + Yi API + enterprise vertical solutions	pure-usage freemium	tokens api-calls	Yes	2026-06-11
Agility Robotics	Digit humanoid robot + Agility Arc cloud platform (Robots-as-a-Service)	pure-usage	robot-hours	No	2026-06-14
AI21 Labs	Jamba foundation models, Maestro orchestration & enterprise AI	pure-usage freemium	tokens api-calls	Yes	2026-06-11
Aider	Open-source CLI AI pair programmer	freemium pure-usage	tokens	Yes	2026-06-08
Anthropic	Claude API (token-based) + Claude.ai consumer subscriptions (Free/Pro/Team/Enterprise)	freemium subscription seat-based	tokens seats api-calls	Yes	2026-07-06
Anyscale	Managed Ray platform for distributed AI training, inference, and batch processing (RayTurbo, Anyscale Compute Units)	pure-usage commitment hybrid	gpu-hours cpu-hours credits	Yes	2026-05-29
Apptronik	Apollo general-purpose humanoid robot (RaaS + outright sale)	pure-usage commitment	robot-hours units	No	2026-06-14
AssemblyAI	Speech-to-Text & Audio AI APIs	pure-usage	api-calls tokens	Yes	2026-07-06
Baichuan AI	Baichuan & medical M-series LLM APIs	pure-usage freemium	tokens api-calls	Yes	2026-06-11
Baseten	ML inference infrastructure — dedicated GPU deployments, Model APIs, and Truss framework	pure-usage hybrid commitment	gpu-hours tokens requests	Yes	2026-05-29
BentoML	BentoCloud — managed model-serving & inference platform	pure-usage freemium commitment	gpu-hours cpu-hours	Yes	2026-06-15
Bito	AI code review (per-seat) and AI Architect codebase intelligence (usage-based)	seat-plus-usage pure-usage	seats lines-of-code	No	2026-06-08
Bland AI	AI phone call automation platform — inbound and outbound voice agents at scale	hybrid pure-usage subscription	api-calls credits media-minutes	Yes	2026-05-29
Bright Data	Web data platform — proxy networks, scraping APIs, a managed scraping browser, SERP and unlocker APIs, ready-made datasets, and eCommerce insights	pure-usage hybrid commitment	bandwidth-gb requests records	Yes	2026-07-14
Browserbase	Browser-agent infrastructure: headless browser sessions, web Search/Fetch APIs, agent identity, runtime, and a model gateway behind one API key	freemium hybrid pure-usage	browser-hours api-calls requests	Yes	2026-06-02
Cartesia	Real-time voice AI platform (Sonic TTS, voice cloning, voice agents)	freemium subscription hybrid	credits requests api-calls	Yes	2026-05-29
Cerebras	Wafer-scale AI inference cloud and WSE hardware systems	pure-usage subscription commitment	tokens api-calls gpu-hours	Yes	2026-05-30
Chroma	Open-source vector database + Chroma Cloud	pure-usage freemium	storage-gb bandwidth-gb api-calls	Yes	2026-06-09
Claude Code	Agentic coding tool by Anthropic (terminal CLI, IDE, web)	subscription seat-plus-usage pure-usage	seats tokens	No	2026-06-16
Cohere	Command, Embed, Rerank APIs	pure-usage	tokens api-calls requests	Yes	2026-05-29
CoreWeave	GPU cloud & AI compute infrastructure	pure-usage commitment	gpu-hours cpu-hours storage-gb	No	2026-06-15
Daily	Real-time voice and video WebRTC APIs (Video SDK + Pipecat Cloud)	pure-usage	media-minutes api-calls	Yes	2026-07-14
Databricks (Mosaic AI)	Mosaic AI — enterprise GenAI & ML on the Data Intelligence Platform	pure-usage commitment	units tokens gpu-hours	Yes	2026-06-15
Decagon	AI customer support agent platform	outcome-based pure-usage hybrid	resolutions conversations	No	2026-06-11
Deepgram	Usage-based speech-to-text, text-to-speech, and voice agent APIs	pure-usage freemium	media-minutes tokens credits	Yes	2026-05-31
DeepInfra	Serverless inference cloud — per-token LLM/embedding APIs, per-image and per-minute media models, per-hour on-demand GPU containers, and reserved DeepCluster GPU clusters	pure-usage commitment	tokens gpu-hours requests	No	2026-07-14
DeepL	AI translation, writing, and translation API	subscription pure-usage hybrid	characters seats documents	Yes	2026-06-16
DeepSeek	DeepSeek API (V4-Flash + V4-Pro models, 1M context) with token-based pricing and aggressive cache discounts	freemium pure-usage	tokens api-calls	Yes	2026-06-05
ElevenLabs	Voice AI platform across ElevenCreative, ElevenAgents, and ElevenAPI	subscription pure-usage hybrid	characters credits media-minutes	Yes	2026-06-30
Exa	AI web search API for agents — search, contents, deep research, and monitoring endpoints billed per request	pure-usage freemium	requests credits api-calls	Yes	2026-07-14
Fal	Generative-media inference platform — serverless per-output model APIs plus dedicated GPU compute	pure-usage	gpu-hours requests media-minutes	No	2026-06-01
Fireworks AI	Generative AI inference platform — serverless per-token, on-demand GPU, fine-tuning, batch API	pure-usage hybrid commitment	tokens gpu-hours requests	Yes	2026-05-30
Freepik	AI creative suite — image, video, audio generation plus a 200M+ stock library	subscription hybrid pure-usage	seats credits api-calls	Yes	2026-06-05
Gladia	Speech-to-text & audio intelligence API	pure-usage freemium commitment	media-minutes requests	Yes	2026-06-09
Google	Gemini API & AI Studio	pure-usage freemium	tokens requests api-calls	Yes	2026-07-14
Groq	GroqCloud — LPU-based ultra-low-latency inference API for Llama, GPT-OSS, Qwen, Whisper transcription, and Orpheus text-to-speech	pure-usage hybrid commitment	tokens requests api-calls	Yes	2026-07-14
Hippocratic AI	Safety-focused healthcare LLM — patient-facing AI agents for non-diagnostic clinical tasks	pure-usage	media-minutes interactions	No	2026-06-10
Hugging Face	AI model hub, inference endpoints & compute	hybrid seat-based pure-usage	seats gpu-hours cpu-hours	Yes	2026-06-15
Hyperbolic	GPU cloud marketplace & serverless AI inference	pure-usage commitment	gpu-hours tokens images	Yes	2026-06-15
Hyperline	Hyperline — quote-to-cash billing, CPQ and usage-based monetization platform for SaaS	hybrid subscription pure-usage	invoices events seats	Yes	2026-06-10
Inflection AI	Enterprise foundation models (Inflection 3.0) + Pi assistant	pure-usage subscription	tokens gpu-hours seats	No	2026-06-11
Jina AI	Search Foundation API (Embeddings, Reranker, Reader, DeepSearch, Classifier)	pure-usage freemium	tokens requests api-calls	Yes	2026-06-03
Labelbox	AI training-data platform (data labeling, curation & model evaluation)	pure-usage freemium subscription	units records data-licensing	Yes	2026-06-15
Lambda	GPU cloud & AI compute infrastructure	pure-usage commitment	gpu-hours	No	2026-06-09
LanceDB	AI-native multimodal lakehouse	freemium pure-usage commitment	storage-gb vectors-indexed gpu-hours	Yes	2026-06-09
Lightning AI	Cloud GPU/CPU Studio compute platform for building, training, and serving AI models, billed by the second with a credit pool.	hybrid freemium pure-usage	gpu-hours cpu-hours credits	Yes	2026-06-02
Linkup	Web search API for AI agents — Search, Fetch, and async Research endpoints with grounded, structured results	pure-usage freemium	requests credits api-calls	Yes	2026-07-14
LiveKit	Open-source real-time (WebRTC) communications, LiveKit Cloud & Agents framework	hybrid freemium pure-usage	media-minutes credits bandwidth-gb	Yes	2026-06-30
Make	Visual, no-code automation (iPaaS) platform connecting 3,000+ apps and AI agents	pure-usage freemium	credits tokens	Yes	2026-06-11
Maven AGI	Enterprise AI agent platform for customer support	outcome-based pure-usage commitment	resolutions conversations interactions	No	2026-06-11
Mercor	AI talent marketplace + enterprise data partnerships for frontier AI labs	pure-usage	tasks	No	2026-07-14
Metronome	Usage-based billing and metering infrastructure platform	pure-usage	events transactions	Yes	2026-07-14
micro1	Human-data engine, RL environments, and agent evaluation for frontier AI labs	pure-usage	tasks	No	2026-07-14
Milvus	Vector database (OSS) + Zilliz Cloud (managed)	pure-usage freemium commitment	gpu-hours storage-gb vectors-indexed	Yes	2026-06-09
MiniMax	Foundation models, Hailuo video & per-token API	pure-usage freemium	tokens seats credits	Yes	2026-06-11
Mistral AI	Open and commercial LLM APIs	pure-usage freemium	tokens seats api-calls	Yes	2026-07-06
Modal	Serverless compute and GPU platform — per-second billing for Python functions, batch jobs, and model serving	pure-usage freemium subscription	gpu-hours cpu-hours gb-hours	Yes	2026-07-14
Moonshot AI	Kimi assistant + Kimi/Moonshot open-weight LLM API	pure-usage freemium	tokens seats api-calls	Yes	2026-06-11
MultiOn	Autonomous web-browsing AI agent API (wound down)	pure-usage commitment	requests	No	2026-06-10
Murf AI	AI voice / text-to-speech platform (Murf Studio app + Murf API)	subscription pure-usage freemium	media-minutes seats credits	Yes	2026-06-01
Nebius	AI cloud & GPU compute infrastructure	pure-usage commitment	gpu-hours cpu-hours storage-gb	No	2026-06-15
Netlify	Web development & deployment platform (Agent Runners / AI)	freemium hybrid pure-usage	credits builds gb-hours	Yes	2026-07-14
Novita AI	Pay-as-you-go AI cloud: 200+ model inference APIs, on-demand GPUs, and per-second agent sandboxes under one API	pure-usage freemium	tokens gpu-hours cpu-hours	Yes	2026-07-06
OctoAI	Generative AI inference platform (acquired by NVIDIA, sunset Oct 2024)	pure-usage	tokens images generations	No	2026-06-15
OpenAI	ChatGPT consumer subscriptions + GPT-5.x API with token-based usage billing	freemium subscription seat-based	tokens seats api-calls	Yes	2026-06-30
OpenPipe	OpenPipe fine-tuning and hosted inference platform (small specialized models / RL for agents)	pure-usage	tokens cpu-hours	Yes	2026-06-04
OpenRouter	Multi-model LLM API routing marketplace	pure-usage freemium	tokens credits requests	Yes	2026-07-14
Oxylabs	Web data collection: residential, datacenter, ISP & mobile proxies plus Web Scraper API and Web Unblocker	hybrid pure-usage freemium	bandwidth-gb ips records	Yes	2026-07-06
Parloa	Enterprise AI Agent Management Platform (AMP) for contact-center voice and chat automation	pure-usage	media-minutes resolutions	No	2026-06-07
Patronus AI	LLM and AI agent evaluation, monitoring, and guardrail platform	freemium pure-usage	api-calls credits	Yes	2026-06-04
Perplexity AI	AI-native answer engine with citations and multi-model search	freemium subscription seat-based	seats tokens requests	Yes	2026-05-29
PhotoRoom	AI image-editing app and per-image Image Editing / Remove Background API for e-commerce product visuals	subscription pure-usage freemium	api-calls credits seats	Yes	2026-06-05
Pinecone	Managed vector database (serverless)	pure-usage hybrid	requests storage-gb vectors-indexed	Yes	2026-06-09
PlayHT	Text-to-speech & voice cloning API (PlayAI)	subscription freemium pure-usage	characters words api-calls	Yes	2026-06-09
Poe	Multi-model AI chat subscription (by Quora)	subscription hybrid pure-usage	credits seats messages	Yes	2026-06-16
Predibase	Fine-tuning & serving platform for open-source LLMs	pure-usage freemium	tokens gpu-hours	Yes	2026-06-15
Qdrant	Open-source vector database + Qdrant Cloud	pure-usage freemium	cpu-hours gb-hours storage-gb	Yes	2026-06-09
Qodo	Qodo (formerly Codium AI) — AI code integrity platform: Qodo Gen (IDE plugin), Qodo Merge (PR review agent), and Qodo Command (CLI / agentic quality workflows)	pure-usage hybrid	credits requests	No	2026-06-30
Reka AI	Natively multimodal models (Spark, Edge, Flash, Core) + Research & Vision APIs	pure-usage freemium	tokens api-calls requests	Yes	2026-06-11
Replicate	Cloud platform for running, fine-tuning, and deploying AI models via REST API	pure-usage hybrid commitment	gpu-hours tokens requests	Yes	2026-05-30
Resemble AI	AI deepfake detection & watermarking + voice generation APIs	pure-usage	credits media-minutes seats	No	2026-07-14
Retell AI	Conversational voice-agent API platform	pure-usage hybrid	media-minutes messages seats	No	2026-07-14
Rev AI	Pay-as-you-go speech-to-text, transcription, and audio-intelligence APIs	pure-usage freemium	media-minutes credits api-calls	Yes	2026-06-04
Rewind.ai (the original Rewind AI rebranded to Limitless, acquired by Meta)	AI tools aggregator (token-balance) — on the domain once home to the Rewind personal-memory app	freemium pure-usage subscription	tokens credits seats	Yes	2026-06-15
RunPod	GPU cloud marketplace — Secure Cloud and Community Cloud Pods, Serverless endpoints, and persistent storage	pure-usage hybrid commitment	gpu-hours storage-gb	No	2026-07-14
Rytr	AI writing assistant for short-form marketing copy and content	freemium subscription pure-usage	characters credits	Yes	2026-06-07
SambaNova	SambaNova Cloud inference API & RDU AI systems	pure-usage subscription commitment	tokens	Yes	2026-06-15
Sarvam AI	Sovereign Indic LLM, speech & translation APIs	pure-usage freemium	tokens characters media-minutes	Yes	2026-06-11
Scale AI	Data engine, GenAI platform & contributor marketplace	pure-usage commitment	tasks records data-licensing	No	2026-06-15
ScraperAPI	Web scraping API that handles proxies, browsers, and CAPTCHAs behind a single endpoint	subscription pure-usage	credits requests api-calls	No	2026-06-04
SerpApi	Real-time search-results API (Google, Bing, and other engines)	subscription pure-usage	api-calls requests	Yes	2026-06-04
Sierra	Conversational AI customer agents	outcome-based pure-usage hybrid	resolutions conversations	No	2026-06-11
Snowflake Cortex	AI functions and model APIs on Snowflake	pure-usage commitment	credits tokens pages-rendered	Yes	2026-07-06
Speechmatics	Speech-to-text and text-to-speech APIs with per-hour usage pricing	pure-usage freemium	media-minutes characters	Yes	2026-07-06
Stripe Billing	Stripe Billing — recurring, usage-based, and metered billing on the Stripe platform	pure-usage hybrid	transactions invoices events	No	2026-06-10
Tavily	Tavily Search API	pure-usage freemium	credits api-calls requests	Yes	2026-06-03
Togai	Usage-based metering and billing infrastructure platform	pure-usage	events transactions	Yes	2026-06-03
Together AI	AI Acceleration Cloud — serverless inference, dedicated endpoints, GPU clusters, Code Sandbox, fine-tuning	pure-usage hybrid commitment	tokens gpu-hours cpu-hours	Yes	2026-07-14
Trigger.dev	Background jobs and workflow orchestration for developers	hybrid freemium pure-usage	workflow-executions cpu-hours seats	Yes	2026-06-16
turbopuffer	Serverless vector and full-text search database on object storage	pure-usage commitment	storage-gb vectors-indexed gb-hours	No	2026-07-14
Twelve Labs	Video understanding foundation models (Marengo for search/embeddings, Pegasus for analysis) delivered as a usage-metered API	pure-usage freemium commitment	media-minutes tokens requests	Yes	2026-06-02
Unstructured	Document ingestion / ETL API	pure-usage freemium	pages-rendered documents	Yes	2026-07-14
Upstash	Upstash (Redis, Vector, QStash, Search, Workflow)	pure-usage freemium hybrid	requests api-calls vectors-indexed	Yes	2026-07-14
Usage AI	Cloud commitment management & savings optimization (AWS / Azure / GCP)	outcome-based pure-usage	outcomes	Yes	2026-06-16
Vapi	Voice AI infrastructure for developers	pure-usage hybrid	media-minutes messages seats	No	2026-06-09
Vast.ai	GPU rental marketplace — on-demand, interruptible (spot), and reserved cloud GPUs plus autoscaling serverless inference	pure-usage commitment	gpu-hours storage-gb bandwidth-gb	No	2026-07-14
Voyage AI	Embedding and reranker models (text, code, multimodal) for retrieval and RAG	pure-usage freemium	tokens storage-gb	Yes	2026-06-04
Waymo	Waymo One autonomous robotaxi service	pure-usage hybrid	rides	No	2026-06-14
Weaviate	AI-native vector database (open-source core + Weaviate Cloud managed serverless, dedicated/Enterprise Cloud, BYOC)	pure-usage hybrid commitment	vectors-indexed tokens api-calls	Yes	2026-07-06
xAI	Grok API and agentic AI stack	pure-usage freemium	tokens api-calls seats	Yes	2026-07-14
You.com	Web search, contents, research, and finance-research APIs for AI systems	pure-usage freemium	api-calls requests pages-rendered	Yes	2026-06-01
Zapier	Workflow-automation (iPaaS) platform connecting 9,000+ apps, with separately-metered AI Agents and Chatbots add-ons	pure-usage freemium	tasks	Yes	2026-06-30
ZenRows	Universal Scraper API, Scraping Browser, and Residential Proxies	hybrid subscription pure-usage	requests api-calls bandwidth-gb	Yes	2026-06-04
Zhipu AI	GLM foundation models, per-token API, and GLM Coding Plan	pure-usage freemium subscription	tokens api-calls seats	Yes	2026-06-11


FAQ

What is pure usage pricing?

Pure usage pricing is a model where the customer pays only for what they consume — per token, per request, per minute, or per GPU-hour — with no fixed recurring fee beyond a possible minimum. The bill starts at zero and scales with actual consumption.

How is pure usage different from hybrid pricing?

Hybrid pricing pairs a fixed component — a seat, platform fee, or minimum — with metered usage. In pure usage there is no meaningful fixed component: every line on the invoice maps to a unit of consumption. Many vendors, like Modal and Exa, run a $0 pure-usage tier and only add a flat fee at their team plan.

Which companies use pure usage pricing?

It dominates developer-facing inference and infrastructure. In this corpus 113 companies price this way, including DeepSeek, Cohere, and AssemblyAI on the API side, and RunPod, Modal, and Vast.ai on the GPU-compute side.

Do pure-usage APIs have a spend cap?

Not by default. A runaway loop or agent can generate a large bill before anyone notices. Look for vendors offering configurable spend limits, usage alerts, or prepaid credit wallets that stop at zero rather than running into overage charges.

Are pure-usage token prices rising or falling?

Falling. Every frontier-model generation has cut per-token rates — DeepSeek's V4-Flash reaches $0.0028/1M tokens on a cache hit. Budget assumptions for pure-usage APIs should be revisited at least twice a year.
