AI Summary
About
Hyperbolic (Hyperbolic Labs) is a San Francisco-based “open-access AI cloud” founded in 2023 by Jasper Zhang (CEO) and Yuchen Jin (CTO). It sells two distinct products on a pure pay-as-you-go basis: a GPU Marketplace for renting raw GPU capacity by the hour (NVIDIA H100 SXM, H200, B200, RTX 4090, RTX 3070 and others), and a Serverless Inference API that runs open-weight models (Llama, Qwen, DeepSeek and more) billed per million tokens. Both surfaces are publicly priced with no required subscription, seat minimum, or contract — buyers add credit and draw it down by usage.
Hyperbolic’s distinguishing idea is a DePIN-style supply model: rather than build out its own data centers, it aggregates underused GPU capacity from third-party data centers and operators and resells it, which lets it list aggressive per-hour rates and refresh them weekly as supply shifts. The company raised roughly $20M total — a seed round around $7M in July 2024 and a $12M Series A in December 2024 led by Variant and Polychain Capital. It reports 200,000+ engineers on the platform and 25+ open-source models served via API, and it is one of Hugging Face’s third-party inference providers. Customers and references include Hugging Face, Quora, Cornell, UC Berkeley, the LMSYS Chatbot Arena, and Reve AI.
For current pricing, see the GPU marketplace and serverless inference pages. Hyperbolic sits in the AI infrastructure and compute category alongside GPU-cloud rivals and open-model inference hosts.
Pricing summary : GPU-hours and per-million-token usage billing
Hyperbolic is pure usage-based across two surfaces with two different value metrics. There are no subscription tiers, seats, or platform fees — you pay only for the compute you consume, drawing down prepaid credit.
- GPU Marketplace — billed per GPU-hour. On-demand starting rates (June 2026): H100 SXM $1.50, H200 $2.40, B200 $3.50, RTX 4090 $0.30, RTX 3070 $0.16 per GPU-hour. Hyperbolic states marketplace rates are refreshed weekly based on the best available supplier rates, so per-hour figures are dynamic rather than a fixed list price. Reserved clusters for long-running jobs are arranged via sales.
- Serverless Inference — billed per million tokens against open-weight models. Each model carries its own rate, from $0.10/M for small Llama models up to $4.00/M for Llama-3.1-405B.
New users start with $5 in free credits to explore inference; GPU rental requires depositing at least $5. Payment is by credit card/Stripe, wire/ACH, or crypto (USDC).
What makes this different: the two value metrics target two buyers from one account — infrastructure teams that want raw GPU-hours for training and custom serving, and developers that want a per-token serverless API without managing GPUs. The marketplace’s weekly-refreshed, supplier-driven rates behave like a spot/marketplace price, not the fixed on-demand list price of a traditional cloud.
Pricing by product
Hyperbolic has two product surfaces, each with its own value metric. Marketplace figures are on-demand starting rates per GPU-hour as of June 2026 (refreshed weekly); inference figures are per million tokens.
GPU Marketplace (per GPU-hour)
| GPU (on-demand) | Starting price | Best for |
|---|---|---|
| NVIDIA B200 | $3.50 /GPU-hr | Newest-generation Blackwell training |
| NVIDIA H200 | $2.40 /GPU-hr | Large-memory LLM training/serving |
| NVIDIA H100 SXM | $1.50 /GPU-hr | Mainstream LLM training & fine-tuning |
| NVIDIA RTX 4090 | $0.30 /GPU-hr | Cost-sensitive inference / dev |
| NVIDIA RTX 3070 | $0.16 /GPU-hr | Light / budget workloads |
Rates refreshed weekly from the best available supplier rates, so the per-hour price is dynamic. The catalog also lists RTX 3080-class cards, and supports clusters of 1–2,048 GPUs with InfiniBand and high-performance storage. Reserved clusters for guaranteed capacity are sales-quoted.
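Because the marketplace bills per GPU-hour, a bill estimate is simply rate × GPU count × hours. The Python sketch below applies the June 2026 starting rates from the table. The dictionary, the hypothetical `gpu_rental_cost` helper, and the assumption that fractional hours bill linearly are all illustrative: the page does not state billing granularity, and live rates move weekly.

```python
# Illustrative snapshot of the June 2026 on-demand starting rates (USD per GPU-hour).
# Live marketplace rates are refreshed weekly, so these are not a quote.
STARTING_RATES_USD_PER_GPU_HR = {
    "B200": 3.50,
    "H200": 2.40,
    "H100 SXM": 1.50,
    "RTX 4090": 0.30,
    "RTX 3070": 0.16,
}


def gpu_rental_cost(gpu: str, num_gpus: int, hours: float) -> float:
    """Hypothetical estimator: per-GPU-hour rate x GPU count x wall-clock hours.

    Assumes fractional hours bill linearly; actual billing granularity may differ.
    """
    if not 1 <= num_gpus <= 2048:
        raise ValueError("marketplace clusters span 1 to 2,048 GPUs")
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return round(STARTING_RATES_USD_PER_GPU_HR[gpu] * num_gpus * hours, 2)


# One 8x H100 SXM node: a single hour, then a 72-hour fine-tuning run.
print(gpu_rental_cost("H100 SXM", 8, 1))   # 12.0
print(gpu_rental_cost("H100 SXM", 8, 72))  # 864.0
```

The one-hour, 8-GPU case reproduces the ~$12.00/hr node figure used in the hidden-costs table.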
Serverless Inference (per million tokens)
| Model | Price | Notes |
|---|---|---|
| Llama-3.2-3B / Llama-3.1-8B | $0.10 /1M tokens | Smallest, cheapest open models |
| Qwen2.5-Coder-32B | $0.20 /1M tokens | Code-specialized mid-size model |
| Llama-3.1-70B / Qwen2.5-72B / Hermes-3-70B | $0.40 /1M tokens | 70B-class general models |
| DeepSeek-V2.5 | $2.00 /1M tokens | Larger reasoning/MoE model |
| Llama-3.1-405B | $4.00 /1M tokens | Frontier-scale open model |
Each model has its own per-million-token rate; larger models cost more per token. Image (SDXL, FLUX), VLM, and audio modalities are also served and priced per generation rather than per token; those per-generation rates are not published on the pricing page. Inference tiers gate request rate (Basic 60 RPM, Pro 600 RPM, Enterprise unlimited) but not the token price.
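Since the rate card lists a single price per model, token cost reduces to rate × tokens ÷ 1,000,000. The sketch below assumes that one rate covers prompt and completion tokens alike (the rate card does not split them) and uses the table's display names as keys, which may not match the API's model IDs; `inference_cost` is a hypothetical helper. `Decimal` keeps cent-level arithmetic exact.

```python
from decimal import Decimal

# Per-million-token rates from the serverless inference table (USD).
# Keys are display names from the table, not necessarily API model IDs.
RATE_PER_MILLION_TOKENS = {
    "Llama-3.2-3B": Decimal("0.10"),
    "Llama-3.1-8B": Decimal("0.10"),
    "Qwen2.5-Coder-32B": Decimal("0.20"),
    "Llama-3.1-70B": Decimal("0.40"),
    "Qwen2.5-72B": Decimal("0.40"),
    "Hermes-3-70B": Decimal("0.40"),
    "DeepSeek-V2.5": Decimal("2.00"),
    "Llama-3.1-405B": Decimal("4.00"),
}


def inference_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Hypothetical estimator assuming one blended rate for all tokens."""
    total_tokens = prompt_tokens + completion_tokens
    return RATE_PER_MILLION_TOKENS[model] * total_tokens / Decimal(1_000_000)


# Same monthly workload (40M prompt + 10M completion tokens) on two model sizes.
for name in ("Llama-3.1-70B", "Llama-3.1-405B"):
    cost = inference_cost(name, prompt_tokens=40_000_000, completion_tokens=10_000_000)
    print(f"{name}: ${cost:.2f}")
# Llama-3.1-70B: $20.00
# Llama-3.1-405B: $200.00
```

The same workload costs 10x more on Llama-3.1-405B than on a 70B-class model, so model choice, not request tier, is the main lever on inference spend.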
Sales motions across products: self-serve / PLG for both the marketplace and serverless inference (add credit, consume by usage); sales-led for reserved clusters and dedicated single-tenant hosting.
Hidden costs : What Hyperbolic users actually pay
Hyperbolic’s headline rates are clean pay-as-you-go, but a few items shape the real bill:
| Line item | Cost |
|---|---|
| GPU-hour (e.g. 8x H100 SXM) | $1.50/GPU/hr → ~$12.00/hr for the node |
| Inference tokens | Per-model, $0.10–$4.00 /1M tokens |
| Minimum GPU deposit | Must deposit $5 before renting GPUs |
| Weekly rate drift | Marketplace per-hour price can move week to week |
| Reserved clusters / dedicated hosting | Sales-quoted hourly by GPU type |
Two real-world cost drivers stand out. First, the marketplace price is a moving target: because rates refresh weekly off supplier availability, the per-hour rate you budgeted can shift before your next long job — predictable for short bursts, less so for multi-week training. Second, because supply is aggregated from third parties, capacity and reliability can vary by GPU type and region; teams that need guaranteed uptime are steered to reserved clusters or dedicated single-tenant hosting, which are quoted by sales rather than self-serve. Hyperbolic does offset some risk with “no charge for failed instances” billing — you only pay for GPUs that come online.
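The budgeting risk from weekly refreshes can be made concrete by pricing a multi-week run one week at a time. This sketch assumes, for illustration only, that each week is billed at that week's refreshed rate (the source does not say whether a running instance keeps its launch rate), and the drifted rates in the example are hypothetical, not observed Hyperbolic prices.

```python
HOURS_PER_WEEK = 24 * 7


def multiweek_cost(weekly_rates: list[float], num_gpus: int) -> float:
    """Hypothetical: bill each full week of a run at that week's per-GPU-hour rate."""
    return round(sum(rate * num_gpus * HOURS_PER_WEEK for rate in weekly_rates), 2)


# Budget a 3-week, 8-GPU H100 SXM run at the $1.50 starting rate...
budgeted = multiweek_cost([1.50, 1.50, 1.50], num_gpus=8)
# ...versus the same run if the refreshed rate drifts upward (hypothetical values).
drifted = multiweek_cost([1.50, 1.60, 1.75], num_gpus=8)

print(f"budgeted ${budgeted:,.2f}, drifted ${drifted:,.2f}, overrun ${drifted - budgeted:,.2f}")
# budgeted $6,048.00, drifted $6,518.40, overrun $470.40
```

On these hypothetical numbers the overrun is about 8%; for long-running jobs, the source's answer is a sales-quoted reserved cluster.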
Pricing evolution : Hyperbolic pricing history and changes
Cadence
| Period | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2024 H2 | — | GPU marketplace + serverless inference live | H100 advertised from ~$0.99/hr post-Series A |
| 2025 | Per-model inference rate card stabilized | Image (SDXL, FLUX), VLM, audio modalities added | 70B-class at $0.40/M; 405B at $4.00/M |
| 2026 Q2 | Marketplace starting rates published | Reserved clusters, dedicated hosting, crypto pay | H100 from $1.50/hr; weekly supplier-rate refresh |
Tracked range: 2024–present. Marketplace rates are dynamic (weekly supplier-rate refresh), so point-in-time figures reflect the capture date.
Notable changes
- Late 2024 — After a $12M Series A (Dec 2024, led by Variant and Polychain), both surfaces were live; early marketing advertised H100 rental from roughly $0.99/hr and per-million-token open-model inference.
- 2025 — Inference settled into a per-model rate card (3B–8B Llama at $0.10/M, 70B-class at $0.40/M, DeepSeek-V2.5 at $2.00/M, Llama-3.1-405B at $4.00/M), with image, VLM, and audio modalities added.
- June 2026 — Marketplace published on-demand starting rates (H100 SXM $1.50, H200 $2.40, B200 $3.50, RTX 4090 $0.30, RTX 3070 $0.16), refreshed weekly from supplier rates; crypto (USDC) payment and a new-user $5 free credit are in place.
The direction of travel is a maturing two-sided model: a spot-style GPU marketplace whose per-hour rate floats with supplier supply, layered alongside a stable per-token inference rate card.
What’s unique : Hyperbolic’s distinctive pricing mechanics
1. Two value metrics, one account. Hyperbolic prices GPU-hours on the marketplace and per-million-tokens on serverless inference from a single prepaid balance — serving infra teams and API developers without forcing either into the other’s billing model.
2. Weekly-refreshed, supplier-driven marketplace rates. Because it aggregates third-party GPU supply (a DePIN model), Hyperbolic refreshes per-hour rates weekly off the best available supplier prices — a spot/marketplace price rather than a fixed list price, which is unusual for raw-GPU rental.
3. Published rates plus crypto payment. Hyperbolic publishes both per-GPU-hour and per-model token rates openly (no sales call to see numbers) and accepts crypto (USDC) alongside card and wire — rare in the GPU-cloud category and aligned with its open-access positioning.
Strengths & weaknesses
| Strengths | Weaknesses |
|---|---|
| Transparent per-GPU-hour and per-token rates, published openly | Marketplace rate drifts weekly — harder to budget long jobs |
| Aggressive starting prices (H100 from $1.50/hr) | Aggregated third-party supply can vary in capacity/reliability |
| Two value metrics from one prepaid account | Reserved/dedicated capacity is sales-quoted, not self-serve |
| $5 free credit + flexible payment (card, wire, crypto) | $5 minimum deposit required before GPU rental |
| OpenAI-compatible API, zero data retention on inference | Image/audio per-generation rates not published on pricing page |
Billing UX : pay-as-you-go credit, dual metering, public list pricing
- Pay-as-you-go credit, no subscription — both surfaces draw down a prepaid balance by usage; there is no monthly platform fee, seat count, or required commitment to start. New users get a $5 credit; GPU rental needs a $5 minimum deposit.
- Two metered dimensions — GPU-hours on the marketplace and tokens (per million) on serverless inference are metered and billed separately, so a single account can run both meters concurrently.
- Public list pricing — per-GPU-hour and per-model token rates are published rather than gated behind a sales call; only reserved clusters and dedicated hosting are quoted.
- Failure-aware billing — Hyperbolic does not charge for failed instances and notifies within a few minutes if an instance fails, so you pay only for GPUs that come online.
- Flexible payment — credit card/Stripe and pay-as-you-go, wire/ACH upfront or monthly, and crypto (USDC).
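The billing mechanics above (one prepaid balance, two independent meters, a $5 deposit gate for GPUs, and no charge for instances that fail to come online) can be summarized as a small ledger. This is a hypothetical model, not Hyperbolic's implementation: the class and method names are invented, the deposit check simplifies the "$5 minimum deposit" rule, and it does not model whether free credit can pay for GPU time.

```python
from dataclasses import dataclass, field


@dataclass
class PrepaidAccount:
    """Hypothetical model of one prepaid balance drawn down by two meters."""
    balance_usd: float
    deposited_usd: float = 0.0  # money the user added, excluding free credit
    ledger: list = field(default_factory=list)

    def _debit(self, meter: str, amount: float) -> None:
        if amount > self.balance_usd:
            raise RuntimeError(f"insufficient credit for {meter} charge of ${amount:.2f}")
        self.balance_usd = round(self.balance_usd - amount, 6)
        self.ledger.append((meter, amount))

    def charge_tokens(self, rate_per_million: float, tokens: int) -> None:
        # Meter 1: serverless inference, billed per million tokens.
        self._debit("tokens", rate_per_million * tokens / 1_000_000)

    def charge_gpu(self, rate_per_gpu_hr: float, num_gpus: int, hours: float,
                   came_online: bool = True) -> None:
        # Meter 2: GPU marketplace, gated on the $5 minimum deposit.
        if self.deposited_usd < 5:
            raise PermissionError("deposit at least $5 before renting GPUs")
        if not came_online:
            # Failure-aware billing: a failed instance is recorded but not charged.
            self.ledger.append(("gpu (failed, not billed)", 0.0))
            return
        self._debit("gpu", rate_per_gpu_hr * num_gpus * hours)


acct = PrepaidAccount(balance_usd=20.0, deposited_usd=20.0)
acct.charge_tokens(0.40, 2_000_000)             # 2M tokens on a 70B-class model
acct.charge_gpu(1.50, 8, 1, came_online=False)  # failed launch, no charge
acct.charge_gpu(1.50, 1, 2)                     # 2 hours on one H100 SXM
print(round(acct.balance_usd, 2))               # 16.2
```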
Strategic wins : Why Hyperbolic’s pricing decisions worked
1. Aggregating idle supply into a transparent rate card
By reselling underused third-party GPU capacity at openly published, weekly-refreshed rates, Hyperbolic undercuts hyperscalers on raw price while keeping pricing visible — a wedge into the price-sensitive AI-research and indie-developer segment. See how AI companies structure pricing.
2. Two value metrics that capture two buyers
Pricing GPU-hours for infra teams and per-million-tokens for API developers from one account lets Hyperbolic monetize both the “I want raw compute” and the “I just want a model endpoint” buyer without making either adopt the other’s mental model. Related: outcome-based pricing trends.
3. A stable token rate card on top of a spot GPU market
Layering a fixed per-model inference rate card over a fluctuating spot GPU marketplace gives developers predictability where they want it (token price) while letting raw compute float with supply. See choosing the right usage metric.
Areas to improve : Gaps in Hyperbolic’s pricing approach
1. Weekly rate drift hurts long-job budgeting
A per-hour rate that refreshes weekly is fine for short bursts but awkward for multi-week training runs. Clearer rate-lock options (beyond sales-quoted reserved clusters) would make the marketplace easier to budget. See bill shock and cost unpredictability.
2. Unpublished image/audio rates
Inference token rates are published, but per-generation image (SDXL, FLUX) and audio rates are not shown on the pricing page, forcing buyers into docs or a console to learn cost. Publishing them would extend the transparency that benefits the token products.
3. Capacity and reliability transparency
Because supply is aggregated from third parties, capacity and reliability can vary by GPU and region. Real-time availability and SLA clarity (without a sales call) would reduce the gap between a self-serve rate card and a self-serve experience.
Key takeaways
- Hyperbolic is pure usage-based across two surfaces — per-GPU-hour on the marketplace and per-million-tokens on serverless inference, both pay-as-you-go with no subscription. For the underlying model, see the introduction to usage-based pricing.
- It aggregates third-party GPU supply (DePIN) and refreshes marketplace rates weekly, so the per-hour price is a spot/marketplace rate, not a fixed list price.
- Both rate cards are published openly — H100 from $1.50/hr and per-model token rates from $0.10/M — a transparency edge over GPU clouds that gate numbers behind sales.
- The real frictions are rate drift and supply variability, not headline fees; reliability-sensitive teams move to sales-quoted reserved or dedicated capacity.
- Two value metrics serve two buyers from one account, with crypto payment and a $5 free credit reinforcing its open-access positioning.
UBP implications
- A spot meter and a fixed meter can coexist. Hyperbolic floats GPU-hour prices weekly while holding token prices steady — pricing each metric the way its supply behaves, a reusable pattern for any business with both volatile and stable cost inputs.
- Two value metrics widen the addressable buyer set. Offering raw GPU-hours and a per-token API from one account lets a vendor monetize both the infrastructure buyer and the application developer without forcing a single billing model.
- Transparency is a wedge even in commodity infra. Publishing per-hour and per-token rates (plus crypto payment) lowers buyer friction and differentiates against rivals that hide pricing behind sales calls.
Sources
- Hyperbolic GPU Marketplace pricing (accessed 2026-06-15)
- Hyperbolic Serverless Inference pricing (accessed 2026-06-15)
- Hyperbolic — account setup & free credits (accessed 2026-06-15)
- Hyperbolic secures $20M total funding with Series A (accessed 2026-06-15)
- Hugging Face — Hyperbolic as inference provider (accessed 2026-06-15)
Bottom line
Hyperbolic is a clean two-sided example of pure usage-based pricing for AI compute: a GPU marketplace billed per GPU-hour (H100 from $1.50/hr, refreshed weekly off aggregated third-party supply) alongside a serverless inference API billed per million tokens ($0.10–$4.00/M across open models). Both rate cards are published openly, payment includes crypto, and new users get a $5 credit — all reinforcing an open-access positioning. The trade-offs are a marketplace price that drifts weekly and supply that varies by GPU and region, which steers reliability-sensitive teams toward sales-quoted reserved or dedicated capacity. Browse the pricing blueprint for more fully-researched company profiles, or compare Hyperbolic against other AI infrastructure and compute companies.
Pricing timeline : Major events
Each milestone below corresponds to a public pricing change, product launch, or material adjustment, listed newest first.
June 2026: Marketplace starting rates published; H100 from $1.50/hr
GPU marketplace shows on-demand starting rates: H100 SXM $1.50, H200 $2.40, B200 $3.50, RTX 4090 $0.30, RTX 3070 $0.16 per GPU-hour, refreshed weekly from supplier rates. Inference rate card unchanged. New-user $5 free credit; crypto (USDC) payment supported.
2025: Per-million-token inference rate card stabilizes
Inference settled into a per-model rate card: $0.10/M for 3B-8B Llama, $0.40/M for 70B-class models, $2.00/M for DeepSeek-V2.5, and $4.00/M for Llama-3.1-405B, with image (SDXL, FLUX) and audio modalities added.
Late 2024: Two usage surfaces live; H100 advertised from ~$0.99/hr
After its $12M Series A (Dec 2024), Hyperbolic offered both a GPU marketplace and a serverless inference API. Early marketing cited H100 rental from roughly $0.99/hr, with open-model inference billed per million tokens.
- Hyperbolic prices two different value metrics from one account: GPU-hours on its marketplace and per-million-tokens on serverless inference.
- It runs a DePIN-style model, aggregating underused GPU capacity from data centers and operators, and accepts payment in crypto (USDC) alongside card and wire.
- Marketplace GPU rates are refreshed weekly based on the best available supplier rates, so the per-hour price is a spot/marketplace rate rather than a fixed list price.
Questions & answers
- What is Hyperbolic's pricing model?
- Pure usage-based, pay-as-you-go. Hyperbolic bills per GPU-hour on its GPU marketplace and per million tokens on its serverless inference API. There are no subscription tiers or seat fees — you add credit and draw it down by consumption. Reserved clusters and dedicated hosting are sales-quoted.
- How much does an H100 cost on Hyperbolic?
- As of June 2026, an NVIDIA H100 SXM on the marketplace starts at $1.50/GPU/hr. H200 starts at $2.40, B200 at $3.50, RTX 4090 at $0.30, and RTX 3070 at $0.16 per GPU-hour. Rates are refreshed weekly based on the best available supplier rates, so the per-hour price is dynamic rather than a fixed list price. In late 2024, after its Series A, Hyperbolic advertised H100 rental from around $0.99/hr.
- How is Hyperbolic's inference pricing charged?
- Serverless inference is billed per million tokens, with each open-weight model carrying its own rate: $0.10/M for Llama-3.1-8B and Llama-3.2-3B, $0.20/M for Qwen2.5-Coder-32B, $0.40/M for 70B-class models like Llama-3.1-70B and Qwen2.5-72B, $2.00/M for DeepSeek-V2.5, and $4.00/M for Llama-3.1-405B. You pay only for the tokens you consume.
- Does Hyperbolic have a free tier?
- New users get $5 in free credits to explore the inference models. A separate $1 credit cannot be used to rent GPUs — you must deposit at least $5 before launching a GPU instance. There is also a referral program: refer a friend who tops up $5 within 14 days and you get $5 in credit while they get $6.