Pricing trivia

Pricing trivia collects the surprising details behind how AI companies actually price — 270 dated facts across 55 researched companies, each linked back to its full Blueprint entry.

Anthropic 5 facts

  • Anthropic introduced the 'Constitutional AI' training method — a technique where a model critiques and revises its own outputs against a set of written principles — and published it openly in December 2022, before Claude was even publicly available.

  • Claude 3 Opus launched in March 2024 at $15/1M input tokens, above GPT-4 Turbo's $10/1M, and scored higher on key benchmarks, marking the first time a non-OpenAI model had credibly topped the frontier leaderboard on a flagship model launch.

  • Anthropic's prompt caching feature, launched August 2024, charges $3.75/1M tokens for cache writes but only $0.30/1M for cache reads. Reads are 12.5× cheaper than writes and 90% below the corresponding $3/1M base input rate (Claude 3.5 Sonnet), which can reduce effective input costs by 80%+ for applications with large, repeated system prompts.

  • The extended context window of 200,000 tokens available on all Claude 3+ models was a deliberate product decision — Anthropic was first to productize very long context at scale, enabling use cases like full codebase analysis that competitors could not match at launch.

  • Amazon has committed $4B to Anthropic and is the primary cloud deployment partner — Claude models are available on Amazon Bedrock and AWS customers can access Anthropic via their existing AWS billing relationship, giving Anthropic enterprise distribution it could not build alone.
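
The prompt-caching arithmetic in the list above can be sketched roughly as follows. This is an illustration, not Anthropic's billing logic: the $3.00/1M base input rate, the function name, and the assumption that the first call writes the cache while every later call reads it (with no expiry) are ours.

```python
# Hypothetical sketch of prompt-caching savings; rates are USD per 1M tokens.
WRITE_RATE = 3.75  # cache write, from the text
READ_RATE = 0.30   # cache read, from the text
BASE_RATE = 3.00   # ASSUMPTION: uncached input rate, not stated in the text


def cached_savings(prompt_tokens: int, calls: int) -> float:
    """Fraction of input cost saved when one call writes the cache
    and the remaining calls read it (assumes the cache never expires)."""
    millions = prompt_tokens / 1_000_000
    uncached = calls * millions * BASE_RATE
    cached = millions * (WRITE_RATE + (calls - 1) * READ_RATE)
    return 1 - cached / uncached


if __name__ == "__main__":
    # A 50K-token system prompt reused across 100 calls.
    print(f"{cached_savings(50_000, 100):.1%}")
```

Under these assumptions a single call is more expensive with caching than without, because the write rate carries a premium over the base rate; the savings only appear once the prompt is reused.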

Anyscale 5 facts

  • Anyscale's ACU (Anyscale Compute Unit) is denominated 1:1 with USD on the published rate card — meaning a $4.9591/hour A100 line item literally bills $4.9591 in cash per hour, with cloud-list compute already included for hosted customers.

  • Anyscale was founded in 2019 by Robert Nishihara, Philipp Moritz, and Ion Stoica — three of the original UC Berkeley RISELab authors of Ray — making it the rare commercial product where the open-source maintainers, the company founders, and the lead committers are the same people.

  • Anyscale Endpoints (LLM inference at $1/1M tokens) launched August 2023 to compete with Together AI and Fireworks; the product was sunset on January 14, 2025 as Anyscale pivoted to RayTurbo and the broader enterprise platform — one of the highest-profile product sunsets in inference middleware.

  • RayTurbo claims 4.5× faster inference, 50% lower training cost, and 90% faster autoscale relative to open-source Ray — and Anyscale's pitch is that the runtime savings exceed the ACU markup, making hosted-Anyscale net cheaper than self-hosting Ray on raw cloud.

  • Anyscale customer Attentive reports 99% infrastructure cost savings on a specific batch workload after migrating from a hand-tuned in-house orchestrator to Anyscale + RayTurbo, and Handshake and Canva report 50%. These figures are now the canonical positioning anchor in Anyscale's enterprise sales motion.

AssemblyAI 5 facts

  • AssemblyAI bills transcription by the hour of audio processed — Universal-2 at $0.15/hr and the more accurate Universal-3 Pro at $0.21/hr — with no minimum commitment, upfront fee, or contract on the pay-as-you-go plan.

  • AssemblyAI's LLM Gateway lets developers call frontier models (OpenAI, Anthropic, Google) directly against a transcript, billed per million input and output tokens — the evolution of what AssemblyAI first shipped as LeMUR, its 'LLM-over-audio' layer.

  • AssemblyAI raised $50M in its Series C in January 2024, bringing total funding to approximately $143M. The round was led by Accel, with participation from Insight Partners, and came just two months after Universal-1 launched as the company's flagship accuracy benchmark.

  • The AssemblyAI Playground — available free in the dashboard — lets anyone test every Speech AI model and LeMUR without entering payment details, making it one of the most frictionless try-before-you-pay developer experiences in the API category.

  • AssemblyAI's Universal-2 model achieves the lowest word error rate of any general-purpose STT model on English audio benchmarks, outperforming Whisper large-v3, Google STT v2, and Deepgram Nova-2 on standard test sets.

Augment Code 6 facts

  • Augment prices every paid plan as a per-developer seat that bundles a fixed monthly credit allotment — Indie 40,000, Standard 130,000, Max 450,000 — but credits are pooled across the whole team, so heavy users effectively borrow from light users' allotments.

  • A single medium-complexity task costs 293 credits on Claude Sonnet 4.6 but 488 on Opus 4.7 and only 88 on Haiku 4.5, so the same prompt can be 5.5x more expensive depending on which model you route it to.

  • Augment runs a routing layer called Prism that picks among a curated model family per request and is designed to cost 20–30% less than frontier-model rates, turning model selection itself into a pricing lever.

  • Cosmos cloud sandboxes are metered entirely outside the seat allotment at 300 credits per hour, prorated in 5-minute increments.

  • Augment cycled through four pricing metrics in under 18 months: usage credits (2024), 'unlimited' subscriptions (early 2025), user messages (mid-2025), and pooled credits (October 2025).

  • When Augment dropped 'unlimited,' it disclosed the outlier that broke the model: one user running 335 requests per hour, every hour, for 30 days — approaching $15,000 per month in cost to Augment Code.
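
A small sketch of the credit arithmetic quoted above, using only the figures in the list. The function names are hypothetical, and rounding partial 5-minute sandbox increments up is an assumption; the text says only that usage is prorated in 5-minute increments.

```python
import math

# Credits per medium-complexity task, as quoted in the text.
TASK_CREDITS = {"Haiku 4.5": 88, "Sonnet 4.6": 293, "Opus 4.7": 488}

SANDBOX_CREDITS_PER_HOUR = 300
INCREMENT_MINUTES = 5


def model_spread() -> float:
    """Ratio between the most and least expensive model for one task."""
    return max(TASK_CREDITS.values()) / min(TASK_CREDITS.values())


def sandbox_credits(minutes_used: float) -> int:
    """Credits for one sandbox session, billed in 5-minute increments.
    ASSUMPTION: a partial increment is rounded up to a full one."""
    increments = math.ceil(minutes_used / INCREMENT_MINUTES)
    return increments * SANDBOX_CREDITS_PER_HOUR * INCREMENT_MINUTES // 60


if __name__ == "__main__":
    print(round(model_spread(), 1))  # Opus vs Haiku spread
    print(sandbox_credits(7))        # 7 minutes bills as two increments
```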

Baseten 5 facts

  • Baseten's $0.10833/minute H100 rate works out to ~$6.50/hour, roughly 1.5–2× AWS on-demand H100 list, but Baseten markets the spread as the cost of scale-to-zero plus engineer-free ops.

  • Truss, Baseten's open-source model-packaging framework, predates the company's Model APIs by three years — Baseten started as a 'bring your weights, we run the serving stack' product before adding hosted multi-tenant model endpoints in 2024.

  • Baseten's $75M Series C in February 2025 was led by IVP at an $825M post-money valuation; customers cited at the round included Writer, Descript, Patreon, and Robust Intelligence.

  • Baseten's Model APIs price DeepSeek V3.1 at $0.50 input / $1.50 output per 1M tokens — within 5–10% of DeepSeek's own first-party rates, which is unusual for a hosted-inference middleman because most rebrands carry a 30–50% markup.

  • Baseten publishes per-minute pricing rather than per-hour, which makes scale-to-zero economics visible: a model that warm-pools for 4 minutes per request burst on a $0.10833/min H100 costs $0.43 per burst — granularity AWS Bedrock and Vertex AI hide.
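
The per-minute figures above convert as sketched below. Only the $0.10833/minute rate and the 4-minute warm window come from the text; the function names are illustrative.

```python
H100_PER_MINUTE = 0.10833  # USD per minute, from the text


def hourly_rate(per_minute: float = H100_PER_MINUTE) -> float:
    """Convert a per-minute GPU rate to an hourly rate."""
    return per_minute * 60


def burst_cost(warm_minutes: float, per_minute: float = H100_PER_MINUTE) -> float:
    """Cost of one request burst that keeps the GPU warm for warm_minutes."""
    return warm_minutes * per_minute


if __name__ == "__main__":
    print(f"${hourly_rate():.2f}/hour")
    print(f"${burst_cost(4):.2f} per 4-minute burst")
```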

Bland AI 5 facts

  • Bland AI went from pre-seed to Series B in under ten months, one of the fastest fundraising sequences in the AI voice category. The company has raised $65M in total, including a $16M Series A in August 2024 led by Scale Venture Partners and a $40M Series B in February 2025 led by Emergence Capital.

  • In December 2025 Bland rewrote its pricing model from a single flat rate of $0.09/min (regardless of plan) to a tier-linked per-minute rate structure. Free (Start) plan users saw a 55% price increase — from $0.09 to $0.14/min — while higher-tier customers got rate reductions as compensation for their subscription commitment.

  • Bland claims support for up to 1 million simultaneous calls — a scale claim no other voice AI platform makes publicly. This positions the platform for large enterprise telephony replacement rather than boutique AI tooling.

  • HIPAA BAA, SOC 2 Type I and II, GDPR, and PCI DSS compliance are all included in Bland's plans at no extra cost — a deliberate contrast to competitors that charge compliance as an enterprise add-on.

  • Y Combinator (Summer 2023 batch) backed Bland early. Investors include Jeff Lawson (Twilio founder) and Max Levchin (PayPal co-founder) — both with direct telephony and fintech payment experience relevant to Bland's infrastructure bets.

Browserbase 6 facts

  • Browserbase bills the actual compute time agents spend driving a browser — measured in 'browser hours' — not per session, so a 90-second scrape costs a tiny fraction of an hour.

  • Every paid plan's API key doubles as a model gateway: agents reach major LLMs through Stagehand at market token price with unified billing, folding model spend into the same invoice as infrastructure.

  • The free plan ships with $5 of model tokens included, letting developers prototype an agent end-to-end — browser, search, fetch, and model calls — without a second vendor.

  • Browserbase advertises 2,000+ concurrent browsers per instance and 35M+ monthly sessions, positioning the Scale plan around raw concurrency and burst rate rather than seats.

  • Solo founder Paul Klein IV raised $67.5M in 15 months — a $6.5M seed, a $21M Series A (Oct 2024), and a $40M Series B at a $300M valuation (June 2025) — backed by Kleiner Perkins, CRV, and angels including Patrick Collison and Guillermo Rauch.

  • The open-source Stagehand SDK that drives Browserbase's billed browsers launched on Hacker News to a 326-point, 86-comment reception (2025-01-08), seeding the developer base the paid platform monetizes.

Cartesia 5 facts

  • Cartesia was founded in 2024 by Karan Goel and Albert Gu — the same Albert Gu who co-authored the Mamba state-space model paper at CMU. Cartesia's Sonic model is a direct commercial application of state-space architecture, betting that SSMs beat transformers for real-time streaming audio.

  • Sonic was the first commercial TTS model to advertise sub-90ms model latency — roughly 3-5× faster than ElevenLabs Turbo at launch. That latency number is itself a marketing artifact: it measures only the model, not the network round-trip a developer actually pays for.

  • Cartesia raised a $27M seed in March 2024 led by Index Ventures, then a $64M Series A in March 2025 also led by Index — an unusually fast follow-on that locked in pricing power before competitors could undercut. Lightspeed, Conviction, and a roster of AI researchers participated.

  • The Cartesia free tier (20,000 monthly credits, no credit card required) is one of the most generous in voice AI — roughly equivalent to 25-30 minutes of synthesized audio. Compare to ElevenLabs' 10,000-character free tier or PlayHT's 12,500 characters.

  • Cartesia's 'credit' billing unit hides the underlying cost dimensions: seconds of audio, model tier, and feature (the voice changer alone runs 15 credits per second of audio) all affect credit consumption. This is the same opacity tactic Anthropic and OpenAI avoid by publishing per-token rates directly.

Cerebras 5 facts

  • Cerebras's Wafer Scale Engine 3 (WSE-3) contains 4 trillion transistors on a single silicon wafer, roughly 50× the 80 billion in Nvidia's H100 GPU, making it the largest chip ever manufactured as of 2024.

  • Cerebras filed for an IPO in August 2024 valuing the company at approximately $8 billion, but the IPO was blocked in November 2024 when the Committee on Foreign Investment in the United States (CFIUS) opened a national-security review related to the company's largest customer, G42 of the UAE, which had previously had ties to Huawei.

  • At launch in August 2024, Cerebras Inference ran Llama 3.1 70B at 2,100 tokens per second — more than 20× faster than GPU-based competitors like Together AI or Fireworks AI at the time, a speed record that attracted significant developer attention.

  • Cerebras's inference cloud is powered entirely by its own WSE hardware, not Nvidia GPUs — the first major LLM inference API to achieve competitive scale without a single Nvidia chip.

  • The GPT-OSS-120B model on Cerebras (released August 2025) is OpenAI's open-weight reasoning model, distributed under the Apache 2.0 license, and Cerebras claimed it ran faster on their hardware than on any other provider.

Character.ai 3 facts

  • Character.ai earns 100% of its revenue from a single $9.99/month consumer subscription — no enterprise tier, no API, no B2B seat pricing. It is one of the few AI companies at $30M+ ARR built entirely on direct-to-consumer freemium, making it a rare pure-B2C case study in the otherwise B2B-dominated AI pricing landscape.

  • The Google reverse acqui-hire of August 2024 ($2.7 billion reported) effectively means Google paid more to license Character.ai's technology and reclaim its two founders than many AI startups raise in their entire existence — yet Character.ai remained independent with ~140 employees and a single-tier subscription.

  • Monthly active users peaked at ~28 million in mid-2024 but fell to ~20 million by early 2025 — an 8-million user loss in under a year — driven primarily by mid-chat ads, charm limits, and model-quality concerns, making Character.ai a cautionary tale about how aggressively restricting a free tier can damage top-of-funnel retention even when the paid product improves.

Clay 6 facts

  • Clay splits usage into two meters most platforms collapse into one: Actions (platform orchestration capacity, always 1 per enrichment) and Data Credits (the marketplace cost of the data itself, 0.5–10+ credits per record).

  • Each paid plan's headline price is really two stacked fees — a fixed Actions tier plus a selectable Data Credits volume — so a single plan name like 'Launch' spans $185/mo to $2,125+/mo depending on the credit slider.

  • Clay charges 0% markup on variable AI pricing for frontier models like GPT-5.1 and Claude 4.6 Opus, withholding an estimate at the 75th percentile of past runs and refunding the unused credits after each run completes.

  • The NY Times reported Clay let employees sell shares at a $5B valuation, and the company publicly crossed $100M ARR — unusually large scale for a data-enrichment tool.

  • In its 2026 restructure — the biggest since 2022 — Clay cut Data Credits 50-90% and called the change deliberately 'revenue- and profit-negative,' a rare public price cut from a company that had just raised at a $3.1B valuation.

  • Clay's pricing has had three eras: flat subscriptions (Basic $199/Explorer $349 in 2022), a single credit meter that topped out at an $800/mo Pro plan (2023-2025), and today's two-meter Actions + Data Credits model.
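
The hold-and-refund mechanic described for variable AI pricing might look roughly like this. It is a sketch under stated assumptions: the text does not say which percentile method Clay uses (nearest-rank is chosen here), nor what happens when a run exceeds the hold (no refund is assumed), and both function names are hypothetical.

```python
import math


def hold_estimate(past_run_credits: list[float]) -> float:
    """75th-percentile hold using the nearest-rank method.
    ASSUMPTION: the percentile method is not specified in the text."""
    ordered = sorted(past_run_credits)
    rank = math.ceil(0.75 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def refund(past_run_credits: list[float], actual_credits: float) -> float:
    """Credits returned once the run completes.
    ASSUMPTION: a run that exceeds the hold gets no refund."""
    return max(0.0, hold_estimate(past_run_credits) - actual_credits)


if __name__ == "__main__":
    history = [10, 12, 14, 20]
    print(hold_estimate(history))  # credits withheld up front
    print(refund(history, 11))     # credits handed back after the run
```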

Codeium 5 facts

  • Codeium originally pivoted to AI coding from a GPU virtualization startup called Exafunction, founded in 2021 — making it one of the clearest pivot stories in the AI tooling category.

  • OpenAI agreed to acquire Windsurf for approximately $3 billion in May 2025 — the second-largest acquisition in OpenAI's history and a direct counter-move to Microsoft's investment in GitHub Copilot.

  • Windsurf introduced 'Flows' as the central AI agentic interaction unit — a proprietary abstraction that bundles multi-step AI reasoning, file edits, terminal commands, and web search into a single billable credit event, distinct from Cursor's per-token pool model.

  • Codeium's free individual extension supports 70+ programming languages and 40+ editors, giving it the widest free-tier IDE coverage of any AI coding assistant as of launch.

  • Windsurf's Cascade agent (the multi-step reasoning engine behind Flows) was demoed in November 2024 completing a 10-file refactor with terminal and browser feedback loops — a live demo that went viral and drove 100,000 sign-ups in 72 hours.

Cohere 5 facts

  • Cohere was co-founded by Aidan Gomez, who was a co-author of the seminal 'Attention Is All You Need' paper that introduced the transformer architecture — the technology underpinning virtually every major LLM today.

  • Cohere's North Star is private deployment: unlike OpenAI and Anthropic, Cohere actively champions running models on-premises or in a customer's own VPC, making it the only major frontier AI company to treat cloud-API access as secondary to enterprise ownership.

  • Command R7B (December 2024) is the lowest-cost model in the Command family at $0.0375/1M input and $0.15/1M output — under four cents per million input tokens, making it one of the cheapest production LLM APIs available anywhere.

  • Cohere raised $500M in a July 2024 Series D at a $5B valuation, with investors including Nvidia, Salesforce, Oracle, and Fujitsu — all cloud and enterprise compute partners, not just financial investors.

  • The Rerank API bills by the query ($2 per 1,000 queries), not by token, making it one of the few AI APIs with a unit of billing that maps directly to an application event rather than raw compute consumption.

Cursor (Anysphere) 3 facts

  • Cursor now shows two separate usage pools for individual plans: Auto + Composer and API, with the API pool tied to the selected model's API price.

  • Teams plans add a Cursor Token Rate of $0.25 per million tokens on non-Auto agent requests, while Auto stays exempt.

  • Legacy request-based plans still surface a 20% surcharge on Max Mode, which makes the pricing history unusually legible in the docs.

Deepgram 4 facts

  • Deepgram prices Speech-to-Text and the Voice Agent API per minute of audio, but Text-to-Speech (Aura) per 1,000 characters and Audio Intelligence per 1,000 tokens — three different metering units on one pricing page.

  • New accounts start with a $200 free credit and no credit card, then drop straight onto pay-as-you-go rates with no minimums or expiration.

  • As of May 2026 the pricing page advertises 'limited-time promotional rates on streaming,' showing discounted streaming STT prices struck through against the original rates.

  • The Voice Agent API offers BYO (bring-your-own) LLM and TTS tiers — e.g. Standard - BYO TTS is $0.065/min vs $0.075/min for the fully managed Standard tier.

DeepInfra 6 facts

  • DeepInfra publishes a per-million-token rate for nearly every open model it hosts — from Llama-3.1-8B at $0.02 in / $0.05 out to flagship DeepSeek-V4-Pro at $1.30 in / $2.60 out — making it one of the most transparent open-model inference price lists in the market, with prompt-cache rates shown inline.

  • The same DeepInfra account spans four billing primitives at once: per-token LLM and embedding APIs, per-image Flux generation priced by resolution and step count, per-minute Voxtral audio transcription, and per-GPU-hour on-demand B200/H200 containers — a single bill across four metering units.

  • DeepInfra raised a $107M Series B to scale its inference cloud, and runs a DeepStart program granting qualifying startups 1,000,000,000 free tokens (valued at DeepSeek-V3.1 prices) for companies that have raised $250K–$10M and were founded within the last two years.

  • DeepInfra's DeepCluster product flips the usual cloud model: instead of renting capacity, the customer OWNS the NVIDIA B300 hardware (on their balance sheet, eligible for depreciation) while DeepInfra procures, deploys, and operates it — all-in from $1.98/GPU-hr on a 5-year term vs a $6.50/GPU-hr public-cloud reference.

  • DeepInfra launched in 2023 billing purely by inference execution time ($0.0005/second), only adding per-token LLM pricing in late 2023. It has since cut its custom-LLM GPU-hour rate roughly 2.2× (A100 $2.00/hr in April 2024 to $0.89/hr by August 2025), a price-cut cadence that made it a go-to "cheap inference" reference in cost-sensitive communities.

  • DeepInfra was built by the team behind the imo messenger app (200M+ users) and, per its 2026 Series B announcement, processes roughly five trillion tokens per week — 25× the token volume it ran at its Series A.

DeepSeek 5 facts

  • DeepSeek-R1's January 2025 release caused Nvidia's stock to drop approximately 17% (~$600B in market cap) in a single day — the largest single-day market cap loss attributable to an AI event in history — because R1 demonstrated frontier AI reasoning at roughly 1/30th the inference cost of OpenAI o1.

  • DeepSeek is funded by High-Flyer Capital Management, a Chinese quantitative hedge fund. DeepSeek reportedly trained V3 on approximately 2,000 Nvidia H800 GPUs at an estimated total cost of $5.5M — a fraction of the 10,000–100,000 GPU clusters used by US frontier labs for comparable models.

  • DeepSeek-V3 and R1 model weights are open-sourced under the MIT license, allowing any developer or company to self-host the models. This makes DeepSeek the only frontier-class model family that is both commercially cheap via API and fully free for self-hosting.

  • DeepSeek's cache-hit input pricing is among the most aggressive in the AI API market: V4-Flash cache-hit input is $0.0028/1M — reduced to one-tenth of its launch price in April 2026 — making reused-context input effectively free relative to US frontier models.

  • DeepSeek-V3 achieved benchmark scores matching or exceeding GPT-4o and Claude 3.5 Sonnet at a training cost orders of magnitude lower, validating the 'mixture-of-experts' architecture as a route to frontier capability at dramatically reduced compute.

Descript 5 facts

  • Descript was founded in 2017 by Andrew Mason, the former CEO of Groupon, after he spun it out of his audio-tour startup Detour.

  • OpenAI's Startup Fund led Descript's $50M Series C at a reported ~$550M valuation in November 2022 — making OpenAI both an investor and, via its models, part of Descript's AI stack.

  • Descript's AI voice-cloning lineage traces to Lyrebird, the synthetic-speech startup it acquired in 2019; the Lyrebird name still appears in Descript's site footer under an ethics statement.

  • On 23 September 2025 Descript replaced transcription-hour plans with media-minutes plus AI-credit pools — and counts every uploaded file against the media pool, so multitrack uploads burn allowance several times faster than a single mixed file.

  • Annual billing on Descript buys two things at once: up to 35% off the seat price AND a larger bundled allowance of media hours and AI credits — a packaging lever most seat-priced SaaS tools never pull.

E2B 6 facts

  • E2B charges per second of running-sandbox compute, not per sandbox or per API call — the meter only runs while a micro-VM is actually executing, so a paused or stopped sandbox costs nothing in compute.

  • Upgrading from Hobby to Pro grants zero additional usage credits. The $150/mo Pro fee buys higher limits (longer runtime, more concurrency, bigger machines), not a credit allowance — a deliberate split between the platform-access fee and the metered compute bill.

  • E2B is operated by FoundryLabs, Inc. and raised a $21M Series A in July 2025 led by Insight Partners (after an $11.5M seed led by Decibel). Its open-source sandbox SDK reports 3M+ monthly downloads and 1B+ started sandboxes, with 94% of Fortune 100 companies cited as users on the enterprise page.

  • E2B's headline price has not moved across the full Wayback record: $150/mo Pro and the per-second vCPU rates ($0.000014/s at 1 vCPU to $0.000112/s at 8) are identical in the 2024-12 and 2026-05 archived pricing pages — every visible change was packaging, not price.

  • The default 2-vCPU sandbox bills at $0.000028/second — about $0.10 per hour of running compute before RAM and storage — making short agent runs nearly free while sustained always-on workloads accumulate quickly.

  • Concurrency is a priced add-on, not just a limit: Pro includes 100 concurrent sandboxes, with Pro+ (600 concurrent) at +$500/mo and Pro++ (1,100 concurrent) at +$1,000/mo stacked on top of the base Pro fee.
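
The split between the metered compute bill and the platform fee can be sketched as follows. The rates are the ones quoted above; treating the vCPU rate as linear for sizes other than 1, 2, and 8 vCPUs is an assumption, and the function names are illustrative.

```python
VCPU_RATE_PER_SECOND = 0.000014  # USD per vCPU-second, from the text
PRO_BASE = 150                   # USD per month, from the text
# Concurrent sandboxes -> monthly add-on in USD (Pro, Pro+, Pro++).
CONCURRENCY_ADDONS = {100: 0, 600: 500, 1100: 1000}


def compute_cost(vcpus: int, seconds: float) -> float:
    """CPU charge for a running sandbox; RAM and storage are excluded.
    ASSUMPTION: the rate scales linearly with vCPU count."""
    return vcpus * VCPU_RATE_PER_SECOND * seconds


def platform_fee(concurrency: int) -> int:
    """Monthly Pro fee plus the add-on for one of the listed tiers."""
    return PRO_BASE + CONCURRENCY_ADDONS[concurrency]


if __name__ == "__main__":
    print(f"${compute_cost(2, 3600):.4f} per hour on the default 2-vCPU sandbox")
    print(f"${platform_fee(600)}/mo for Pro+ before any compute")
```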

ElevenLabs 5 facts

  • ElevenLabs is really three pricing systems in one: a credit ladder, a voice-agent minute meter, and an API meter.

  • The company renamed Conversational AI to ElevenLabs Agents in 2025, so the billing story is tied to product naming as well as pricing.

  • Annual billing exists across the subscription tiers, so the monthly pricing page is only the default view, not the whole offer.

  • Legacy usage-based billing is still documented for older subscriptions, but new subscriptions are pointed to PAYG instead.

  • ElevenAgents says silence longer than 10 seconds gets a 95% discount, which materially lowers dead-air cost.

Exa 5 facts

  • Exa prices its API per 1,000 requests rather than per request, so the headline Search $7 actually means about $0.007 per call.

  • Returning more than 10 results per request triggers a separate per-additional-result charge ($1 per 1k requests on most endpoints), so result count is its own billing dimension.

  • Exa Agent can bill on auto effort (compute scales to the task) or one of four fixed-effort modes from $0.025 to $2.00 per request for predictable pricing.

  • Exa launched as Metaphor Systems and sold flat $100 and $250 per-month subscription tiers (Wanderer and Wanderer+) in early 2024 before scrapping them within months for pure pay-as-you-go credits.

  • Exa's base Search rate sat at $5 per 1,000 requests from January 2025 through early 2026 before rising to $7 in the April 2026 endpoint-card redesign.
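
A rough sketch of how the per-1,000 pricing and the extra-result charge combine into a per-request cost. It assumes each result beyond the first 10 adds $1 per 1,000 requests, which is one reading of the text rather than a confirmed formula; the function name is hypothetical.

```python
SEARCH_PER_1K = 7.00        # USD per 1,000 requests, from the text
EXTRA_RESULT_PER_1K = 1.00  # USD per 1,000 requests, per additional result
INCLUDED_RESULTS = 10


def cost_per_request(num_results: int) -> float:
    """Per-call Search cost in USD.
    ASSUMPTION: every result past the first 10 adds $1 per 1,000 requests."""
    extra = max(0, num_results - INCLUDED_RESULTS)
    return (SEARCH_PER_1K + extra * EXTRA_RESULT_PER_1K) / 1000


if __name__ == "__main__":
    print(cost_per_request(10))  # headline rate per call
    print(cost_per_request(25))  # 15 extra results per call
```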

Fal 7 facts

  • fal advertises H100 GPUs 'from as low as $1.89/hr', billed per second (about $0.0005/s), which undercuts most on-demand hyperscaler H100 list prices.

  • fal has no seats, no monthly plans, and no free tier on its public pricing page — every line item is metered per output or per unit of compute time.

  • fal normalises model-API prices to 'output per $1' on its own pricing page (e.g. 20 seconds of Wan 2.5 video, or 33 Seedream V4 images), turning the price table into a buyer-facing cost comparator.

  • fal's B200 (184GB) GPU tier is listed as 'contact us' for both per-hour and per-second pricing, gating its newest Blackwell capacity behind sales.

  • fal says it powers 50% of Poe's image and video generation and low-latency TTS for PlayAI's voice agents — inference volume sold wholesale to other AI products.

  • fal's pricing page rebuilt itself twice in 18 months: raw per-second unit billing (CPU/GPU/memory) in 2024 became a per-output 'Output per $1' model comparator by mid-2025, tracking its pivot to a generative-media platform.

  • fal raised four rounds in under three years to a $4.5B valuation (Dec 2025, Sequoia-led) — its $140M Series D roughly tripled the $1.5B mark from its July 2025 Series C just five months earlier.
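
The two unit conversions mentioned above (hourly to per-second, and 'output per $1' to price per unit) are simple inversions, sketched here with the figures quoted in the list. The function names are illustrative.

```python
def per_second(hourly_usd: float) -> float:
    """Convert an hourly GPU price to a per-second price."""
    return hourly_usd / 3600


def unit_price(outputs_per_dollar: float) -> float:
    """Invert an 'output per $1' figure into a price per output unit."""
    return 1 / outputs_per_dollar


if __name__ == "__main__":
    print(f"H100: ${per_second(1.89):.6f}/s")
    print(f"Wan 2.5 video: ${unit_price(20):.2f} per second of video")
    print(f"Seedream V4: ${unit_price(33):.4f} per image")
```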

Firecrawl 6 facts

  • Firecrawl prices everything in credits but never charges per seat — a 1-developer team and a 50-developer team on the same plan pay the same as long as their page volume matches.

  • The pricing page geo-detects currency (it rendered in INR with an India flag from some IPs); the USD prices only appear after switching the currency selector.

  • 1 credit = 1 scraped page across Scrape, Crawl, Map, and Monitor — but Search costs 2 credits per 10 results and Interact costs 2 credits per browser minute, so the unit price quietly varies by endpoint.

  • Credits do not roll over month-to-month except for auto-recharge packs (valid 12 months) and upfront-granted annual Scale/Enterprise credits.

  • Firecrawl crossed 500K+ signed-up developers and 100K+ GitHub stars before raising much beyond a $16.2M seed/Series A.

  • Firecrawl's cheapest plan used to be $50/month: the earliest archived pricing page (April 2024, then 'A product by Mendable.ai') had no free tier and sold credit packs at $50, $375, and $1,250 — the entry price only dropped to $0 in the June 2024 rebrand.
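
Because the credit price varies by endpoint, a mixed workload is easiest to reason about with a small calculator like the one below. The per-endpoint rates come from the list above; rounding partial blocks of 10 search results and partial browser minutes up is an assumption, and the function name is hypothetical.

```python
import math


def credits_used(pages: int = 0, search_results: int = 0,
                 browser_minutes: float = 0) -> int:
    """Total credits for a mixed workload.
    ASSUMPTION: partial blocks of 10 results and partial minutes round up."""
    page_credits = pages * 1                              # Scrape/Crawl/Map/Monitor
    search_credits = math.ceil(search_results / 10) * 2   # Search
    interact_credits = math.ceil(browser_minutes) * 2     # Interact
    return page_credits + search_credits + interact_credits


if __name__ == "__main__":
    # 100 scraped pages, 25 search results, 3 minutes of browser interaction.
    print(credits_used(pages=100, search_results=25, browser_minutes=3))
```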

Fireworks AI 5 facts

  • Fireworks AI lists H100 (and H200) on-demand GPUs at $7.00/hour. On list price that sits above Together AI's $5.49–$6.49 H100 dedicated rates, not below them; Fireworks' argument is that its rate includes a full-stack optimization layer that Together's listed rate excludes.

  • Fireworks was founded in 2022 by Lin Qiao (ex-Meta), Dmytro Ivchenko, and Pawel Garbacki — Lin Qiao led the PyTorch team at Meta when PyTorch 1.0 shipped, giving Fireworks unusual inference-runtime credibility.

  • Fireworks' fine-tuning rate card is one of the most granular in the industry: LoRA SFT at $0.50 per 1M training tokens for <16B models, LoRA DPO at $1.00, full-parameter SFT at $1.00, full-parameter DPO at $2.00 — and it scales linearly through the 16B → 300B+ model size tiers.

  • Cached input tokens get a 50% discount on serverless inference, and batch inference applies the same 50% discount independently — meaning a batched RAG workload with high prefix re-use can land at 25% of the standard input price.

  • The new $1 trial credit (down from a more generous earlier offer) is one of the smallest in the inference middleware market and suggests Fireworks now relies more on sales-led conversion than self-serve experimentation for revenue growth.
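
The stacking discounts and the fine-tuning rate card above can be expressed as a short sketch. The rates are those quoted for models under 16B parameters; the dictionary layout and function names are illustrative, and larger model tiers are left out because the text does not list their rates.

```python
CACHE_DISCOUNT = 0.50
BATCH_DISCOUNT = 0.50

# USD per 1M training tokens for models under 16B parameters, from the text.
FINE_TUNE_RATES_UNDER_16B = {
    ("lora", "sft"): 0.50,
    ("lora", "dpo"): 1.00,
    ("full", "sft"): 1.00,
    ("full", "dpo"): 2.00,
}


def effective_input_multiplier(cached: bool, batched: bool) -> float:
    """Share of the standard input price paid after independent discounts."""
    multiplier = 1.0
    if cached:
        multiplier *= 1 - CACHE_DISCOUNT
    if batched:
        multiplier *= 1 - BATCH_DISCOUNT
    return multiplier


def fine_tune_cost(method: str, objective: str, training_tokens: int) -> float:
    """Fine-tuning cost in USD for a model under 16B parameters."""
    return FINE_TUNE_RATES_UNDER_16B[(method, objective)] * training_tokens / 1_000_000


if __name__ == "__main__":
    print(effective_input_multiplier(cached=True, batched=True))
    print(fine_tune_cost("lora", "dpo", 5_000_000))
```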

GitHub Copilot 6 facts

  • GitHub Copilot replaced its premium-request quotas with GitHub AI Credits in June 2026, where 1 AI credit equals exactly $0.01 USD — making the bill a literal dollar-denominated token pass-through.

  • Code completions and next-edit suggestions are explicitly NOT billed in AI credits and stay unlimited on every paid plan — only chat, agents, code review, CLI, and Spaces draw down credits.

  • Copilot Business pools its 1,900 AI credits per user at the enterprise level: 100 seats become a shared 190,000-credit pool, so power users borrow from light users automatically.

  • Annual Pro/Pro+ subscribers who signed up before June 2026 are grandfathered onto the legacy premium-request model (300/1,500 requests, $0.04 overage) instead of the new credit system.

  • Copilot's individual price sat at exactly $10/month from late 2021 through 2024 — then changed structure four times in twelve months (Free tier, Pro+, premium requests, AI Credits).

  • GitHub's April 2026 move to usage-based billing drew 767 points on Hacker News, and press reported power users projecting 10×–50× monthly cost increases for heavy agentic workflows.
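
The pooled-credit mechanic for Copilot Business can be sketched as below, using the 1,900 credits per seat and $0.01 per credit quoted above. The function names and the example usage figures are hypothetical, and the sketch only measures usage beyond the pool; how that excess is billed is not covered by the text.

```python
CREDIT_USD = 0.01                  # 1 AI credit = $0.01, from the text
BUSINESS_CREDITS_PER_SEAT = 1_900  # from the text


def pool(seats: int) -> int:
    """Shared credit pool for an organization."""
    return seats * BUSINESS_CREDITS_PER_SEAT


def credits_beyond_pool(seats: int, usage_by_user: dict[str, int]) -> int:
    """Credits consumed past the shared pool (zero if the pool covers usage)."""
    return max(0, sum(usage_by_user.values()) - pool(seats))


if __name__ == "__main__":
    print(pool(100), "credits =", f"${pool(100) * CREDIT_USD:,.2f}")
    # One heavy user is covered by a light user's unused allotment.
    print(credits_beyond_pool(2, {"heavy": 3_500, "light": 200}))
```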

Glean 4 facts

  • Glean's pricing page is a redirect: glean.com/pricing 301s straight to the homepage, so the company has effectively no public pricing surface to archive, and Wayback holds no usable pricing snapshots.

  • Glean tripled ARR from ~$100M to ~$300M in roughly 15 months while keeping pricing entirely gated — a counter-example to the 'transparency drives growth' thesis.

  • Glean's own docs disclose credit-consumption ranges (e.g. a Thinking Mode premium query ~35-120 FlexCredits) but never the dollar value of a credit — usage mechanics are public, price is not.

  • Glean shares its name with at least two unrelated products: Meta's open-source code-indexing system (glean.software) and Glean.ai, an AP/finance tool — a recurring entity-disambiguation trap.

Google 5 facts

  • Google Gemini 2.5 Flash-Lite outputs tokens at just $0.40/1M — cheaper per output token than any other frontier-grade model from OpenAI or Anthropic as of mid-2026, enabling cost-effective large-scale deployments.

  • Google's context caching gives a 90% discount on cached input tokens, meaning a developer who sends the same 100K-token system prompt 1,000 times per day saves roughly $11,250/month compared to charging all tokens at standard rate.

  • The Gemini API free tier via AI Studio requires no credit card, making it one of the most accessible no-commitment AI API tiers in the market — ideal for student developers, hobbyists, and prototype builders worldwide.

  • Vertex AI's Priority tier charges 1.8× the standard rate for guaranteed capacity — making Google one of the few AI API providers to explicitly price throughput guarantees rather than bundling them into enterprise contracts.

  • Google introduced regional (non-global) pricing effective July 2026, adding a 10% premium for Gemini 3 models accessed outside global endpoints — the first time Google split Gemini API pricing by geography.
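
The context-caching savings figure above depends on a base input rate that the entry does not state, so the sketch below makes the formula explicit. The $1.25/1M rate in the example is purely hypothetical, as are the 30-day month, the assumption that every call hits the cache, and the omission of any cache storage fee. Under those assumptions, the quoted $11,250/month corresponds to a base rate of roughly $4.17/1M.

```python
CACHE_DISCOUNT = 0.90  # from the text
DAYS_PER_MONTH = 30    # ASSUMPTION


def monthly_millions(prompt_tokens: int, calls_per_day: int) -> float:
    """Millions of cached prompt tokens sent per month."""
    return prompt_tokens * calls_per_day * DAYS_PER_MONTH / 1_000_000


def monthly_savings(prompt_tokens: int, calls_per_day: int,
                    base_rate_per_1m: float) -> float:
    """USD saved per month if every call is served from cache.
    ASSUMPTION: cache storage fees and the first uncached call are ignored."""
    return monthly_millions(prompt_tokens, calls_per_day) * base_rate_per_1m * CACHE_DISCOUNT


def implied_base_rate(prompt_tokens: int, calls_per_day: int,
                      savings_usd: float) -> float:
    """Base input rate (USD per 1M tokens) that would yield savings_usd."""
    return savings_usd / (monthly_millions(prompt_tokens, calls_per_day) * CACHE_DISCOUNT)


if __name__ == "__main__":
    print(monthly_savings(100_000, 1_000, 1.25))      # hypothetical $1.25/1M rate
    print(implied_base_rate(100_000, 1_000, 11_250))  # rate implied by the text
```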

Groq 5 facts

  • Groq's Llama 3.1 8B Instant at 840 tokens/second is one of the fastest published throughput rates for any 8B-class model in commercial inference — the LPU silicon architecture is designed specifically for inference rather than training, which is what makes the speed and pricing combination possible.

  • Groq was founded in 2016 by Jonathan Ross, the original engineer behind Google's TPU (Tensor Processing Unit) — making it the rare commercial AI inference platform built on bespoke silicon designed by the same engineer who pioneered modern ML accelerators.

  • Groq's LPU (Language Processing Unit) deliberately avoids the GPU model: each chip has deterministic execution, no HBM (uses on-die SRAM), and no GPU-style branch prediction — the trade-off is lower per-chip memory but vastly higher single-stream throughput.

  • Built-in tools have explicit per-use pricing: web search at $5–$8 per 1,000 requests, website visits at $1 per 1,000 requests, and code execution at $0.18/hour — making Groq one of the few platforms where agentic tool usage has transparent line-item billing.

  • Whisper transcription pricing differentiates by model variant: Whisper Large v3 at $0.111 per hour transcribed (higher accuracy, slower) versus Whisper Large v3 Turbo at $0.04 per hour transcribed (lower accuracy, much faster) — the 2.8× price spread reflects the engineering trade-off.

Gumloop 5 facts

  • Gumloop's Pro plan has no fixed price — a credit slider sets it anywhere from $37/mo (20k credits) to $1,840/mo (1M credits), then hands off to 'Contact sales' above 1M.

  • Seats are free on every paid Gumloop plan: Pro includes unlimited seats, so the only thing you pay for is credits — a near-total inversion of the per-seat SaaS norm.

  • Bringing your own API key cuts agent AI-model credit costs by 50%, and most native workflow nodes (logic, loops, Google Sheets, Slack) cost 0 credits.

  • Credits don't roll over month-to-month on Pro — only Enterprise plans get rollover — so unused capacity is forfeited each cycle.

  • Gumloop started life as AgentHub (Y Combinator W24) — its 2024-02-08 Launch HN drew 162 points — and for over a year priced fixed $97 Starter and $297 Pro tiers before scrapping them for today's $37 credit slider.

Harvey 4 facts

  • Harvey publishes no pricing page at all — harvey.ai/pricing returns a 404, and the public site is a pure demo-request funnel with enterprise logos where a rate card would be.

  • When Artificial Lawyer estimated Harvey's per-seat economics after the LexisNexis deal, Harvey publicly pushed back, calling the assumptions 'wildly off' while still declining to share actual rates.

  • Harvey crossed $100M ARR in August 2025 — roughly three years after founding — and reported ~$190M+ ARR by early 2026, on its way to an $11B valuation in March 2026.

  • Co-founder Winston Weinberg is a former litigator at O'Melveny & Myers; that practitioner credibility helped Harvey land 45+ AmLaw 100 firms as customers.

HeyGen 5 facts

  • HeyGen (founded 2020 by Joshua Xu and Wayne Liang, originally 'Movio/Surreal') only launched its app in September 2022 yet was named G2's #1 Fastest Growing Product of 2025.

  • HeyGen raised a $60M Series A in June 2024 led by Benchmark at a $500M valuation, after pivoting its cap table away from mainland-China investors.

  • The Pro plan exposes the same feature set across a 100× credit ladder (1,000 credits at $49/mo up to 100,000 credits at $4,300/mo, an ~88× price spread), so heavy users scale spend, not capabilities.

  • HeyGen runs two separate prepaid balances: web-plan 'Premium Credits' (drawn by MCP/OAuth) and an independent API wallet (drawn by Skills/Direct API) — top up one and the other stays empty.

  • Avatar IV/V videos cost 20 credits/min versus 3 credits/min for Avatar III — a ~6.7× swing that means the engine you pick, not just the runtime, drives your bill.
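
To see how the credit ladder and the engine choice combine into a bill, here is a rough sketch using only the figures quoted above. The names are hypothetical, and the linear per-minute charging is an assumption, not HeyGen's documented metering.

```python
# Figures quoted in the bullets above
PRO_LADDER = {1_000: 49.0, 100_000: 4_300.0}   # credits per month -> USD per month
CREDITS_PER_MINUTE = {"Avatar III": 3, "Avatar IV/V": 20}


def usd_per_credit(credits):
    """Effective price of one credit at a given rung of the ladder."""
    return PRO_LADDER[credits] / credits


def video_cost(engine, minutes, credits_rung):
    """Dollar cost of a video, assuming credits are charged linearly per minute."""
    return CREDITS_PER_MINUTE[engine] * minutes * usd_per_credit(credits_rung)


# Cost of a 10-minute video on the smallest rung, by engine
for engine in CREDITS_PER_MINUTE:
    print(engine, round(video_cost(engine, 10, 1_000), 2))
```

The per-credit price moves only from about $0.049 to $0.043 across the ladder, so under these assumptions the engine multiplier matters far more than the volume discount.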

Ideogram 4 facts

  • Ideogram runs two parallel pricing tracks: a freemium consumer subscription priced in credits, and a developer API priced per output image by model and rendering speed.

  • On the consumer plans, the cheapest model+speed (Upscale 1.0) costs just 0.5 credits per 4 images, while Ideogram 3.0 Quality costs 6 credits per 4 images — a 12x spread inside the same credit pool.

  • API character-reference calls cost 1.7x–3.3x the base rate: 3.0 Quality jumps from $0.09 to $0.20 per image when a character reference image is included.

  • Ideogram raised an $80M Series A led by Andreessen Horowitz in February 2024 (≈$96.5M total) on the strength of typography its rivals couldn't match — reviewers peg its in-image text accuracy near 90–95%.

Intercom 3 facts

  • Intercom sells Fin AI Agent as a standalone product that runs INSIDE its competitors' helpdesks — Zendesk, Salesforce, etc. — at the same $0.99/resolution with a 50-resolution monthly minimum and no seat fee. It's a rare 'compete on your rival's surface' pricing move that removes switching costs entirely while still collecting outcome revenue.

  • Intercom's Early Stage Program offers a 93% first-year discount for startups with fewer than 15 employees and under $10M funding, making the Essential plan effectively about $2/seat/month plus 300 free Fin outcomes per month — one of the most aggressive startup discounts in B2B SaaS and a quiet pricing-tier-by-stealth that's invisible to anyone who doesn't qualify.

  • Fin's $0.99/resolution is one of the cleanest outcome-based meters in AI today — most 'AI product' pricing meters input (tokens, API calls, queries) regardless of whether the user got what they came for. Intercom only bills when Fin closes a conversation without human escalation, which structurally aligns vendor incentives with customer outcomes.
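
A per-resolution meter with a monthly minimum can be modeled as a simple floor. The sketch below is illustrative only: the function name is hypothetical, and it assumes the 50-resolution minimum is billed at the same $0.99 rate, which the bullets above do not spell out.

```python
def fin_monthly_bill(resolutions, rate=0.99, minimum_resolutions=50):
    """Monthly Fin charge in USD under a per-resolution rate with a monthly floor.

    Assumption: the 50-resolution minimum is billed at the same per-resolution rate.
    """
    billable = max(resolutions, minimum_resolutions)
    return round(billable * rate, 2)


print(fin_monthly_bill(10))    # floor applies: billed as 50 resolutions
print(fin_monthly_bill(400))   # billed on actual resolutions
```

Escalated conversations never enter `resolutions`, which is what makes the meter outcome-based rather than input-based.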

Jasper 5 facts

  • Jasper launched in early 2021 as Conversion.ai, briefly rebranded to Jarvis, then settled on Jasper in 2022 after a trademark conflict with the Iron Man AI character of the same name.

  • Early Jasper sold capacity in words: the 2022 Starter plan capped output at 20,000 words/month and 'Boss Mode' charged for unlimited generation — the platform has since dropped word caps and shifted entirely to per-seat pricing.

  • Jasper raised a $125M Series A in October 2022 at a $1.5B valuation, one of the first generative-AI 'unicorns', just weeks before ChatGPT launched and reset the market it had been built on.

  • Today Jasper publishes only its $59–$69/seat Pro price; even the cost of a second seat is hidden behind 'Contact Sales', so the public floor understates the real cost for any team.

  • Jasper's marketing page calls Business 'the most popular plan' while only the cheaper Pro plan carries a visible price — the recommended tier is the one you cannot self-serve.

Lightning AI 5 facts

  • Lightning AI is built by the team behind PyTorch Lightning, the open-source training framework with 350,000+ builders cited on its pricing page.

  • Every GPU and CPU Studio is billed by the second and drawn down from a credit pool — the headline plan fee mostly buys you a monthly credit allotment, not the compute itself.

  • The Free tier gives 15 monthly credits that map to roughly 80 GPU hours per month on interruptible (spot) machines — but those free credits expire every month if unused, while purchased credits last 12 months.

  • Spot/interruptible GPUs are discounted up to 80% versus on-demand, and a single L40S Studio can cost $2.14/GPU/hr on-demand while an interruptible T4 runs as low as $0.52/GPU/hr.

  • In January 2026, Lightning AI completed a merger with GPU provider Voltage Park, creating a combined company valued at over $2.5 billion with reported ARR above $500 million and a fleet of 36,000+ owned H100/B200/GB300 GPUs.

Midjourney 5 facts

  • Midjourney employs roughly 107 people yet generates an estimated $300–500M in annual revenue, which works out to ~$2.8–4.7M per employee and one of the highest revenue-per-employee ratios in AI.

  • Midjourney has never raised external venture capital; founder David Holz bootstrapped the company to profitability from day one, rejecting VC funding even at a ~$10B implied valuation.

  • The free trial was killed in April 2023 specifically because deepfake images of Donald Trump being arrested and Pope Francis in a Balenciaga puffer jacket went viral — not because of cost pressures.

  • Midjourney bills in GPU hours rather than image counts: one standard image generation uses roughly 1 minute of GPU time, so the $10 Basic plan's ~3.3 Fast GPU hours (about 200 minutes) yield approximately 200 images.

  • Turbo mode runs at 3.5× Fast speed but consumes 2× the GPU hours — making it cost-equivalent to generating twice as many Fast images.
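
The GPU-time metering described above can be turned into a rough image budget. This is a sketch under stated assumptions: the function name is hypothetical, the 200-minute allowance is the value implied by roughly 200 images at roughly 1 minute each, and Turbo is modeled only as a 2× GPU-time multiplier.

```python
def images_per_month(gpu_minutes, minutes_per_image=1.0, turbo=False):
    """Approximate image count from a GPU-time allowance.

    Turbo is modeled as consuming 2x the GPU time per image, per the bullet above.
    """
    gpu_time_multiplier = 2.0 if turbo else 1.0
    return int(gpu_minutes // (minutes_per_image * gpu_time_multiplier))


BASIC_GPU_MINUTES = 200  # assumption: implied by ~200 images at ~1 minute each

print(images_per_month(BASIC_GPU_MINUTES))              # Fast mode
print(images_per_month(BASIC_GPU_MINUTES, turbo=True))  # Turbo mode
```

Under this model Turbo halves the image budget, which is the same statement as each Turbo image costing two Fast images.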

Mistral AI 5 facts

  • Mistral's Pro subscription launched at $14.99/mo in February 2025 — a deliberate $5 undercut of ChatGPT Plus ($20/mo).

  • The same per-million-token meter underpins both the developer API and the consumer Vibe assistant, with Vibe overages billed at API rate via PAYG credits.

  • Mistral ships open-weight models (Mistral Small 4 under Apache 2.0, Mistral Medium 3.5 under a modified MIT license) and charges per token to call them — a hybrid of open weights plus hosted inference.

  • Paris-based Mistral raised a €1.7B Series C in September 2025 led by chip-equipment maker ASML, valuing it around $14B.

  • Mistral renamed Le Chat to Vibe in May 2026, repositioning the chatbot as an autonomous work-and-code agent rather than a chat window.

Modal 5 facts

  • Modal bills by the second ($0.001097/sec on H100), so a 5-second cold start costs about $0.0055. That is 60× finer than the per-minute industry standard and among the finest billing granularity in any cloud compute product.

  • Modal was founded in 2021 by Erik Bernhardsson (ex-Spotify ML, creator of Luigi and Annoy) — making it one of the few infrastructure platforms where the founder is the primary author of widely-deployed open-source data tooling, lending unusual credibility to the developer-experience pitch.

  • Modal grants up to $10,000 in credits to qualifying startups and academic researchers — one of the most generous credit programs in cloud compute, reflecting the founder's bet on developer-led GTM rather than enterprise sales-led growth.

  • The Team plan ($250/mo + $100 credits) is the platform's only mid-tier subscription — its presence between the free Starter tier and the quote-based Enterprise tier creates a rare 'committed mid-market' SKU that most serverless GPU competitors omit.

  • Modal's storage tier ($0.09/GiB-month with 1 TiB free) is positioned to absorb model weights and dataset storage without forcing customers onto S3 — a vertical integration choice that simplifies the developer experience but reduces multi-cloud flexibility.
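
The effect of per-second metering is easiest to see next to a coarser scheme. The sketch below uses the H100 rate quoted above; the per-minute comparison is a hypothetical rounding rule for illustration, not any specific vendor's policy, and the function names are made up.

```python
import math

H100_USD_PER_SEC = 0.001097  # rate quoted in the bullet above


def per_second_cost(seconds, rate=H100_USD_PER_SEC):
    """Cost when every second is billed individually."""
    return seconds * rate


def per_minute_cost(seconds, rate=H100_USD_PER_SEC):
    """Cost under a hypothetical scheme that rounds usage up to whole minutes."""
    return math.ceil(seconds / 60) * 60 * rate


print(f"5 s cold start, per-second billing: ${per_second_cost(5):.4f}")
print(f"5 s cold start, per-minute billing: ${per_minute_cost(5):.4f}")
print(f"one full hour: ${per_second_cost(3600):.2f}")
```

For short, bursty jobs the rounding rule dominates the bill; for hour-long jobs the two schemes converge.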

Murf AI 5 facts

  • Murf splits its pricing across two motions: the Studio app sells flat-rate subscriptions metered by voice-generation hours, while the Murf API is pure pay-as-you-go at $0.01 per 1,000 characters for its Falcon model.

  • Murf Studio meters voice generation by time per year, not per month — Creator includes 24 hrs/year and Business 96 hrs/year of voice generation rather than a monthly minute bucket.

  • Murf API credits never expire once purchased, and every API account gets $10 of free credit refreshed every month; early-stage startups can apply for $1,500 in free credits over 3 months.

  • Across Murf's archived pricing (2021–2024), the Basic ($13→$19) and Pro ($26) Studio tiers barely moved while the Enterprise tier swung from $83 to $166 to a per-seat $59, then $99, then $75 — almost all the price action happened at the enterprise edge.

  • Murf's pricing page is a JavaScript single-page app, so every Wayback snapshot from September 2024 onward archived as a blank skeleton — making the exact Basic/Pro → Creator/Business rename date impossible to pin from the public archive.

Novita AI 5 facts

  • Novita publishes per-second billing for both GPU instances and agent sandboxes — a 5-minute coding-agent task on 1 vCPU + 512 MiB RAM is quoted at roughly $0.0034.

  • The same NVIDIA H100 appears at three different prices depending on product: $2.59/hr on-demand GPU instance, $1.99/GPU-hour as a dedicated endpoint, and $1.70/GPU/hr on an 8-GPU bare-metal node.

  • Novita lists 226 models on its catalog and undercuts first-party APIs — DeepSeek V3.1 runs $0.27 input / $1 output per million tokens versus DeepSeek's own rates.

  • Spot GPU instances are priced at roughly half the on-demand rate (RTX 4090 $0.67 on-demand vs $0.34 spot).

  • Novita started in 2023–2024 as a credit-funded Stable-Diffusion image API billed in USDT/Stripe top-ups ('1/10 the price of DALL-E2 and MJ') with a Singapore HQ — only pivoting into LLM + GPU inference and relisting in San Francisco through 2025.

OpenAI 5 facts

  • GPT-4's launch in March 2023 at $60 per million output tokens made it the most expensive widely-available model in history; within 26 months OpenAI had cut the price of equivalent capability by about 87% with GPT-4.1 at $8/1M output.

  • ChatGPT reached 1 million users in 5 days after launch in November 2022 — the fastest consumer product adoption ever recorded at that time. It passed 100M users in 60 days.

  • OpenAI's $200/month ChatGPT Pro plan, launched December 2024, gives unlimited access to o1 Pro mode — a configuration that uses significantly more compute per query than the standard o1 model and was not previously available at any price.

  • OpenAI uses a 'soft limit' system for API usage: there is no hard cap by default, but users can set monthly spend limits in the dashboard to prevent runaway costs from agentic loops.

  • The GPT-4o mini model at $0.15/1M input tokens is 200× cheaper than the original GPT-4 launch price ($30/1M input), while scoring competitively with GPT-4 on many benchmarks: the fastest cost-performance improvement in AI model history.

Perplexity AI 5 facts

  • Perplexity's $200/month Max plan matches OpenAI ChatGPT Pro dollar-for-dollar — a deliberate signal that Perplexity considers itself a peer to the market leader, not just a cheaper alternative.

  • In July 2024, Perplexity launched a publisher revenue-share program after being accused of plagiarism by Forbes, Wired, and others — effectively monetizing the citations that define its product identity.

  • Perplexity AI's valuation grew roughly 175× in under three years: from $121M in April 2023 to $21B by early 2026, fueled almost entirely by subscription growth.

  • The Sonar API launched in January 2025 as a dedicated search-native API, replacing the earlier pplx-api (October 2023) which hosted generic open-source models with no real-time web access.

  • Enterprise Pro's SCIM provisioning only unlocks at 50+ seats or with at least one Enterprise Max user — making it the rarest automated provisioning gate in the AI-tools category.

Recraft 4 facts

  • Recraft's API and Studio use two completely different units: the Studio meters in 'credits' (1-2 per image) while the API meters in 'API units' priced at a flat $1 = 1,000 units.

  • The same image can cost wildly different amounts by model: a Recraft V2 raster render is $0.022 via API while a V4.1 Pro render is $0.25 - an 11x spread inside one product.

  • Recraft's Free plan keeps the copyright: images generated on the $0 tier are owned by Recraft, public in the community gallery, and carry no commercial rights until you pay.

  • Studio credits do not roll over month to month, but separately-purchased top-up credits never expire - two opposite expiry policies inside the same wallet.

Relevance AI 6 facts

  • Relevance AI splits its credit model in two: 'Actions' (a flat charge each time a tool runs) and 'Vendor Credits' (the raw AI-model cost), and it passes Vendor Credits through at wholesale with zero markup.

  • Vendor Credits roll over indefinitely while you stay subscribed — both the bundled allowance and any top-ups — a rare 'use-it-whenever' stance in usage-based pricing.

  • You can bring your own LLM API keys on any paid plan to bypass Vendor Credits entirely, so the platform effectively lets you opt out of one of its two metered dimensions.

  • On 8 September 2025 Relevance AI sunset its Business plan, collapsing the self-serve ladder to Free / Pro / Team and pushing larger buyers to Enterprise.

  • Relevance AI started life (2023 and earlier) as a 'bring your data to life' analysis and visualization tool — vector search, AI clustering, Tableau-like charts — before pivoting to AI agents and the 'AI Workforce' through 2024.

  • Its old credit model used to charge LLM cost plus a 20% markup if you didn't supply your own API key; the September 2025 repackaging dropped the markup entirely and made model cost a zero-margin pass-through.

Replicate 5 facts

  • Replicate's per-second public-model billing means a 4-second FLUX Dev image generation on an A100 costs roughly $0.0056 — finer granularity than competitors' per-image flat rates, though the per-image SKU ($0.025 for FLUX Dev) is still published as a simpler alternative.

  • Replicate was founded in 2019 by Ben Firshman (creator of Docker Compose at Docker) and Andreas Jansson — making it the rare AI infrastructure platform where the founder co-created the single most-used developer tool in containers.

  • Cog, Replicate's open-source model-packaging framework, predates the company's commercial inference platform by two years — and remains the de-facto standard for packaging ML models with PyTorch and TensorFlow runtimes, similar to how Truss became Baseten's developer wedge.

  • Replicate hosts over 50,000 public models — by far the largest public-model catalog among managed-inference platforms. The community model directory makes Replicate the canonical entry point for 'is there an open-source model for X' for AI engineers.

  • Replicate's $5.49/hour H100 dedicated rate ($0.001525/sec × 3,600) is in the same range as Modal's $3.95/hour H100 ($0.001097/sec × 3,600), though about 39% higher; both substantially undercut AWS Bedrock and Vertex AI hosted H100 rates while remaining higher than raw AWS on-demand.

Roboflow 4 facts

  • Roboflow denominates almost every billable action — image storage, AI labeling, GPU training minutes, CPU/GPU inference hours, and even third-party LLM tokens — in a single unified "credit," so 1 credit buys 30 minutes of GPU training or 1,000 hosted-API inferences depending on what you spend it on.

  • The free Public plan hands every user roughly $60/mo of free credits, but the catch is that all datasets and trained models are published openly on Roboflow Universe — privacy starts at the $79/mo Core tier.

  • Roboflow's credit menu prices frontier LLMs directly: as of March 2026, 1 credit buys 400,000 Claude Opus 4.6 input tokens or 1,600,000 GPT-5.1 input tokens, exposing each model's relative cost inside the same currency as GPU training.

  • Roboflow has used three pricing models: it billed a flat $0.01/image in 2020, pivoted to a Professional subscription "Starting at $999/month" in 2021, then abandoned both for the unified credit it uses today. It is a rare case of a company changing its core billing unit twice in five years.

RunPod 5 facts

  • RunPod's $0.69/hour RTX 4090 Secure Cloud rate is among the lowest published GPU rates for a workstation-class card — a deliberate positioning play to capture hobbyist and student workloads that hyperscalers price out of reach.

  • RunPod was founded in 2022 by Zhen Lu and Pardeep Singh, both ex-cryptocurrency-mining infrastructure operators who pivoted hardware from GPU mining to AI inference as the mining-to-AI transition accelerated through 2022–2023.

  • RunPod runs two distinct clouds: Secure Cloud (enterprise-grade data centers, redundant infrastructure) and Community Cloud (lower-cost, partner-operated DCs with reduced reliability guarantees) — letting customers pick the price-reliability trade-off explicitly per workload.

  • RunPod's Serverless billing is per-second with flex worker prices from $0.58/hour to $8.64/hour depending on GPU type — among the most granular Serverless rate ladders, covering small single-card workloads (16GB cards) through frontier Blackwell inference (B200 at $8.64/hour).

  • Storage tier complexity is notable: container disk vs volume disk (running vs idle) vs network storage (standard vs high-performance, with tiered <1TB and >1TB rates) — five distinct storage SKUs that finance teams must aggregate to forecast total storage spend.

Runway 5 facts

  • Runway prices the same credit currency in two output languages at once: 625 credits is published as both '25s of Gen-4.5 video' and '78 Gen-4 images', so the metered unit means different things per model.

  • The 'Unlimited' plan still ships a 2,250 monthly credit allowance — unlimited generation only applies to the slower, lower-priority Explore mode, not full-speed output.

  • Runway's $15 Standard and $35 Pro price points predate 2026: archived pricing from early 2022 shows the same two figures, when the top tier was 'Pro Plus' at $90 rather than today's Unlimited at $95.

  • Monthly credits never roll over — they reset within 24 hours of the billing date — so any unused allowance is forfeited each cycle.

  • The developer API auto-upgrades through five usage tiers purely on cumulative spend: hit $5,000 purchased and you jump to Tier 5, lifting the monthly spend cap to $100,000 and concurrency to 20.
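
The dual meaning of a Runway credit can be made concrete by deriving per-unit costs from the published 625-credit bundle. This is a sketch, not Runway's rate card: the names are hypothetical, the 8-credit image price is inferred by assuming whole-credit pricing (78 × 8 = 624), and cost is assumed to scale linearly.

```python
BUNDLE_CREDITS = 625          # published bundle size
BUNDLE_VIDEO_SECONDS = 25     # "25s of Gen-4.5 video"
BUNDLE_IMAGES = 78            # "78 Gen-4 images"

# Implied unit costs; the per-image figure assumes whole-credit pricing (78 x 8 = 624)
CREDITS_PER_VIDEO_SECOND = BUNDLE_CREDITS // BUNDLE_VIDEO_SECONDS
CREDITS_PER_IMAGE = BUNDLE_CREDITS // BUNDLE_IMAGES


def credits_needed(video_seconds=0, images=0):
    """Credits for a mixed job, assuming cost scales linearly with output."""
    return video_seconds * CREDITS_PER_VIDEO_SECOND + images * CREDITS_PER_IMAGE


print(credits_needed(video_seconds=10, images=5))
```

On these inferred figures, one second of Gen-4.5 video costs about as much as three Gen-4 images.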

Suno 5 facts

  • Suno gates commercial-use rights, not just capacity, behind any paid plan — its help center says paid-plan songs let creators collect 100% of royalties with Suno claiming no share.

  • The Free plan refreshes 50 credits every day rather than monthly, a deliberate retention nudge that brings casual users back daily while reserving monetization rights for subscribers.

  • Suno raised $250M at a $2.45B valuation in November 2025 (Menlo Ventures, Nvidia's NVentures), the same week it settled Warner Music's copyright lawsuit and acquired Songkick from Warner.

  • Premier ($30/mo) is the only tier that includes Suno Studio, a generative-AI digital audio workstation launched September 2025 — a feature gate, not a credit gate, separates it from Pro.

  • Despite model upgrades from v3.5 through v5.5, Suno's headline tier prices ($10 Pro, $30 Premier) have held since at least May 2024 — value rises via model quality, not price.

Synthesia 5 facts

  • Synthesia decouples price from seats entirely — paid plans cap editors and guests rather than charging per seat, with the real value metric being generated video minutes.

  • Editing a video only bills the new seconds you change: a 3-second tweak to a 1-minute clip costs 3 seconds, not 60.

  • There is no per-minute overage: exceeding your plan caps usage until renewal, so the bill is bounded by design rather than exposed to spend spikes.

  • Synthesia's enterprise page claims adoption by over 90% of the Fortune 100, leaning on SOC 2, GDPR, and ISO 42001 (an AI-management-system standard) as trust signals.

  • Annual billing front-loads the whole year's credits (Starter goes from 1,200 credits/mo to 14,500 credits/yr, slightly more than 12 × 1,200 = 14,400) on top of the headline 34% price cut.

Tavus 5 facts

  • Tavus's 2023 pricing page sold a flat $275/mo 'Intro' plan for 200 personalized marketing videos — by August 2024 the same domain advertised 'Transparent usage-based pricing' with a $0 Free tier and pay-as-you-go video minutes, a full repackaging from per-video to per-minute.

  • Tavus ships three of its own foundation models inside one billed minute: Raven (perception), Sparrow (turn-taking), and Phoenix (rendering) — so a single 'conversation minute' meters an entire multimodal pipeline, not one model call.

  • The cheapest paid developer plan rose from $39/mo to $59/mo between February and March 2025 while the page rebranded from 'usage-based pricing' to 'Pricing built to scale' and added a '$12k annual' Enterprise floor.

  • Tavus runs the same conversation-minute value metric two opposite ways: uncapped pay-as-you-go for developers (CVI) and flat consumer subscriptions for PALs ($0 / $20 / $50), where minutes are a hard allowance, not an overage.

  • Tavus raised a $40M Series B in November 2025 (led by CRV) to reposition as 'The Human Computing Company' — total funding ~$64M from Sequoia, Scale, Y Combinator, and HubSpot Ventures since its YC Summer 2021 batch.

Together AI 5 facts

  • Together AI's $4.99/hr H100 reserved rate (7–30 day reservation) is one of the lowest published rates for any managed Hopper-class GPU — and the $9.65/hr reserved B200 sets a similar floor for Blackwell, both undercutting Fireworks' on-demand rates.

  • Together was co-founded by Vipul Ved Prakash (ex-Cloudmark, Topsy founder), Ce Zhang (ETH Zurich systems professor), Chris Ré (Stanford ML/Snorkel), and Percy Liang (Stanford CRFM director), making it the rare commercial product where two top academic ML labs are co-architects of the platform.

  • Together's serverless rate card publishes per-model pricing inline on the pricing page (rare among competitors like Fireworks which route to docs), making per-model side-by-side comparison friction-free.

  • Code Sandbox ($0.0446/vCPU-hour, $0.0149/GiB-hour) and Code Interpreter ($0.03/session) launched in 2025 as separate metered SKUs for agentic and code-execution workloads — adding non-token billing dimensions to the rate card.

  • Together raised a $305M Series B in February 2025 led by General Catalyst at a $3.3B valuation, with NVIDIA and Salesforce Ventures participation — making Together the highest-valued specialized-inference cloud at that point.

Vercel 4 facts

  • Vercel meters across eight distinct billing dimensions on a single Pro plan — more axes than any other PaaS competitor (Netlify uses 4, Cloudflare Workers uses 3).

  • Fluid Compute charges $0 for I/O wait time — a category-first 'Active CPU' pricing model that can cut function bills by up to 90% for I/O-heavy workloads.

  • v0 (Vercel's AI UI generator) is billed separately from the core platform, with Free, Team, Business, and Enterprise tiers.

  • The Pro plan's $20 monthly usage credit absorbs overages in a fixed priority order: bandwidth first, then edge requests, then function invocations — useful trivia for budget tuning.

Wispr Flow 3 facts

  • Wispr started in 2021 as a neural wristband company trying to translate silent speech into text via a brain-computer interface. After two years of R&D, the founders realised the throwaway dictation app they had built to test the wristband had product-market fit on its own, pivoted in mid-2024, and shipped the Mac app six weeks later — which hit #1 on Product Hunt for both the day and the week of 2024-10-01.

  • Wispr Flow meters words-per-week per platform, not minutes of audio: Basic gives 2,000 words/week on Mac/Windows, 1,000/week on iPhone, and (during a 2026 promo) unlimited on Android. For comparison, Otter.ai's free tier caps at 300 minutes/month of audio (~45,000 words at a conversational ~150 words per minute) but is single-axis; Flow's per-platform split is the only word-quota model in the AI dictation category.

  • Flow Pro stayed at $12/user/mo from its 2024 launch through at least mid-2025; multiple third-party trackers report a 2026 increase to $15/mo on monthly billing while keeping the annual-equivalent rate at $12. The annual-billed effective rate has not moved since launch, making this one of the most stable headline prices in the AI-tools category at a time when Cursor, Replit, and Lovable all repriced in 2025.

You.com 5 facts

  • You.com once sold a $30/mo Team plan; in September 2025 it replaced Team with a $200/mo 'Max' plan built around unlimited ARI research reports — a ~6.7x reprice of its top consumer tier, recovered from Wayback snapshots of you.com/plans.

  • Between March and April 2026 You.com cut its API prices sharply: the Contents API dropped from $10 to $1 per 1,000 pages (10x) and Search dropped from $6.25/$8.00 to a flat $5 per 1,000 calls.

  • You.com was founded in 2021 by Richard Socher and Bryan McCann, both former Salesforce AI researchers; its September 2024 Series B ($50M, led by Georgian) drew investment from NVIDIA, Salesforce Ventures, and even rival search engine DuckDuckGo.

  • The Research API charges purely by 'research effort' tier, ranging from $12/1k calls (lite) to a Contact-Sales 'Frontier' tier listed at >$2,000/1k calls — a ~167x spread on the same endpoint.

  • Every You.com API account starts with $100 in free credit and there are no seats, minimums, or platform fees — billing is entirely per-call / per-page.
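
With no seats, minimums, or platform fees, a You.com API bill reduces to usage times rate. The sketch below uses the post-April-2026 Search and Contents rates quoted above; the function name is hypothetical, and it assumes the $100 starting credit is consumed before anything is charged, which the bullets do not state explicitly.

```python
def you_api_bill(search_calls, content_pages, free_credit=100.0,
                 search_usd_per_1k=5.0, contents_usd_per_1k=1.0):
    """Amount owed in USD for per-call / per-page usage after the starting credit.

    Assumption: the free credit is consumed before any charge is made.
    """
    gross = (search_calls / 1_000 * search_usd_per_1k
             + content_pages / 1_000 * contents_usd_per_1k)
    return max(0.0, gross - free_credit)


print(you_api_bill(search_calls=10_000, content_pages=5_000))
print(you_api_bill(search_calls=50_000, content_pages=20_000))
```

Because the credit is a one-time starting balance, this models only the first stretch of usage; later periods would pass `free_credit=0`.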
