What is it
Observability platform pricing is how LLM and ML observability platforms charge for tracing, evaluation, and monitoring of model behavior in production. The product ingests the telemetry an AI application throws off (traces, spans, evaluation scores, logged payloads), and the pricing follows the pipe: a free ingestion quota, metered volume above it, and a retention window that decides how long the data stays queryable.
The category converged on a remarkably uniform shape. Langfuse bills composite “units” past 50k free with graduated overage from $8 to $6 per 100k; Arize AI meters spans and ingested gigabytes on a dual-axis Pro plan at $50/month; Galileo counts whole traces (5,000 free, 50,000 on the $100 Pro tier); Comet’s Opik prices spans with a flat $19/month team account; Braintrust runs three meters — token credits, processed data, and scores — under a $0 or $249 platform fee. HoneyHive, Athina AI, Patronus AI, and PromptLayer each vary the recipe without leaving it.
Two structural facts shape every quote in this category: open source is everywhere (Arize’s Phoenix, Comet’s Opik, Langfuse’s self-host path), so managed pricing is disciplined by the self-hosting alternative; and the category is consolidating fast — ClickHouse acquired Langfuse in January 2026 and Cisco closed on Galileo in May 2026 — which makes pricing stability itself a buying criterion.
How it works
The standard bill is platform fee + max(0, ingestion − quota) × rate, so usage below the quota adds nothing, with retention and seats as packaging levers:
| Lever | Range in the cohort | Examples |
|---|---|---|
| Ingestion unit | span · trace · composite unit · txn · GB | Arize spans + GB; Galileo traces; Langfuse units; PromptLayer txns |
| Free quota | 2.5k–50k units/mo | Langfuse 50k units; Arize & Opik 25k spans; HoneyHive 10k events; Galileo 5k traces |
| Overage rate | $2–$10 per relevant unit block | Arize ~$10/M spans + ~$3/GB; Langfuse $8→$6/100k; PromptLayer $0.003→$0.002/txn |
| Retention | 14 days → 3 years | Braintrust 14→30d; Arize 15→30d; Patronus 2-week window; Langfuse 90d→3y at $199 |
| Seats | Free/unlimited (mostly) | Braintrust unlimited all tiers; Opik $19 covers 50 members; PromptLayer caps by tier |
Worked example — dual-axis fan-out. A RAG app sending 1M spans and 25 GB a month to Arize AI Pro pays $50 base, ~$9.50 span overage (950k × ~$10/M), and ~$45 data overage (15 GB above the included 10 at ~$3/GB) — about $105/month, with the data axis, not the span count, doing most of the damage. The same payload-heavy behavior on Braintrust hits its processed-data meter at $3–4/GB, which counts every byte moving through the platform. Instrument first, then price — the usage-event tracking guide covers how to measure your own fan-out before committing to a tier.
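The arithmetic in the worked example can be reproduced with a short sketch. The function and field names below are illustrative, not a vendor API, and the 50k included spans is an assumption inferred from the 950k overage quoted above; the other figures are the approximate rates from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualAxisPlan:
    """Illustrative plan shape; field names are hypothetical."""
    base_fee: float               # flat monthly fee, USD
    included_spans: int           # spans covered by the base fee
    span_rate_per_million: float  # USD per 1M spans above the quota
    included_gb: float            # ingested GB covered by the base fee
    gb_rate: float                # USD per GB above the quota


def monthly_bill(plan: DualAxisPlan, spans: int, gb: float) -> dict:
    """Return each line item; overage is clamped at zero below the quota."""
    span_overage = max(0, spans - plan.included_spans) / 1_000_000 * plan.span_rate_per_million
    data_overage = max(0.0, gb - plan.included_gb) * plan.gb_rate
    return {
        "base": plan.base_fee,
        "span_overage": span_overage,
        "data_overage": data_overage,
        "total": plan.base_fee + span_overage + data_overage,
    }


# Approximate figures from the worked example; 50k included spans is inferred.
ARIZE_PRO_EXAMPLE = DualAxisPlan(
    base_fee=50.0,
    included_spans=50_000,
    span_rate_per_million=10.0,
    included_gb=10.0,
    gb_rate=3.0,
)

bill = monthly_bill(ARIZE_PRO_EXAMPLE, spans=1_000_000, gb=25)
print(bill)  # total is 104.5, i.e. the "about $105" above
```

Keeping the two overage lines separate in the result is the point of the exercise: it shows at a glance that the data axis contributes 45.0 against 9.5 for spans.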
Companies using this
Nine in-corpus companies sell observability platforms: Arize AI, Athina AI, Braintrust, Comet (Opik), Galileo, HoneyHive, Langfuse, Patronus AI, and PromptLayer. All ship a free tier; seven publish self-serve rates and all nine gate their enterprise tier behind sales.
Patterns observed
The cohort prices ingestion, gives away seats, and charges for memory. Ingestion quotas ladder from generous free tiers into metered overage (Langfuse’s graduated curve, Arize’s dual axis, PromptLayer’s normalized txn at $0.003 falling to $0.002 on Team); seats are free or unlimited almost everywhere because a per-user tax would fight the meter; and retention is the quiet upsell — Patronus AI gates its entire free tier on a rolling 2-week window rather than on features, and Braintrust and Arize both double retention exactly where the paid tier starts.
The second pattern is defensive unit design. Every vendor that survived repricing simplified toward a unit buyers can audit: Comet prices Opik spans flat per account; Athina AI deliberately meters executions while ingested logs cost zero; PromptLayer folds requests, agent runs, and eval cells into one txn, trading precision for forecastability — the trade-off the usage-metric guide frames as legibility versus cost-tracking.
Counterexamples & variants
Braintrust is the in-category variant that breaks the “one ingestion meter” mold: three independent meters (token credits at $0.06/$0.40 per Mtok, processed data at $3–4/GB, scores at $1.50–2.50/1k) under a flat platform fee — closer to a cloud bill than a SaaS tier, with the processed-data line as the recurring surprise. Patronus AI breaks it the other way: no ingestion meter at all on the self-serve tier, just a hard 2-week data window, with the actual money in a quoted enterprise plan and an optional pay-as-you-go evaluation API. And Athina AI inverts the category’s core assumption — telemetry in is free, and only compute the platform itself initiates (prompt runs, eval cells, flow steps) burns credits — proof that “observability pricing” can meter work done rather than data received.
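A multi-meter bill of the Braintrust kind is easier to reason about when each meter is reconstructed as its own invoice line. The sketch below is a generic illustration, not Braintrust's billing logic: `Meter` and `reconstruct_invoice` are hypothetical names, the processed-data figures ($249 fee, 5 GB included, $3/GB) come from the text, and the score meter's zero included quantity and $2.50/1k rate are placeholder assumptions, since the source gives only a range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meter:
    """One independently metered quantity (hypothetical shape)."""
    name: str
    usage: float      # quantity consumed this month
    included: float   # quantity covered by the platform fee
    rate: float       # USD per `per` units above the included amount
    per: float = 1.0  # size of the pricing block, e.g. 1_000 scores

    def overage_cost(self) -> float:
        return max(0.0, self.usage - self.included) / self.per * self.rate


def reconstruct_invoice(platform_fee: float, meters: list[Meter]) -> dict[str, float]:
    """Build the invoice line by line so each meter's share stays visible."""
    lines = {"platform_fee": platform_fee}
    for meter in meters:
        lines[meter.name] = meter.overage_cost()
    lines["total"] = sum(lines.values())
    return lines


invoice = reconstruct_invoice(
    249.0,
    [
        # From the text: 5 GB included, $3/GB on the $249 plan.
        Meter("processed_data_gb", usage=40, included=5, rate=3.0),
        # ASSUMPTION: no included scores; $2.50/1k is one end of the quoted range.
        Meter("scores", usage=20_000, included=0, rate=2.50, per=1_000),
    ],
)
print(invoice)
```

With these inputs the processed-data line (105.0) is larger than the score line (50.0), which is the shape of surprise the paragraph above describes. A token-credit meter could be appended the same way once its unit is pinned down.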
What this means for buyers vs vendors
For buyers
Run a week of production-shaped traffic through two or three free tiers and read the actual meters before choosing — the same workload registers as wildly different volumes depending on whether the unit is a trace (Galileo), a span (Arize, Comet), or a composite unit (Langfuse). Price the retention you actually need for audits, not the default; and in a consolidating category, ask what happens to your rates and your data export path if the vendor is acquired — two of these nine were, within five months.
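To see why the unit matters, a week of sampled traffic can be projected to a month and read against each kind of meter. This is a rough sketch under stated assumptions: the function name and the 7-to-30-day scaling are hypothetical, the sample workload is invented for illustration, and the quotas are the free-tier figures quoted in the lever table (Galileo 5k traces, Arize and Opik 25k spans). Composite units are left out because the text does not define how they are composed.

```python
# Free-tier quotas quoted earlier in this guide, keyed by billing unit.
FREE_QUOTAS = {"traces": 5_000, "spans": 25_000}


def monthly_volumes(spans_per_trace: list[int], days_sampled: int = 7,
                    days_in_month: int = 30) -> dict[str, int]:
    """Project a sample (one entry per trace, value = span count) to a month."""
    traces = len(spans_per_trace) * days_in_month / days_sampled
    spans = sum(spans_per_trace) * days_in_month / days_sampled
    return {"traces": round(traces), "spans": round(spans)}


def over_free_quota(volumes: dict[str, int]) -> dict[str, bool]:
    """True where the projected volume exceeds that unit's free quota."""
    return {unit: volumes[unit] > quota for unit, quota in FREE_QUOTAS.items()}


# Invented sample: 700 traces in a week, 12 spans each.
week = [12] * 700
volumes = monthly_volumes(week)
print(volumes)                   # {'traces': 3000, 'spans': 36000}
print(over_free_quota(volumes))  # {'traces': False, 'spans': True}
```

The same workload fits inside a trace-metered free tier and overshoots a span-metered one, so the comparison has to be made on measured fan-out rather than on request counts.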
For vendors
The category’s settled playbook is: meter ingestion, free the seats, ladder retention, and keep an open-source pressure valve. The open design question is bill-shock control — graduated curves (Langfuse) and flat team accounts (Opik) both beat raw linear overage on trust. If your true cost driver is bytes rather than events, surface it as its own published axis the way Arize does, rather than letting a hidden data meter aggregate into an invoice the buyer can’t reconstruct.
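The difference between a graduated curve and raw linear overage can be sketched as follows. Only the $8 and $6 per 100k endpoints and the 50k free allowance come from the text; the band breakpoint is a hypothetical placeholder, pro-rating within a 100k block is an assumption, and the function names are illustrative.

```python
from typing import Optional

# (ceiling in total monthly units, USD per 100k); None means unbounded.
# HYPOTHETICAL breakpoint: the source gives only the $8 and $6 endpoints.
HYPOTHETICAL_BANDS: list[tuple[Optional[int], float]] = [
    (1_050_000, 8.0),
    (None, 6.0),
]


def graduated_overage(units: int, free_units: int,
                      bands: list[tuple[Optional[int], float]],
                      block: int = 100_000) -> float:
    """Charge each band's rate only on the units that fall inside that band."""
    cost = 0.0
    floor = free_units
    for ceiling, rate in bands:
        if units <= floor:
            break
        top = units if ceiling is None else min(units, ceiling)
        cost += (top - floor) / block * rate  # assumes pro-rated blocks
        if ceiling is None:
            break
        floor = ceiling
    return cost


def linear_overage(units: int, free_units: int, rate: float,
                   block: int = 100_000) -> float:
    """Single flat rate on everything above the free allowance."""
    return max(0, units - free_units) / block * rate


usage = 2_050_000
print(graduated_overage(usage, 50_000, HYPOTHETICAL_BANDS))  # 140.0
print(linear_overage(usage, 50_000, 8.0))                    # 160.0
```

Under these placeholder bands the graduated bill grows more slowly than the linear one as volume rises, which is the trust argument made above; the actual gap depends on where a vendor sets its breakpoints.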
| Company | Product | Free tier | Verified |
|---|---|---|---|
| Arize AI | AI & LLM observability (Arize AX + Phoenix OSS) | Yes | 2026-06-09 |
| Athina AI | Collaborative AI development platform for building, testing, evaluating and monitoring LLM features | Yes | 2026-06-04 |
| Braintrust | LLM evaluation & observability platform | Yes | 2026-06-09 |
| Comet | AI/ML observability and experiment-tracking platform: Opik (LLM/agent observability) and Comet MLOps (experiment tracking) | Yes | 2026-06-02 |
| Finout | Enterprise cloud + AI cost observability (FinOps) platform | No | 2026-06-10 |
| Galileo | AI observability, evaluation, and guardrails platform for agents and LLM apps | Yes | 2026-06-04 |
| HoneyHive | AI observability and evaluation platform for LLM and agent applications | Yes | 2026-06-04 |
| Langfuse | Open-source LLM observability, evals, and prompt management | Yes | 2026-06-09 |
| Patronus AI | LLM and AI agent evaluation, monitoring, and guardrail platform | Yes | 2026-06-04 |
| PromptLayer | Prompt management, evaluation, and observability platform for LLM and AI-agent teams | Yes | 2026-06-04 |
| Vantage | Cloud + AI cost monitoring and FinOps platform | Yes | 2026-06-10 |
FAQ
How is LLM observability priced?
Almost universally on ingestion volume — traces, spans, or composite units — against a tier's included quota, with retention windows as the second axis. Langfuse bills units at $8 down to $6 per 100k past a 50k free allowance; Arize meters spans plus ingested GB; Galileo meters whole traces. Seats are usually free or unlimited.
Which companies are in the observability pricing cohort?
Nine in-corpus platforms: Arize AI, Athina AI, Braintrust, Comet (Opik), Galileo, HoneyHive, Langfuse, Patronus AI, and PromptLayer. All offer a free tier; most publish self-serve rates and gate enterprise behind sales.
Why do observability free tiers feel so generous?
Because telemetry only becomes valuable at production volume, the free quota is the acquisition funnel: Langfuse gives 50k units/month, Arize 25k spans, Comet's Opik 25k spans with 10 team members, HoneyHive 10k events, Galileo 5k traces. The vendor's bet is that instrumented apps grow into the paid meters.
What is the retention axis in observability pricing?
How long your traces stay queryable, sold separately from ingestion. Arize Free keeps 15 days and Pro 30; Braintrust steps from 14 days (Starter) to 30 (Pro); Patronus gates its free tier on a rolling 2-week data window; Langfuse jumps from 90 days to 3 years only at its $199 Pro tier. Keeping data costs more than ingesting it.
Do observability platforms charge per seat?
Mostly no — the meter is ingestion, so seats are deliberately free to spread adoption: Braintrust offers unlimited users on every tier, Arize Pro has no per-seat fees, Langfuse includes unlimited users from $29/month, and Comet's Opik Pro covers 50 members for a flat $19. PromptLayer is the main exception, capping users by tier.
Trivia
- Braintrust's "processed data" meter is the bill-shock engine of the category: it counts every byte that moves through the platform, so a $249/month Pro plan with 5 GB included can balloon on payload-heavy agent traces at $3/GB before the token or score meters even register.
- Comet's Opik Pro costs $19/month flat per account for up to 50 team members and 100k spans, so the cheapest paid tier in the observability cohort prices the whole team below what several competitors charge for a single seat-equivalent.
- Two of the nine observability vendors were acquired within five months of each other: ClickHouse took Langfuse in January 2026 and Cisco closed on Galileo in May 2026, and in both cases the published span/unit pricing survived the acquisition.
Related product categories
- AI Coding Product Pricing: pricing for products whose primary surface is AI-assisted coding (IDEs, completion engines, and review agents).
- Developer Tools Pricing: pricing models used by tools sold to developers (IDEs, CLIs, libraries, voice-to-code, and adjacent products).
- AI Platform Pricing: pricing for general-purpose AI platforms (model APIs, inference services, and multi-model hosting providers).
- AI Infrastructure & Cloud Pricing: pricing for AI compute infrastructure (GPU clouds, serverless inference, and training platforms).
- Data Platform Pricing: pricing for data platforms (scraping, enrichment, search API, and knowledge-graph vendors).
- Vertical SaaS Pricing: pricing for vertical SaaS products, meaning AI software purpose-built for a specific industry (legal, healthcare, sales, marketing).
- Customer Service Platform Pricing: pricing for customer service software platforms (ticketing, chat, automation, and AI agent products).
- Horizontal SaaS Pricing: pricing for horizontal AI SaaS, meaning productivity and workflow products sold across industries rather than to one vertical.
- Fintech AI Pricing: pricing for AI-era fintech products (billing infrastructure, accounting automation, and financial operations platforms).