Vector Storage Pricing: Examples & Companies

What is it

Vector Storage Pricing is a billing unit where customers are charged for vectors stored or indexed — the storage dimension of vector database pricing. Every retrieval-augmented application keeps a corpus of embeddings at rest, and that corpus has a carrying cost the vendor passes on: per vector, per vector-dimension, or per GB of index, billed monthly, almost always alongside separate meters for writes and queries.

The unit sounds simple — count the vectors — but no two vendors count the same way. Weaviate normalizes to vector-dimensions, because a 1,536-dimension embedding genuinely costs twice as much to hold as a 768-dimension one. Pinecone meters storage in GB beside read units and write units. Chroma splits it into four separate line items. turbopuffer charges per GB-month against a tier minimum. Four incompatible ways to price the same corpus — the worked examples below convert them.

What unites the cohort is that storage is the compounding meter. Query traffic rises and falls; the embedded corpus only grows. A team ingesting documents continuously watches the storage line overtake the query line — which is why the vendors chasing large RAG workloads anchor pricing to cheap object storage and pass the deflation through as published price cuts.

The unit also has a boundary. Some vendors in this corpus store no vectors at all: DeepInfra prices embedding generation per token ($0.005–$0.01 per 1M input tokens) and hands the vector back for you to store elsewhere. So a single corpus of embeddings often produces two invoices — one from the model that generated them, one from the database that holds them — and the two are frequently different companies.

Same corpus, four incompatible meters

How it works

The base formula is bill = storage at rest × rate + writes × rate + reads × rate, with a plan minimum or platform fee underneath. The design choices are which unit measures “at rest,” and how the three meters are balanced:

Lever	What it controls	Example from the corpus
Storage unit	How embedding-model choice affects the bill	Weaviate: per 1M vector-dimensions; Pinecone, Chroma & turbopuffer: per GB/GiB; Zilliz: vCUs
Read/write split	Which workload shape pays	Pinecone Standard: ~$16–18/M read units vs ~$4–4.50/M write units
Plan minimum	Revenue floor without a platform fee	turbopuffer: greater of usage or $64 (Launch) / $256 (Scale) / $4,096 (Enterprise)
Compliance packaging	Price of HIPAA/SSO/BYOC	turbopuffer: Scale raises the minimum at unchanged usage rates; Enterprise also adds a 35% usage premium
Free-tier sizing	Where evaluation ends	Pinecone 2 GB + 2M WU/1M RU; Zilliz 5 GB + 2.5M vCUs; Upstash 200M vector×dims; Qdrant a free 1 GB-RAM cluster
Self-host escape hatch	Ceiling on managed pricing	Chroma, Qdrant, Milvus, Weaviate, LanceDB all ship free Apache/BSD engines
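The base formula above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's billing engine: the function name `monthly_bill` and the sample workload are hypothetical, the rates are the Pinecone Standard figures quoted in this article (low end of each range), and the floor is modeled as "greater of usage or minimum", which is how the minimums are described here. An additive platform fee would be summed instead.

```python
def monthly_bill(storage_gb, storage_rate, write_units_m, write_rate,
                 read_units_m, read_rate, plan_minimum=0.0):
    """Metered usage across three meters, or the plan minimum if that is larger."""
    usage = (storage_gb * storage_rate
             + write_units_m * write_rate
             + read_units_m * read_rate)
    return max(usage, plan_minimum)

# Hypothetical workload: 50 GB stored, 2M write units, 1M read units.
# Rates as quoted in this article: $0.33/GB, $4/M WU, $16/M RU, $50 minimum.
example = monthly_bill(50, 0.33, 2, 4.00, 1, 16.00, plan_minimum=50.0)
print(f"${example:.2f}")
```

With these inputs the metered usage comes to about $40.50, so the $50 minimum is what gets billed.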

Worked example — dimension math. A 10M-vector corpus embedded at 1,536 dimensions is 15,360M stored dimensions. On Weaviate Flex that is 15,360 × $0.00465 ≈ $71/month before per-GiB storage; the same corpus embedded at 768 dimensions halves it. The embedding model, usually chosen for retrieval quality alone, is silently a pricing decision, which is exactly the kind of coupling the guide on choosing the right usage metric warns both sides to surface.
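The dimension math can be reproduced as a short sketch. The function name `dimension_charge` is illustrative; the rate is the Weaviate Flex figure quoted above, and the result covers only the per-dimension line, not per-GiB storage.

```python
def dimension_charge(vectors, dims, rate_per_million_dims):
    """Monthly charge for a per-1M-vector-dimensions meter."""
    stored_dims_millions = vectors * dims / 1_000_000
    return stored_dims_millions * rate_per_million_dims

FLEX_RATE = 0.00465  # $ per 1M stored dimensions, as quoted in this article

at_1536 = dimension_charge(10_000_000, 1536, FLEX_RATE)
at_768 = dimension_charge(10_000_000, 768, FLEX_RATE)
print(f"1,536 dims: ${at_1536:.2f}/month")
print(f"  768 dims: ${at_768:.2f}/month")
```

Halving the dimensions halves the charge because the meter is linear in vectors × dimensions.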

Worked example — the write-unit floor. Pinecone meters each upsert request at 1 WU per 1 KB written or 5 WU, whichever is larger. An agent pipeline making 1M small upserts a month therefore burns at least 5M WU, or about $20–22.50 on Standard. Pinecone’s read-unit formula, meanwhile, charges ~1 RU per 1 GB of namespace scanned (min 0.25 RU/query), so a query against a bloated namespace costs more even when it returns one row. For agent-style workloads, writes rather than reads are often the real cost driver.
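A rough sketch of the two formulas described above follows. The function names are hypothetical, and rounding the payload size up to whole KB is an assumption for illustration; the article only states the 5-WU minimum, the 1 WU per KB rate, and the approximate 1 RU per GB with a 0.25 RU minimum.

```python
import math

MIN_WU_PER_REQUEST = 5      # per-upsert floor, as quoted in this article
MIN_RU_PER_QUERY = 0.25     # per-query floor, as quoted in this article

def write_units(payload_kb):
    """WU for one upsert: 1 WU per KB (rounded up, an assumption) or the floor."""
    return max(MIN_WU_PER_REQUEST, math.ceil(payload_kb))

def read_units(namespace_gb):
    """Approximate RU for one query: ~1 RU per GB of namespace scanned."""
    return max(MIN_RU_PER_QUERY, namespace_gb)

# 1M small upserts a month, each about 1 KB.
total_wu = 1_000_000 * write_units(payload_kb=1)
low = total_wu / 1_000_000 * 4.00    # low end of the quoted Standard WU rate
high = total_wu / 1_000_000 * 4.50   # high end of the quoted Standard WU rate
print(total_wu, low, high)
```

Under these assumptions a 1 KB upsert and a 5 KB upsert cost the same 5 WU, which is why many tiny writes are the expensive shape.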

Worked example — the four-meter sum. On Chroma, the “$0/month” headline hides four meters. Chroma’s own calculator models a mid-size workload — 1M docs written (~13 GiB @ $2.50), 6M docs stored (~80 GiB @ $0.33/GiB-mo), 10M queries — landing around $79/month. The line buyers underestimate is egress at $0.09/GiB returned, which scales with how much data queries send back, not how many queries run.
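The arithmetic behind that estimate can be partly reconstructed. The sketch below uses only the rates quoted above; the article gives no per-query rate, so the query and egress share is left as a residual rather than computed, and the 100 GiB egress figure is a hypothetical input.

```python
WRITE_RATE_PER_GIB = 2.50           # $ per GiB written
STORAGE_RATE_PER_GIB_MONTH = 0.33   # $ per GiB stored per month
EGRESS_RATE_PER_GIB = 0.09          # $ per GiB returned by queries

write_cost = 13 * WRITE_RATE_PER_GIB             # ~13 GiB written
storage_cost = 80 * STORAGE_RATE_PER_GIB_MONTH   # ~80 GiB stored
known = write_cost + storage_cost

# The quoted total is around $79/month; whatever is left over is
# attributable to the query and egress meters combined.
residual = 79 - known

def egress_cost(gib_returned):
    """Egress scales with data returned, not with the number of queries."""
    return gib_returned * EGRESS_RATE_PER_GIB

print(f"writes ${write_cost:.2f}, storage ${storage_cost:.2f}, "
      f"queries + egress about ${residual:.2f}")
print(f"100 GiB returned would add ${egress_cost(100):.2f}")
```

Writes and storage account for roughly $59 of the total, leaving about $20 for the two query-side meters.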

Worked example — minimum as floor. On turbopuffer, a small production workload whose metered usage totals $23 still pays the $64 Launch minimum; the floor funds support and SOC 2 without a separate platform fee. The same workload needing a HIPAA BAA pays the $256 Scale minimum at identical usage rates — compliance priced as a floor, with the meter left clean.
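The floor logic reduces to a single `max`. The sketch below assumes the tier minimums and the 35% Enterprise premium as quoted in this article; the function name `billed` and the convention of expressing usage at Launch rates are illustrative choices.

```python
TIER_MINIMUMS = {"Launch": 64, "Scale": 256, "Enterprise": 4096}
ENTERPRISE_USAGE_PREMIUM = 0.35  # applied on top of Launch-rate usage

def billed(tier, usage_at_launch_rates):
    """Greater of metered usage or the tier minimum."""
    usage = usage_at_launch_rates
    if tier == "Enterprise":
        usage *= 1 + ENTERPRISE_USAGE_PREMIUM
    return max(usage, TIER_MINIMUMS[tier])

for tier in TIER_MINIMUMS:
    print(tier, billed(tier, 23))
```

At $23 of usage every tier bills its floor; the floor stops mattering once metered usage exceeds it, for example at $500 of usage on Launch.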

Companies using this

Ten in-corpus companies fall under this unit. Eight charge for keeping vectors at rest: the dedicated vector databases (Pinecone, Weaviate, Chroma, Qdrant, Milvus/Zilliz, LanceDB, turbopuffer) and the serverless data platform Upstash. Two are adjacent cases: Nomic, which wraps embedding storage inside its Atlas and Platform plans, and DeepInfra, which stores no vectors and prices only the embedding-generation side per token. The table below lists each with its pricing model, billing units, free-tier status, and last-verified date.

Patterns observed

Reads, writes, and storage split apart. The category converged on decoupling the three cost drivers after Pinecone showed it cut bills 10x–50x for variable workloads versus its old pod-based model. Chroma takes it to four meters and turbopuffer to three, on the shared thesis that a chatbot’s read-heavy shape and an agent’s write-heavy shape should not pay the same blended rate.

Minimums instead of platform fees. turbopuffer charges the greater of usage or the tier floor ($64 / $256 / $4,096); Pinecone sets a $50 Standard / $500 Enterprise monthly minimum under pure usage rates; Weaviate’s plans start “from $45 / $280 / $400” as PAYG or annual-prepaid floors. The meter stays clean; the floor funds support without a separate SKU. Chroma is the exception that proves it — a genuine $0 Starter base it markets directly against Pinecone’s floor.

The unit is normalized toward true cost. Vendors keep moving from raw “vectors” toward units that track bytes and compute: Weaviate’s vector-dimensions, Zilliz’s vCUs ($4 per million on Serverless), Pinecone’s RU/WU formulas. Upstash caps its free Vector tier at 200M vector × dimensions and its paid PAYG tier at 2B — the multiplication sign doing the normalization so a high-dimensional corpus counts against the cap faster.
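To see how a vector × dimensions cap behaves, the sketch below converts the two Upstash caps quoted above into a maximum vector count at a given embedding size. The function name `max_vectors` is illustrative.

```python
FREE_CAP = 200_000_000      # vector x dimensions, free tier
PAYG_CAP = 2_000_000_000    # vector x dimensions, paid PAYG tier

def max_vectors(cap, dims):
    """Largest whole number of vectors of a given size that fits under the cap."""
    return cap // dims

print(max_vectors(FREE_CAP, 768))    # 260416
print(max_vectors(FREE_CAP, 1536))   # 130208
print(max_vectors(PAYG_CAP, 1536))   # 1302083
```

Doubling the embedding size halves the number of vectors the same cap admits.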

Open source disciplines the managed price. Five of the ten — Chroma, Qdrant, Milvus, Weaviate, LanceDB — ship the full engine free under Apache or BSD licenses. Every managed quote is implicitly bid against the buyer running it themselves, keeping published rates honest and pushing differentiation into HA, compliance, and zero-ops. Weaviate’s per-1M-dimension entry rate fell roughly 95% between its 2024 GA card ($0.095/1M) and the 2026 Flex plan, tracking that discipline plus the broader collapse in compute cost.

Storage anchored to a deflating cost base. turbopuffer built its pricing on object storage precisely so it can cut rates as S3-class costs fall, and maintains a public dated changelog where every logged change is a cut — its February 2026 query-rate reduction from $5/PB to $1/PB (up to 94% off the largest namespaces) being the largest. A RAM-priced architecture like Qdrant’s structurally cannot match that deflation curve — a real trade-off, not just a pricing choice.

Free tiers are sized in data, not days. Evaluation tiers are capped by corpus size — Pinecone’s 2 GB, Zilliz’s 5 GB, Qdrant’s free 1 GB-RAM cluster, Weaviate’s 100,000 objects — and never expire. The bet is that the corpus, and therefore the bill, grows past the cap on its own; no countdown timer is needed when the data does the throttling.

Counterexamples & variants

The loudest counterexample sits inside the category itself: Qdrant refuses the vector unit entirely. Its managed cloud meters vCPU, GB of RAM, and GB of disk by the hour — and there are no per-query or read/write unit charges at all. The argument is that resources, not abstractions, are what the vendor actually provisions; the cost is that buyers must translate “10M vectors at 1,536 dims” into a cluster size themselves before they can compare a Qdrant quote against a Pinecone or Weaviate one. Because vectors are held largely in memory, RAM is the hidden cost driver — high dimensionality, large vector counts, and full-precision (vs quantized) storage all push you into a bigger, pricier cluster in ways first-time buyers underestimate.
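The translation buyers have to do themselves can be approximated as follows. This is a back-of-envelope sketch, not Qdrant's sizing guidance: the byte widths are standard for each numeric type, but the `overhead` multiplier for index structures and headroom is a hypothetical placeholder that should be replaced with the vendor's own capacity-planning figure.

```python
BYTES_PER_COMPONENT = {"float32": 4, "float16": 2, "int8": 1}

def raw_vector_gb(vectors, dims, dtype="float32"):
    """Raw size of the vectors alone, in decimal GB."""
    return vectors * dims * BYTES_PER_COMPONENT[dtype] / 1e9

def estimated_ram_gb(vectors, dims, dtype="float32", overhead=1.5):
    """Raw size times a placeholder overhead factor (an assumption)."""
    return raw_vector_gb(vectors, dims, dtype) * overhead

full = raw_vector_gb(10_000_000, 1536, "float32")
quantized = raw_vector_gb(10_000_000, 1536, "int8")
print(f"float32: {full:.2f} GB raw, int8: {quantized:.2f} GB raw")
print(f"float32 with placeholder overhead: "
      f"{estimated_ram_gb(10_000_000, 1536):.2f} GB")
```

The 4x gap between full-precision and int8 storage is the quantization lever the paragraph above refers to.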

Zilliz’s Dedicated tier and LanceDB make the same resource-over-abstraction move at the high end. Zilliz Dedicated sells reserved CU-hours rather than metered vCUs. LanceDB Cloud is currently free during public beta, with usage-based storage-plus-compute pricing to follow, but its public pricing page is a contact form; the hard numbers live in sales quotes and an AWS Marketplace listing ($60,000 annual commit with $0.01 per LCU overage). That opacity is itself a variant worth naming: half this cohort publishes a full rate card, and the half that gates it, LanceDB included, is betting that its largest customers (Runway, Midjourney, Character.ai, running tens of billions of vectors) negotiate anyway.

Nomic shows the unit disappearing into packaging: embedding storage is real inside Atlas, but the Platform sells $40 seats that each contribute $20 to a pooled AI-usage balance, with text overage billed at $1 per 10M tokens over the pool. The vector meter still exists — it’s just buried two layers beneath a seat-shaped contract.

What this means for buyers vs vendors

For buyers

Normalize before you compare. Convert every quote into dollars per month for your corpus (vector count × dimensions × bytes) and your read/write mix, because the units don’t line up across vendors — per-dimension (Weaviate), per-GB/GiB (Pinecone, Chroma, turbopuffer), per-vCU (Zilliz), or per-resource-hour (Qdrant). A per-dimension rate and a per-vCU rate cannot be eyeballed against each other; only your own corpus resolves them into comparable dollars.
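A minimal sketch of that normalization, for the storage line only, is shown below. It assumes float32 vectors (4 bytes per component) and ignores metadata, index overhead, Weaviate's separate per-GiB storage, and every read or write meter; the function names are hypothetical and the two rates are the ones quoted in this article.

```python
def corpus_gb(vectors, dims, bytes_per_component=4):
    """Raw vector bytes in decimal GB: vector count x dimensions x bytes."""
    return vectors * dims * bytes_per_component / 1e9

def per_dimension_cost(vectors, dims, rate_per_million_dims):
    return vectors * dims / 1e6 * rate_per_million_dims

def per_gb_cost(vectors, dims, rate_per_gb_month):
    return corpus_gb(vectors, dims) * rate_per_gb_month

VECTORS, DIMS = 10_000_000, 1536
quotes = {
    "per-dimension at $0.00465 per 1M dims": per_dimension_cost(VECTORS, DIMS, 0.00465),
    "per-GB at $0.33 per GB-month": per_gb_cost(VECTORS, DIMS, 0.33),
}
for label, dollars in quotes.items():
    print(f"{label}: ${dollars:.2f}/month")
```

The two figures are not a like-for-like vendor comparison, since each vendor layers other meters on top, but they show how the same corpus resolves different units into dollars.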

Then stress the asymmetries. Agent workloads hit Pinecone’s 5-WU-per-request write minimum, result-heavy queries hit Chroma’s $0.09/GiB egress, bloated namespaces inflate Pinecone read units regardless of result count, and high-QPS RAG can run up Zilliz vCUs (each read is a minimum of 6 vCUs) even with cheap storage. Check where the minimums bite — a $64 turbopuffer floor is irrelevant at scale and decisive for a side project — and treat embedding-dimension choice as a standing line item: re-embedding a corpus at half the dimensions is sometimes the single biggest discount available, and it’s one you control rather than negotiate.
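The Zilliz asymmetry is easy to put a lower bound on. The sketch below uses the two figures quoted in this article ($4 per million vCUs on Serverless, a minimum of 6 vCUs per read); the 50 QPS load and the 30-day month are hypothetical inputs, and real queries may consume more than the minimum.

```python
VCU_RATE_PER_MILLION = 4.00   # $ per 1M vCUs on Serverless, as quoted
MIN_VCU_PER_READ = 6          # minimum vCUs per read, as quoted

def min_read_cost(queries):
    """Lower bound on read spend: every query billed at the per-read minimum."""
    return queries * MIN_VCU_PER_READ / 1_000_000 * VCU_RATE_PER_MILLION

def monthly_queries(qps, days=30):
    """Sustained queries per second converted to a monthly query count."""
    return qps * 86_400 * days

print(min_read_cost(10_000_000))                        # 240.0
print(round(min_read_cost(monthly_queries(50)), 2))     # 3110.4
```

Even at the per-read minimum, a sustained 50 QPS costs far more than storing a mid-size corpus, which is why the read meter deserves its own stress test.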

Finally, price the escape hatch. Five of these ten ship a free engine, so a managed quote is worth comparing against your own compute-plus-RAM-plus-object-storage bill — not because self-hosting is always cheaper (HNSW indexes are RAM-hungry and someone has to run the cluster), but because knowing the self-host number is the strongest leverage you have on the managed one. The usage-metric guide has the general framework for auditing a multi-meter quote, and the aggregation-methods guide explains how formulas like read-units-by-namespace-size are computed under the hood.

For vendors

Decouple reads, writes, and storage so each workload shape pays its own way. A single blended rate over-charges the read-heavy and under-charges the write-heavy; splitting the meters is what makes the price feel fair to both, and it is the move that produced this category’s largest cost-cut claims.

Anchor the storage rate to a deflating cost base and pass cuts through publicly — a dated, cut-only pricing changelog turns falling object-storage costs into a trust asset rather than pocketed margin. Sell compliance as a floor or a rate premium on the same meters rather than a separate SKU (turbopuffer’s Enterprise adds a 35% usage premium over the Launch rate), which keeps the metered story clean while still monetizing HIPAA and single-tenancy.

Publish the full rate card or accept that the open-source escape hatch will be exercised. Half this cohort’s buyers can self-host, so opacity — rates that live only in a calculator or behind a “contact sales” form, as at LanceDB — converts directly into evaluation churn for anything short of a nine-figure-vector customer. And expect the aggregation design to be scrutinized: formulas like read-units-by-namespace-size or vCU-per-scan are defensible only if worked examples make them predictable before the first invoice, not surprising after it.

Company	Product	Pricing model	Billing units	Free tier	Verified
Chroma	Open-source vector database + Chroma Cloud	pure-usage freemium	storage-gb bandwidth-gb api-calls	Yes	2026-06-09
DeepInfra	Serverless inference cloud — per-token LLM/embedding APIs, per-image and per-minute media models, per-hour on-demand GPU containers, and reserved DeepCluster GPU clusters	pure-usage commitment	tokens gpu-hours requests	No	2026-07-21
LanceDB	AI-native multimodal lakehouse	freemium pure-usage commitment	storage-gb vectors-indexed gpu-hours	Yes	2026-06-09
Milvus	Vector database (OSS) + Zilliz Cloud (managed)	pure-usage freemium commitment	cpu-hours storage-gb vectors-indexed	Yes	2026-07-21
Nomic	Nomic Platform (AEC agentic workflows) + Atlas data-exploration app + Nomic Embed embedding/Developer API	hybrid seat-based commitment	seats tokens credits	Yes	2026-06-04
Pinecone	Managed vector database (serverless)	pure-usage hybrid	requests storage-gb vectors-indexed	Yes	2026-07-23
Qdrant	Open-source vector database + Qdrant Cloud	pure-usage freemium	cpu-hours gb-hours storage-gb	Yes	2026-07-23
turbopuffer	Serverless vector and full-text search database on object storage	pure-usage commitment	storage-gb vectors-indexed gb-hours	No	2026-07-22
Upstash	Upstash (Redis, Vector, QStash, Workflow, Search, Box)	pure-usage freemium hybrid	requests api-calls vectors-indexed	Yes	2026-07-22
Weaviate	AI-native vector database (open-source core + Weaviate Cloud managed serverless, dedicated/Enterprise Cloud, BYOC)	pure-usage hybrid commitment	vectors-indexed tokens api-calls	Yes	2026-07-06

FAQ

What is vector storage pricing?

Vector storage pricing is a billing unit where customers are charged for the embeddings they keep in a vector database — metered as vectors, vector-dimensions, or GB of index — usually alongside separate charges for writes and queries. It is the storage dimension of vector database pricing at vendors like Pinecone, Weaviate, Chroma, and turbopuffer.

How do vector databases measure stored vectors?

Three conventions coexist: per vector-dimension (Weaviate bills $0.00465 per 1M dimensions on Flex; Upstash caps its free tier at 200M vector × dimensions), per GB or GiB of data (Pinecone at ~$0.33/GB, Chroma at $0.33/GiB-month, turbopuffer per GB-month), and per compute-normalized unit (Zilliz/Milvus vCUs). The conventions are not directly comparable without normalizing to your own corpus.

Which companies use vector storage pricing?

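In this corpus, 10 companies fall under the unit: Pinecone, Weaviate, Chroma, Qdrant, Milvus (Zilliz Cloud), LanceDB, turbopuffer, Upstash, Nomic, and DeepInfra. Eight charge for keeping vectors at rest, Nomic folds embedding storage into seat-based plans, and DeepInfra prices only embedding generation. Most of the eight pair the storage meter with separate read and write charges.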

Why do vector databases split reads, writes, and storage?

Because the three drive cost independently: a chatbot reads constantly but writes rarely, while an agent pipeline upserts continuously. Pinecone's move to decoupled read units, write units, and storage matched price to each workload's actual shape and cut bills 10x–50x for variable workloads compared with pod-based pricing.

Does self-hosting a vector database save money?

Sometimes. Chroma, Qdrant, Milvus, Weaviate, and LanceDB all offer free open-source engines, but self-hosting trades the managed-service bill for your own compute, memory (HNSW indexes are RAM-hungry), object storage, and engineering time. Managed pricing in this category is disciplined by the fact that every buyer has that alternative.

What hidden costs should I watch in vector database bills?

Egress (Chroma charges $0.09/GiB on query results), per-request write minimums (Pinecone meters a minimum 5 write units per upsert request), tier minimums that bill even at zero usage (turbopuffer's $64/month floor, Pinecone's $50 Standard minimum), and unpublished rates that only appear in calculators or after sales contact (LanceDB's pricing page is a contact form).

Related guides & calculators

How to Choose the Right Usage Metric
