What is it
Vector Storage Pricing is a billing unit where customers are charged for vectors stored or indexed — the storage dimension of vector database pricing. Every retrieval-augmented application keeps a corpus of embeddings at rest, and that corpus has a carrying cost the vendor passes on: per vector, per vector-dimension, or per GB of index, billed monthly, almost always alongside separate meters for writes and queries.
The unit sounds simple — count the vectors — but no two vendors count the same way. Weaviate normalizes to vector-dimensions ($0.00465 per 1M dimensions stored on its Flex plan, falling to $0.003875 on annual-prepaid Plus), because a 1,536-dimension embedding genuinely costs twice as much to hold as a 768-dimension one. Pinecone meters storage in GB (~$0.33/GB on Standard) beside read units and write units. Chroma bills GiB written, GiB-months stored, TiB queried, and GiB of network egress as four separate line items. turbopuffer charges per GB-month against a tier minimum.
What unites the cohort is that storage is the compounding meter. Query traffic rises and falls with usage; the embedded corpus only grows. A team that ingests documents continuously will watch the storage line overtake the query line — which is why the vendors that win large RAG workloads (turbopuffer most explicitly) anchor their pricing to cheap object storage and pass the deflation through as published price cuts.
How it works
The base formula is bill = storage at rest × rate + writes × rate + reads × rate, with a plan minimum or platform fee underneath. The design choices are which unit measures “at rest,” and how the three meters are balanced:
| Lever | What it controls | Example from the corpus |
|---|---|---|
| Storage unit | How embedding-model choice affects the bill | Weaviate: per 1M vector-dimensions; Pinecone & turbopuffer: per GB; Zilliz: vCUs |
| Read/write split | Which workload shape pays | Pinecone Standard: ~$16–18/M read units vs ~$4–4.50/M write units |
| Plan minimum | Revenue floor without a platform fee | turbopuffer: greater of usage or $64 (Launch) / $256 (Scale) / $4,096 (Enterprise) |
| Compliance packaging | Price of HIPAA/SSO/BYOC | turbopuffer sells compliance as a higher minimum, not a higher rate; Enterprise adds a 35% usage premium |
| Free-tier sizing | Where evaluation ends | Pinecone 2 GB + 2M WU/1M RU; Zilliz 5 GB + 2.5M vCUs; Upstash 200M vector×dims; Qdrant a free 1 GB-RAM cluster |
| Self-host escape hatch | Ceiling on managed pricing | Chroma, Qdrant, Milvus, Weaviate, LanceDB all ship free Apache/BSD engines |
Worked example — dimension math. A 10M-vector corpus embedded at 1,536 dimensions is 15,360M stored dimensions. On Weaviate Flex that is 15,360 × $0.00465 ≈ $71/month before per-GiB storage; the same corpus embedded at 768 dimensions halves it. The embedding model — usually chosen for retrieval quality alone — is silently a pricing decision, which is exactly the kind of coupling the choosing the right usage metric guide warns both sides to surface.
Worked example — the write-unit floor. Pinecone meters a minimum of 5 write units per upsert request. An agent pipeline making 1M small upserts a month therefore burns at least 5M WU ≈ $20–22 on Standard — and its read-unit formula keys off namespace size, not result count, so a query against a bloated namespace costs more even when it returns one row. For agent-style workloads, writes — not reads — are often the real cost driver.
Worked example — minimum as floor. On turbopuffer, a small production workload whose metered usage totals $23 still pays the $64 Launch minimum; the floor funds support and SOC 2 without a separate platform fee. The same workload needing a HIPAA BAA pays the $256 Scale minimum at identical usage rates — compliance priced as a floor, with the meter left clean.
Companies using this
10 in-corpus companies meter stored or indexed vectors: the dedicated vector databases (Pinecone, Weaviate, Chroma, Qdrant, Milvus/Zilliz, LanceDB, turbopuffer), the serverless data platform Upstash, and two adjacent cases — Nomic, which wraps embedding storage inside its Atlas and Platform plans, and DeepInfra, which prices the embedding-generation side per token.
Patterns observed
-
Three meters, decoupled. The category converged on splitting reads, writes, and storage after Pinecone demonstrated that decoupling the three cost drivers cut bills 10x–50x for variable workloads versus its old pod-based model. Chroma goes one further with four meters (written / stored / queried / egress), and turbopuffer bills writes, queries, and GB-months independently.
-
Minimums instead of platform fees. turbopuffer charges the greater of usage or the tier floor; Pinecone sets $50 (Standard) and $500 (Enterprise) monthly minimums under pure usage rates; Weaviate’s plans start “from $45/$280/$400” as annual-prepaid floors. The meter stays clean; the floor guarantees the vendor a support-funding baseline.
-
The unit is normalized toward true cost. Vendors keep moving from “vectors” toward units that track bytes and compute: Weaviate’s vector-dimensions, Zilliz’s vCUs, Pinecone’s RU/WU formulas. Upstash caps its free Vector tier at 200M vector × dimensions — the multiplication sign doing the normalization work.
-
Open source disciplines the managed price. Five of the ten (Chroma, Qdrant, Milvus, Weaviate, LanceDB) ship the full engine free under Apache or BSD licenses. Every managed quote is implicitly bid against the buyer running it themselves, which keeps published rates honest and pushes differentiation into HA, compliance, and zero-ops.
-
Storage anchored to a deflating cost base. turbopuffer built its pricing on object storage precisely so it can cut rates as S3-class costs fall, and maintains a public, dated changelog where every logged change is a cut — a structural advantage RAM-priced architectures cannot match.
-
Free tiers are sized in data, not days. Evaluation tiers are capped by corpus size — Pinecone’s 2 GB, Zilliz’s 5 GB, Qdrant’s free 1 GB-RAM cluster, Upstash’s 200M dimensions — and never expire. The vendor’s bet is that the corpus, and therefore the bill, only grows.
Counterexamples & variants
The loudest counterexample sits inside the category itself: Qdrant refuses the vector unit entirely. Its managed cloud meters vCPU, GB of RAM, and GB of storage by the hour — and queries are free. The argument is that resources, not abstractions, are what the vendor actually provisions; the cost is that buyers must translate “10M vectors at 1,536 dims” into a cluster size themselves before they can compare a Qdrant quote against a Pinecone or Weaviate one. Zilliz’s Dedicated tier makes the same move at the high end, selling reserved CU-hours rather than metered operations.
Nomic shows the unit disappearing into packaging: embedding storage is real inside Atlas, but the Platform sells $40 seats that each contribute $20 to a pooled AI-usage balance, with a $1,000/month commitment and 25-seat minimum on Business. The vector meter still exists — it’s just buried two layers beneath a seat-shaped contract. And DeepInfra marks the category’s boundary from the other side: it prices embedding generation at $0.005–$0.01 per 1M tokens and stores nothing — a reminder that every vector-storage bill has a sibling line item on some inference provider’s invoice, and that the storage vendor and the embedding vendor are often different companies billing the same corpus.
What this means for buyers vs vendors
For buyers
Normalize before you compare: convert every quote into dollars per month for your corpus (vector count × dimensions × bytes) and your read/write mix, because the units don’t line up across vendors — per-dimension (Weaviate), per-GB (Pinecone, turbopuffer), per-vCU (Zilliz), or per-resource-hour (Qdrant). Then stress the asymmetries: agent workloads hit Pinecone’s 5-WU-per-request write minimum, result-heavy queries hit Chroma’s $0.09/GiB egress, and bloated namespaces inflate Pinecone read units regardless of result count. Check where the minimums bite — a $64 floor is irrelevant at scale and decisive for a side project — and treat embedding-dimension choice as a standing line item: re-embedding a corpus at half the dimensions is sometimes the biggest discount available. The usage-metric guide has the general framework for auditing a multi-meter quote.
For vendors
The corpus-tested playbook: decouple reads, writes, and storage so each workload shape pays its own way; anchor the storage rate to a deflating cost base and pass cuts through publicly (turbopuffer’s dated changelog turned price cuts into a trust asset); and sell compliance as a floor or a premium on the same meters rather than a separate SKU — turbopuffer’s $64 → $256 → $4,096 ladder with a 35% Enterprise usage premium keeps the metered story clean. Publish the full rate card or accept that the open-source escape hatch will be exercised: half this cohort’s buyers can self-host, so opacity (rates that live only in calculators or sales calls) converts directly into evaluation churn. Finally, expect the aggregation design to be scrutinized — formulas like read-units-by-namespace-size are defensible, but only if worked examples make them predictable before the first invoice.
| Company | Product | Pricing model | Billing units | Free tier | Verified |
|---|---|---|---|---|---|
| Chroma | Open-source vector database + Chroma Cloud | Yes | 2026-06-09 | ||
| DeepInfra | Serverless inference cloud — per-token LLM/embedding APIs, per-image and per-minute media models, per-hour on-demand GPU containers, and reserved DeepCluster GPU clusters | No | 2026-06-02 | ||
| LanceDB | AI-native multimodal lakehouse | Yes | 2026-06-09 | ||
| Milvus | Vector database (OSS) + Zilliz Cloud (managed) | Yes | 2026-06-09 | ||
| Nomic | Nomic Platform (AEC agentic workflows) + Atlas data-exploration app + Nomic Embed embedding/Developer API | Yes | 2026-06-04 | ||
| Pinecone | Managed vector database (serverless) | Yes | 2026-06-09 | ||
| Qdrant | Open-source vector database + Qdrant Cloud | Yes | 2026-06-09 | ||
| turbopuffer | Serverless vector and full-text search database on object storage | No | 2026-06-04 | ||
| Upstash | Upstash (Redis, Vector, QStash, Search, Workflow) | Yes | 2026-06-03 | ||
| Weaviate | AI-native vector database (open-source core + Weaviate Cloud managed serverless, dedicated/Enterprise Cloud, BYOC) | Yes | 2026-06-09 |
FAQ
What is vector storage pricing?
Vector storage pricing is a billing unit where customers are charged for the embeddings they keep in a vector database — metered as vectors, vector-dimensions, or GB of index — usually alongside separate charges for writes and queries. It is the storage dimension of vector database pricing at vendors like Pinecone, Weaviate, Chroma, and turbopuffer.
How do vector databases measure stored vectors?
Three conventions coexist: per vector-dimension (Weaviate bills $0.00465 per 1M dimensions on Flex; Upstash caps its free tier at 200M vector × dimensions), per GB or GiB of data (Pinecone at ~$0.33/GB, Chroma per GiB-month, turbopuffer per GB-month), and per compute-normalized unit (Zilliz/Milvus vCUs). The conventions are not directly comparable without normalizing to your own corpus.
Which companies use vector storage pricing?
In this corpus, 10 companies meter stored or indexed vectors: Pinecone, Weaviate, Chroma, Qdrant, Milvus (Zilliz Cloud), LanceDB, turbopuffer, Upstash, Nomic, and DeepInfra. Most pair the storage meter with separate read and write charges.
Why do vector databases split reads, writes, and storage?
Because the three drive cost independently: a chatbot reads constantly but writes rarely, while an agent pipeline upserts continuously. Pinecone's move to decoupled read units, write units, and storage matched price to each workload's actual shape and cut bills 10x–50x for variable workloads compared with pod-based pricing.
Does self-hosting a vector database save money?
Sometimes — Chroma, Qdrant, Milvus, Weaviate, and LanceDB all offer free open-source engines — but the license fee is replaced by your own compute, memory (HNSW indexes are RAM-hungry), object storage, and engineering time. Managed pricing in this category is disciplined by the fact that every buyer has that alternative.
What hidden costs should I watch in vector database bills?
Egress (Chroma charges $0.09/GiB on query results), per-request write minimums (Pinecone meters a minimum 5 write units per upsert request), tier minimums that bill even at zero usage (turbopuffer's $64/month floor, Pinecone's $50 Standard minimum), and unpublished rates that only appear in calculators or after sales contact.
Trivia
-
Pinecone's read-unit formula keys off the size of the namespace being queried, not the number of results returned — an unusual mechanic that rewards good data partitioning and surprises teams expecting per-query pricing.
-
turbopuffer's Enterprise tier charges a 35% premium on the same usage meters as the $64/month Launch tier — compliance and single-tenancy are sold as a higher floor and a rate multiplier, never as a different unit.
-
Weaviate meters stored vectors per 1M dimensions ($0.00465 on Flex), which quietly turns embedding-model selection into a pricing decision: the same corpus embedded at 1,536 dimensions costs exactly twice as much to store as at 768.
Related billing units
- Credit-Based BillingA billing unit where customers pre-purchase or are allocated a pool of credits that deplete as they use the product, often at variable rates per feature.
- Token-Based PricingA billing unit common in LLM and AI products, where customers are charged per input and output token processed.
- Per-Seat PricingA billing unit where the vendor charges a fixed fee per named user, regardless of how much each user consumes.
- Per-Resolution PricingA billing unit unique to AI customer-support products, where the vendor charges only when an AI agent resolves a customer issue without escalation.
- Bandwidth-Based PricingA billing unit where customers are charged per gigabyte of data transferred out of the platform.
- Per-Function-Invocation PricingA billing unit where customers are charged per serverless function invocation, often combined with a separate compute-time charge.
- CPU-Hour PricingA billing unit where customers are charged for the CPU time their workloads consume, typically measured in vCPU-seconds or vCPU-hours.
- GB-Hour PricingA billing unit where customers are charged for the memory their workloads consume over time, measured in gigabyte-hours.
- GPU-Hour PricingA billing unit where customers are charged for GPU time consumed, typically measured per-second or per-hour by GPU type.
- Per-API-Call PricingA billing unit where customers are charged per API request, regardless of payload size or processing time.
- Per-GB Storage PricingA billing unit where customers are charged per gigabyte of data stored on the platform per month.
- Media-Minute PricingA billing unit where customers are charged per minute of audio or video processed — used by speech, voice, and video AI vendors.
- Per-Request PricingA billing unit where customers are charged per request served — the generic meter for inference endpoints, search, scraping, and browser infrastructure.
- Per-Event PricingA billing unit where customers are charged per event ingested — the native meter of observability and billing-infrastructure platforms.
- Per-Character PricingA billing unit where customers are charged per character of text processed — the standard meter for text-to-speech and translation.
- Per-Document PricingA billing unit where customers are charged per document processed or generated — common in AI writing, SEO, and document-intelligence tools.
- Per-Page PricingA billing unit where customers are charged per page crawled, parsed, or rendered — the meter for web scraping and document parsing.
- Per-Transaction PricingA billing unit where customers are charged per financial or billing transaction processed — the meter of billing and accounting platforms.
- Active-User PricingA billing unit where customers are charged per monthly or daily active user rather than per provisioned seat.
- Per-Task PricingA billing unit where customers are charged per task an automation or agent executes — Zapier's historical unit, now spreading to AI agents.