Data Pipeline Pricing: Examples & Companies

25 companies in the corpus
Definition

Data Pipeline Pricing is pricing for data collection, scraping, and pipeline services: platforms that extract, transform, and deliver web data, typically billed per request, per GB, or per record.

Also known as: Web Scraping Pricing, Data Collection Pricing

What is it

Twenty-five companies in the UsagePricing corpus tag data-pipeline as a use case, making it one of the densest single clusters in the collection. The group spans pure proxy-and-scraping vendors (Bright Data, Oxylabs, ScraperAPI, ZenRows), AI-agent-facing extraction APIs (Firecrawl, Exa, Tavily, Linkup), search-results APIs (SerpApi), no-code scrapers (Browse AI, Apify), structured-extraction and knowledge-graph platforms (Diffbot), workflow-automation engines (n8n), GTM enrichment (Clay), and the compute-and-storage substrate the heaviest pipelines run on (Modal, RunPod, Anyscale, Lightning AI, Upstash, Turbopuffer).

What unites them is a near-total commitment to usage-based metering. Almost none of these companies charge per seat. Instead, each picks the unit that tracks where its cost — and the customer’s value — actually sits: a gigabyte of proxy bandwidth, a successful scrape, a delivered record, a workflow execution, or a search that returned results. The category is also notable for outcome-aligned billing: the leading scrapers charge only when a request succeeds, eating the cost of blocks and CAPTCHAs themselves.

The cluster has recently consolidated through acquisitions (ScraperAPI acquiring Traject Data), and it sits at the supply end of the AI stack — these are the companies that feed grounding data, enrichment, and training corpora to everything else. That makes their pricing a useful early read on how the broader market meters raw, variable-cost work. For the underlying mechanics, see the introduction to usage-based pricing.


How it works

Data pipeline pricing resolves to one core decision: which unit best tracks the cost of doing the work. The corpus uses four primary meters, and the larger platforms run several in parallel rather than forcing one unit across an entire catalog.

| Billing unit | What it tracks | Best fit | Example on this page |
| --- | --- | --- | --- |
| Per request / per 1,000 requests | Count of API calls | Search and structured extraction | Exa Search at $7/1k requests (~$0.007/call) |
| Per GB | Bandwidth transferred | Raw proxy traffic where payload size drives cost | Oxylabs residential from $6/GB, datacenter from $0.59/GB |
| Per record / per 1,000 results | Rows delivered or successful results | Packaged datasets and success-based scraping | Bright Data datasets per 1,000 records |
| Credits (difficulty-weighted) | Normalized request difficulty | Scrapers where some pages cost far more to fetch | ScraperAPI 1 credit (plain) → 75 credits (ultra-premium + render) |

Two structural levers recur across the category. The first is success-based billing: Bright Data, Oxylabs, SerpApi, and ZenRows do not charge for requests that fail with a block, CAPTCHA, or 5xx/6xx system error. The vendor absorbs proxy churn; the buyer pays only for data actually returned. The second is the difficulty multiplier: ScraperAPI charges 1 credit for a plain page, 10 for JavaScript rendering or premium proxies, and 75 for ultra-premium-plus-render, while Oxylabs varies its per-1,000-result rate by target (Amazon $0.50, Google $1.00, other $1.15 without JS rendering). Both encode the reality that not all requests cost the same to serve.
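To make the first lever concrete, here is a minimal Python sketch comparing a success-based bill with a per-attempt bill for the same batch. The $1.00 per 1,000 results is the Oxylabs Google rate quoted above; the 70% success rate and the per-attempt vendor charging the same rate are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math


def billed_cost(attempts: int, success_rate: float,
                rate_per_1k: float, success_based: bool) -> float:
    """Dollar cost of a batch of requests.

    success_based=True bills only the requests that succeed.
    success_based=False bills every attempt (a hypothetical vendor).
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    billable = attempts * success_rate if success_based else attempts
    return billable / 1000 * rate_per_1k


def cost_per_success(attempts: int, success_rate: float,
                     rate_per_1k: float, success_based: bool) -> float:
    """Effective dollars paid per usable result."""
    successes = attempts * success_rate
    if successes == 0:
        return math.inf  # nothing usable came back
    return billed_cost(attempts, success_rate, rate_per_1k, success_based) / successes


# Assumed scenario: 100,000 attempts, 70% succeed, $1.00 per 1,000.
for label, flag in (("success-based", True), ("per-attempt", False)):
    total = billed_cost(100_000, 0.7, 1.00, flag)
    unit = cost_per_success(100_000, 0.7, 1.00, flag)
    print(f"{label}: ${total:.2f} total, ${unit:.5f} per usable result")
```

Under these assumptions the per-attempt bill is higher by exactly the share of requests that failed, which is why block rates matter more than the headline rate on hard targets.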

A worked example shows how the meter choice changes the math. Take 100,000 page fetches in a month:

Unit math (per-credit, difficulty-weighted): On ScraperAPI’s 100,000-credit Hobby plan ($49/mo), 100,000 plain pages cost 100,000 credits — exactly the plan. But if every page needs JS rendering (10 credits each), the same 100,000 fetches need 1,000,000 credits, forcing a far larger tier. The headline credit count overstates real capacity the moment multipliers apply.

Unit math (per-credit, page-flat): On Firecrawl, one credit ≈ one page, so 100,000 pages ≈ 100,000 credits — covered by the $83/mo Standard tier (100,000 credits). Because Firecrawl charges no per-seat fee, a 50-person team pays the same $83 as a solo developer at that volume.
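The two unit-math cases can be reproduced with a short Python sketch. The credit weights are the ScraperAPI multipliers quoted on this page and the flat one-credit-per-page weight is the Firecrawl approximation from the text; the dictionary keys, the function name, and the 80/20 mix are hypothetical, introduced only for illustration.

```python
# Credit weights as quoted in the text (keys are illustrative labels).
SCRAPERAPI_WEIGHTS = {"plain": 1, "js_or_premium": 10, "ultra_premium_render": 75}
FLAT_PAGE_WEIGHTS = {"page": 1}  # Firecrawl: roughly one credit per page


def credits_needed(fetches_by_type: dict[str, int], weights: dict[str, int]) -> int:
    """Total credits consumed by a batch, given per-type credit weights."""
    return sum(weights[kind] * count for kind, count in fetches_by_type.items())


all_plain = credits_needed({"plain": 100_000}, SCRAPERAPI_WEIGHTS)
all_js = credits_needed({"js_or_premium": 100_000}, SCRAPERAPI_WEIGHTS)
mixed = credits_needed({"plain": 80_000, "js_or_premium": 20_000}, SCRAPERAPI_WEIGHTS)
flat = credits_needed({"page": 100_000}, FLAT_PAGE_WEIGHTS)

print(all_plain)  # 100000
print(all_js)     # 1000000
print(mixed)      # 280000
print(flat)       # 100000
```

Even the assumed 80/20 mix needs 280,000 credits, nearly three times the 100,000-credit plan, for the same 100,000 fetches.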

The third axis, easy to miss, is throughput as a separate price dimension. ScraperAPI caps concurrent threads (20 → 500) independently of the credit budget, and Firecrawl raises concurrent browsers and rate limits with each tier — so two customers with identical volume allowances can have very different speed ceilings. Choosing the right unit and tier requires matching all three: volume, difficulty mix, and concurrency. The choosing-the-right-usage-metric guide walks through that selection.
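A rough way to see why concurrency is its own price dimension is to estimate how long a fixed job takes at each thread cap. The sketch below uses the 20 and 500 thread limits quoted for ScraperAPI; the 5-second average request latency is an assumption, and the model is deliberately idealized (every slot stays busy, every request takes the same time, no retries).

```python
import math


def hours_to_finish(requests: int, concurrency: int,
                    avg_seconds_per_request: float) -> float:
    """Idealized wall-clock hours to complete a job at a fixed concurrency cap."""
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    # Requests run in waves of `concurrency`; the last wave may be partial.
    waves = math.ceil(requests / concurrency)
    return waves * avg_seconds_per_request / 3600


# Assumed latency of 5 s per request; thread caps from the text.
for threads in (20, 500):
    hours = hours_to_finish(100_000, threads, 5.0)
    print(f"{threads} threads: about {hours:.2f} hours")
```

With these assumptions the same 100,000 requests take roughly seven hours at 20 threads and under twenty minutes at 500, even though the credit budget is identical.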


Companies using this

These 25 corpus companies tag data pipelines as a primary use case. The company table later on this page lists each with its product, pricing model, billing units, free-tier status, and last-verified date.


Patterns observed

Multi-meter catalogs are the norm at the top of the market. The largest proxy-and-scraping vendors deliberately run several billing units at once. Bright Data is the cleanest example in the entire corpus: rotating residential and its Browser API bill per GB, static ISP and datacenter proxies bill per dedicated IP, the unblocker/SERP/scraper APIs bill per 1,000 successful results, and datasets bill per 1,000 records — four simultaneous meters on one platform. Oxylabs mirrors the structure with three value metrics (GB, IP, successful results) across seven product lines. The upside is that margins stay legible; the downside is that cross-product cost forecasting becomes genuinely hard for buyers.

Success-based billing is a category differentiator, not an edge case. Bright Data, Oxylabs, SerpApi, and ZenRows all decline to charge for failed requests — blocks, CAPTCHAs, and system errors are on the vendor. SerpApi takes the posture furthest: it has no overage line at all, so exhausting a plan triggers an early full-price renewal rather than a marginal per-search charge. This outcome alignment is the same logic driving the broader shift toward outcome-based pricing in AI, arriving early here because block rates make “pay per attempt” feel unfair.

Difficulty-weighting normalizes the cost of uneven work. Because a JavaScript-heavy, bot-protected page costs dramatically more to fetch than a static one, several vendors price the difficulty rather than the raw count. ScraperAPI’s credit multiplier (1× / 10× / 75×) and Oxylabs’ target-specific per-1,000 rates both do this. The pattern keeps unit economics honest but makes headline allowances misleading — a 100,000-credit plan is not 100,000 hard scrapes.

Seats are nearly absent. This is among the least seat-based clusters in the corpus. Firecrawl explicitly never charges per user; Exa, Tavily, and Linkup are pure pay-as-you-go credit balances with no seat line at all. The buyer is a developer or an automated agent, not a team of named human users, so the value metric is throughput, not headcount.

Falling unit prices, especially on proxies. Oxylabs’ residential entry rate roughly halved from $12/GB in 2022 to $6/GB in 2026, and Apify is the rare metered platform that has cut prices — Scale from $499 to $199, Starter from $49 to $29, and compute-unit rates ~20–25% in 2025. Search APIs are mixed: Exa’s base Search rate fell to $5/1k in 2025 then rose to $7/1k in 2026.

Compute substrate companies appear on the supply side. Modal, RunPod, Anyscale, Lightning AI, Upstash, and Turbopuffer tag data-pipeline not because they scrape, but because heavy pipelines run their extraction, embedding, and storage jobs on these platforms — metered in GPU-hours, GB-hours, or per-operation, never per record. They show how a single use case spans both the data layer and the compute layer beneath it.


Counterexamples & variants

Sales-led, unpriced data work breaks the self-serve mold. Not every data-pipeline company exposes a meter. micro1 (a human-data engine and RL-environment provider for frontier labs) and Mercor (an AI talent marketplace plus enterprise data partnerships) are both sales-quoted with no public price — Mercor’s buyer take-rate is undisclosed and only the hourly expert pay is visible. When the “pipeline” is bespoke human-labeled data rather than automated web extraction, usage metering gives way to custom enterprise contracts. These are the clearest cases where the category’s default model does not apply.

Subscription-with-credit-wallet, not pure usage. Apify and Diffbot layer a flat monthly plan over a prepaid credit pool — Apify’s prepaid balance equals its plan fee ($29 of Starter buys $29 of usage) and expires at cycle end. This is a hybrid, not pure usage, and it introduces a use-it-or-lose-it dynamic absent from balance-based vendors like Exa or Tavily. The variant is worth flagging because it changes the buyer’s optimization problem from “minimize spend” to “right-size the wallet.” See the prepaid-credits guide for how these pools behave.

Execution-metered automation is a different unit entirely. n8n prices on monthly workflow executions — not requests, GB, or records — because a workflow may scrape, transform, and load in a single billed run. Its free self-hosted Community Edition also breaks the cluster’s freemium-SaaS pattern. Clay splits its meter again into an Actions capacity tier plus a separate Data Credits usage pool. Both show that “data pipeline” at the orchestration layer meters the job, not the byte.

KYC and legal gating inside self-serve. Bright Data requires a know-your-customer review (sometimes a video call) before residential and mobile networks switch on — a manual compliance checkpoint inside an otherwise instant, card-on-file product. SerpApi goes the other way and productizes the legal risk, bundling an up-to-$2M U.S. Legal Shield rather than charging for it. Both are reminders that in web data, compliance is a pricing-adjacent variable, not just a footnote.


What this means for buyers vs vendors

For buyers

Model your bill on your difficulty mix, not the headline allowance. A 100,000-credit plan on ScraperAPI is 100,000 plain pages but only ~1,333 ultra-premium-plus-render scrapes; on Oxylabs the per-1,000-result rate changes with the target. Estimate the share of your traffic that needs JS rendering or premium proxies before picking a tier, and test-run heavy jobs — Apify’s compute-unit cost is impossible to forecast without one.
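One way to run that estimate is to turn a credit allowance into an expected number of scrapes for a given traffic mix. The Python sketch below uses the ScraperAPI multipliers quoted on this page; the function name, the mix labels, and the 70/30 split are hypothetical.

```python
# ScraperAPI credit weights as quoted in the text (labels are illustrative).
WEIGHTS = {"plain": 1, "js_or_premium": 10, "ultra_premium_render": 75}


def scrapes_per_plan(plan_credits: int, mix: dict[str, float],
                     weights: dict[str, int]) -> int:
    """Whole scrapes a credit allowance buys for a given traffic mix.

    `mix` maps request type to its share of traffic; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    avg_credits = sum(weights[kind] * share for kind, share in mix.items())
    return int(plan_credits // avg_credits)


print(scrapes_per_plan(100_000, {"plain": 1.0}, WEIGHTS))                 # 100000
print(scrapes_per_plan(100_000, {"ultra_premium_render": 1.0}, WEIGHTS))  # 1333
print(scrapes_per_plan(100_000, {"plain": 0.7, "js_or_premium": 0.3}, WEIGHTS))  # 27027
```

In the assumed 70/30 case, a plan advertised as 100,000 credits covers about 27,000 scrapes, so the mix, not the headline number, should drive the tier choice.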

Prefer success-based vendors for hard targets. If you scrape bot-protected sites, Bright Data, Oxylabs, SerpApi, and ZenRows won’t charge you for the blocks — which can be a large fraction of attempts. Watch the meter that dominates: on protected targets, residential-proxy GB can dwarf the plan fee, the single biggest source of bill shock in this category. The pricing calculator helps you sanity-check tier choices against expected volume.

Check the wallet rules. Apify and Firecrawl credits don’t roll over (except via auto-recharge or annual), so over-provisioning is pure waste; balance-based vendors like Exa and Linkup let unused credit sit. If your volume is spiky, a pay-as-you-go balance beats a fixed monthly pool.
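The spiky-volume point can be checked with a small Python sketch that compares an expiring monthly pool with a pay-as-you-go balance. The $83 fee for 100,000 credits is the Firecrawl Standard tier quoted earlier on this page; the monthly usage series and the $0.001 pay-as-you-go price per credit are assumptions for illustration, not any vendor's published rate, and overage is not modeled.

```python
def fixed_pool_cost(monthly_usage: list[int], pool_credits: int,
                    plan_fee: float) -> tuple[float, int]:
    """Total fee and expired credits when unused credits lapse each cycle.

    Assumes the pool covers every month; overage pricing is not modeled.
    """
    if not monthly_usage:
        return 0.0, 0
    if max(monthly_usage) > pool_credits:
        raise ValueError("pool too small for peak month; overage not modeled")
    wasted = sum(pool_credits - used for used in monthly_usage)
    return plan_fee * len(monthly_usage), wasted


def pay_as_you_go_cost(monthly_usage: list[int], price_per_credit: float) -> float:
    """Total cost when only consumed credits are paid for."""
    return sum(monthly_usage) * price_per_credit


usage = [10_000, 90_000, 5_000, 100_000]  # assumed spiky workload
pool_total, expired = fixed_pool_cost(usage, pool_credits=100_000, plan_fee=83.0)
payg_total = pay_as_you_go_cost(usage, price_per_credit=0.001)  # hypothetical rate

print(f"fixed pool: ${pool_total:.2f}, {expired} credits expired unused")
print(f"pay-as-you-go: ${payg_total:.2f}")
```

The comparison flips when usage is steady and close to the pool size, so it is worth rerunning with your own monthly figures rather than trusting the illustrative numbers here.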

For vendors

Match the meter to the cost driver, then keep it legible. The corpus winners run multiple meters (Bright Data’s GB / IP / results / records) precisely because no single unit fits a proxy, a scrape, and a dataset — but every additional meter raises the buyer’s forecasting burden. The trade is margin clarity for budgeting friction; price the dimensions that genuinely diverge in cost and bundle the rest.

Success-based billing is a trust lever worth the cost. Eating failed requests differentiates you from resellers and removes the buyer’s biggest objection on hard targets. Pair it with difficulty-weighting so you don’t lose money on the expensive scrapes — that’s the ScraperAPI and Oxylabs playbook. And resist drift toward per-attempt billing; in web data it reads as charging for your own failures.

Use throughput and concurrency as a clean up-sell. ScraperAPI (20 → 500 threads) and Firecrawl (concurrency per tier) both sell speed as a second axis independent of volume, capturing willingness-to-pay from latency-sensitive accounts without raising the per-unit rate. For the mechanics of metering and invoicing this volume of events, see the usage-invoicing and billing-cycles guide.

| Company | Product | Pricing model | Billing units | Free tier | Verified |
| --- | --- | --- | --- | --- | --- |
| Anyscale | Managed Ray platform for distributed AI training, inference, and batch processing (RayTurbo, Anyscale Compute Units) | pure-usage, commitment, hybrid | gpu-hours, cpu-hours, credits | Yes | 2026-05-29 |
| Apify | Apify Platform: web scraping and browser-automation cloud with an Actors marketplace | hybrid, freemium | gb-hours, credits, bandwidth-gb, +2 more | Yes | 2026-06-03 |
| Bright Data | Web data platform: proxy networks, scraping APIs, a managed scraping browser, SERP and unlocker APIs, ready-made datasets, and eCommerce insights | pure-usage, hybrid, commitment, +1 more | bandwidth-gb, requests, records, +1 more | Yes | 2026-06-04 |
| Browse AI | No-code web scraping and website-monitoring platform that turns any site into a structured dataset or API | freemium, hybrid, commitment | credits, seats | Yes | 2026-06-04 |
| Clay | AI-powered GTM data-enrichment and outbound platform billed on Actions plus Data Credits | hybrid, freemium, commitment | credits, actions | Yes | 2026-06-02 |
| Diffbot | Web-extraction APIs (Extract, Crawl, Natural Language) plus a Knowledge Graph, metered on monthly credits | hybrid, freemium | credits, api-calls | Yes | 2026-06-04 |
| Exa | AI web search API for agents: search, contents, deep research, and monitoring endpoints billed per request | pure-usage, freemium | requests, credits, api-calls, +1 more | Yes | 2026-06-01 |
| Firecrawl | Web-scraping and data-extraction API for AI agents: scrape, crawl, map, search, and extract pages into clean markdown/JSON | subscription, hybrid, freemium | credits, pages-rendered, api-calls, +1 more | Yes | 2026-06-02 |
| Lightning AI | Cloud GPU/CPU Studio compute platform for building, training, and serving AI models, billed by the second with a credit pool | hybrid, freemium, pure-usage | gpu-hours, cpu-hours, credits, +3 more | Yes | 2026-06-02 |
| Linkup | Web search API for AI agents: Search, Fetch, and async Research endpoints with grounded, structured results | pure-usage, freemium | requests, credits, api-calls | Yes | 2026-06-04 |
| Mercor | AI talent marketplace + enterprise data partnerships for frontier AI labs | pure-usage | tasks | No | 2026-06-08 |
| micro1 | Human-data engine, RL environments, and agent evaluation for frontier AI labs | pure-usage | tasks | No | 2026-06-08 |
| Modal | Serverless compute and GPU platform: per-second billing for Python functions, batch jobs, and model serving | pure-usage, freemium, subscription, +1 more | gpu-hours, cpu-hours, gb-hours, +2 more | Yes | 2026-05-29 |
| n8n | Fair-code workflow automation platform for technical teams, billed by monthly workflow executions | subscription, freemium | workflow-executions | Yes | 2026-06-02 |
| OpenMeter | Open-source usage metering and billing platform for AI, agentic, and developer tools | freemium | events, api-calls | Yes | 2026-06-03 |
| Oxylabs | Web data collection: residential, datacenter, ISP & mobile proxies plus Web Scraper API and Web Unblocker | hybrid, pure-usage, freemium | bandwidth-gb, ips, records, +1 more | Yes | 2026-06-04 |
| Rows | Rows AI spreadsheet | subscription, hybrid | seats, tasks, api-calls | Yes | 2026-06-08 |
| RunPod | GPU cloud marketplace: Secure Cloud and Community Cloud Pods, Serverless endpoints, and persistent storage | pure-usage, hybrid, commitment | gpu-hours, storage-gb | No | 2026-05-30 |
| ScraperAPI | Web scraping API that handles proxies, browsers, and CAPTCHAs behind a single endpoint | subscription, pure-usage | credits, requests, api-calls | No | 2026-06-04 |
| SerpApi | Real-time search-results API (Google, Bing, and other engines) | subscription, pure-usage | api-calls, requests | Yes | 2026-06-04 |
| Tavily | Tavily Search API | pure-usage, freemium | credits, api-calls, requests | Yes | 2026-06-03 |
| Together AI | AI Acceleration Cloud: serverless inference, dedicated endpoints, GPU clusters, Code Sandbox, fine-tuning | pure-usage, hybrid, commitment | tokens, gpu-hours, cpu-hours, +1 more | Yes | 2026-05-29 |
| turbopuffer | Serverless vector and full-text search database on object storage | pure-usage, commitment | storage-gb, vectors-indexed, gb-hours, +1 more | No | 2026-06-04 |
| Upstash | Upstash (Redis, Vector, QStash, Search, Workflow) | pure-usage, freemium, hybrid | requests, api-calls, vectors-indexed, +3 more | Yes | 2026-06-03 |
| ZenRows | Universal Scraper API, Scraping Browser, and Residential Proxies | hybrid, subscription, pure-usage | requests, api-calls, bandwidth-gb, +2 more | Yes | 2026-06-04 |

FAQ

What is data pipeline pricing?

Data pipeline pricing is how vendors that extract, transform, and deliver web data charge for it — typically metered per request, per gigabyte transferred, per record returned, or per credit. The unit is chosen to track the actual cost driver of the workload, which is why a single platform like Bright Data may run four different meters at once.

Why do web scraping vendors bill only for successful results?

Blocks, CAPTCHAs, and proxy churn are the vendor's problem to solve, not the buyer's. Success-based billing — used by Bright Data, Oxylabs, SerpApi, and ZenRows — means a failed fetch isn't charged, aligning the vendor's incentive with the customer's and differentiating against resellers who bill all traffic regardless of block rate.

What's the difference between per-request, per-GB, and per-record billing?

Per-request (Exa, SerpApi, ScraperAPI) charges by call and suits search and structured extraction; per-GB (Bright Data, Oxylabs residential proxies) charges by bandwidth and suits raw proxy traffic where payload size drives cost; per-record (Bright Data datasets) charges by row delivered and suits packaged datasets. Many platforms mix all three across product lines.

Are web data prices rising or falling?

Residential proxy unit prices have fallen sharply — Oxylabs' residential entry rate roughly halved from $12/GB in 2022 to $6/GB in 2026. Search API rates are mixed: Exa's base Search rate dropped to $5/1k in 2025 before rising to $7/1k in 2026. Re-baseline your cost model at least twice a year.

Do data pipeline vendors charge per seat?

Rarely. The category is overwhelmingly usage-metered, not seat-based. Firecrawl explicitly never charges per user — a solo developer and a 50-person team pay the same as long as page volume and concurrency match — which makes the tools cheap to roll out org-wide.

Trivia

  • Oxylabs charges roughly 10x more per gigabyte for residential proxies than for datacenter proxies: its cheapest residential plan lands at $6/GB while datacenter pay-per-traffic runs $0.59/GB, which shows that IP type, not just volume, drives the bill in this category.

  • ScraperAPI prices on a difficulty multiplier rather than flat requests — a plain page costs 1 credit, JS rendering or premium proxies cost 10, and ultra-premium-plus-render costs 75 credits, so a 100,000-credit plan can buy anywhere from 1,333 to 100,000 actual scrapes.

  • SerpApi publishes its entire price ladder up to a $106,050/month "Cloud 54M" tier instead of hiding high volumes behind "contact sales" — and like Bright Data, Oxylabs, and ZenRows, it only bills searches that succeed, eating the cost of blocks and CAPTCHAs itself.
