Per-Request Pricing: Examples & Companies

40 companies in the corpus
Definition

Per-Request Pricing is a billing unit where customers are charged per request served — the generic meter for inference endpoints, search, scraping, and browser infrastructure.

Also known as: Request-Based Billing, Per-1,000 Requests Pricing

What is it

Whatever the product does — answer a search, scrape a page, run a model, write a memory — the request is the countable thing that crosses the API boundary, and the bill is request volume times a published rate.

The purest expression in the corpus was MultiOn, whose Agent API priced one web action as one request: $0.08 at public beta, cut to $0.04 (Basic) and $0.025 (Premium, behind a $50,000/year minimum) within weeks, with a $0.01 Retrieve rate for extraction-only calls. Linkup is the living equivalent — a web search API with no seats and no tiers, just $0.005 per standard search and $0.05 per deep search, plus an x402 mode that bills a flat $0.01 per request in USDC with no account at all.

Because the per-request price is almost always sub-cent, vendors quote per 1,000 to stay legible: You.com lists its Search API at $5.00 per 1,000 calls, Browserbase meters Search overage at $7 per 1,000 requests, Bright Data prices Web Unlocker and SERP requests from $1.50 down to $1.00 per 1,000 on committed tiers, and Cohere bills Rerank at $2 per 1,000 queries.

The recurring tension is that requests vary wildly in cost to serve. A cached lookup and an exhaustive research run are both “one request,” so nearly every mature per-request rate card layers something on top — effort modes, feature multipliers, request-size bands, or success-only metering. How those layers work is the real story of this unit.

How it works

The base formula is bill = requests × rate. The design work is in the levers vendors stack on it:

| Lever | What it does | Example from the corpus |
| --- | --- | --- |
| Per-1k quoting | Makes sub-cent rates readable | You.com Search $5.00/1k calls; Twelve Labs search $4/1k queries |
| Effort / mode bands | Prices the depth of work behind one request | You.com Research $12 → $450/1k by effort; Linkup deep search 10x standard |
| Feature multipliers | One meter, scaled by request weight | ZenRows 5x JS rendering, 10x premium proxies, 25x both; ScraperAPI 1x–75x credits |
| Success-only metering | Bills delivered results, not attempts | SerpApi excludes blocked/CAPTCHA’d searches; Oxylabs doesn’t bill 5xx/6xx |
| Request quotas (no rate) | Tiers gated by monthly request bands | Mem0 10K → 500K add requests; Helicone 10K free requests/mo |
| Volume / commit discounts | Rate falls with committed spend | Bright Data $1.50 → $1.00/1k; SerpApi $7.50 → $2.75/1k reserved |

Worked example — agent search budget. An agent product on Linkup runs 100,000 standard searches a month at $0.005 each: $500. Upgrade every call to sourced-answer output at $0.006 and the same volume costs $600. Route 5% of traffic (5,000 calls) to deep search at $0.05 and that slice alone adds $250 — half the original budget for one-twentieth of the volume. The formula never changed; the per-request rate did, by mode.
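The arithmetic in this worked example can be reproduced with a short Python sketch. The per-request rates are the ones quoted above; the `RATES` table and the `monthly_bill` function are illustrative names chosen here, not part of any Linkup SDK.

```python
from decimal import Decimal

# Per-request rates quoted in the worked example (USD).
RATES = {
    "standard": Decimal("0.005"),
    "sourced_answer": Decimal("0.006"),
    "deep": Decimal("0.05"),
}


def monthly_bill(mix: dict) -> Decimal:
    """bill = sum over modes of (requests in that mode x that mode's rate)."""
    return sum((RATES[mode] * count for mode, count in mix.items()), Decimal(0))


all_standard = monthly_bill({"standard": 100_000})
all_sourced = monthly_bill({"sourced_answer": 100_000})
deep_slice = monthly_bill({"deep": 5_000})
blended = monthly_bill({"standard": 95_000, "deep": 5_000})

print(all_standard, all_sourced, deep_slice, blended)
```

For the 95/5 split, the blended bill is $725: $475 of standard searches plus the $250 deep-search slice.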

Worked example — multiplier math. On ZenRows, the Universal Scraper API meters cost per 1,000 successful requests with multipliers: 5x for JavaScript rendering, 10x for premium proxies, 25x for both. A 50,000-request job against a protected, JS-heavy target therefore consumes the balance of a 1,250,000-plain-request job. Failed and retried calls don’t draw down the balance at all — and HTTP 404/410 responses count as successful completions, a detail worth reading twice before pointing a crawler at dead links.
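A minimal sketch of the multiplier math, assuming only the three multipliers quoted above. The function and table names are hypothetical, and the sketch does not model ZenRows' actual plan prices or its API.

```python
# Multipliers from the worked example, keyed by (js_rendering, premium_proxy).
MULTIPLIERS = {
    (False, False): 1,   # plain request
    (True, False): 5,    # JavaScript rendering
    (False, True): 10,   # premium proxies
    (True, True): 25,    # both features together
}


def plain_request_equivalents(successful_requests: int,
                              js_rendering: bool = False,
                              premium_proxy: bool = False) -> int:
    """Balance consumed, expressed as a number of plain requests."""
    return successful_requests * MULTIPLIERS[(js_rendering, premium_proxy)]


job = plain_request_equivalents(50_000, js_rendering=True, premium_proxy=True)
print(job)
```

Only successful requests are passed in, which mirrors the success-only balance described above. How 404/410 responses are counted is a detail to confirm against the vendor's own documentation.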

Worked example — quota ladder instead of a rate. Mem0 never publishes a per-request price. Instead, each tier carries two request quotas: add requests scale 10,000 → 50,000 → 200,000 → 500,000 and retrieval requests 1,000 → 5,000 → 20,000 → 50,000 across Hobby (free), Starter ($19), Growth ($79), and Pro ($249). An implied unit price can be backed out of any paid tier (Pro works out to roughly $0.0005 per add request, Starter to roughly $0.0004), but the lever buyers actually pull is the tier, not the meter. Helicone (10,000 free requests/month, then usage-based overage) and Portkey ($49/month for 100,000 logged requests, then $9 per additional 100K) run the same quota-first pattern. For how these meters get counted in the first place, see the tracking and metering usage events guide.
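The quota ladder can be explored with a small Python sketch. The tier figures are the ones listed above; `implied_price_per_add` and `smallest_covering_tier` are hypothetical helpers. Attributing the whole tier price to add requests is a simplifying assumption.

```python
from decimal import Decimal

# (tier name, monthly price in USD, add-request quota, retrieval-request quota)
TIERS = [
    ("Hobby", 0, 10_000, 1_000),
    ("Starter", 19, 50_000, 5_000),
    ("Growth", 79, 200_000, 20_000),
    ("Pro", 249, 500_000, 50_000),
]


def implied_price_per_add(price: int, add_quota: int) -> Decimal:
    """Tier price divided by its add-request quota (ignores retrievals)."""
    return Decimal(price) / Decimal(add_quota)


def smallest_covering_tier(adds: int, retrievals: int):
    """First tier whose two quotas both cover the forecast, else None."""
    for name, _price, add_quota, retrieval_quota in TIERS:
        if adds <= add_quota and retrievals <= retrieval_quota:
            return name
    return None


for name, price, add_quota, _ in TIERS:
    print(name, implied_price_per_add(price, add_quota))

# 40,000 adds fit Starter, but 8,000 retrievals do not, so the
# retrieval meter is what forces the move up a tier.
print(smallest_covering_tier(40_000, 8_000))
```

The second helper shows why the tier, not the meter, is the lever: whichever of the two quotas runs out first decides the plan.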

Companies using this

37 in-corpus companies meter requests, making it one of the most widely shared billing units in the corpus. The cluster spans five categories — search APIs (SerpApi, Tavily, Linkup, You.com), web data and scraping (Bright Data, Oxylabs, ZenRows, ScraperAPI), browser and agent infrastructure (Browserbase, MultiOn), LLM observability and memory (Helicone, Portkey, PromptLayer, Mem0), and inference platforms where requests ride alongside tokens (Cohere, Baseten, Groq, Perplexity).

Patterns observed

  • Per-1k quoting is the house style. You.com ($5.00/1k Search calls), Browserbase ($7/1k Search, $1/1k Fetch, $4–$7/1k Extract), Bright Data ($1.50/1k Web Unlocker), Cohere ($2/1k Rerank queries), and Twelve Labs ($4/1k search queries) all normalize sub-cent request prices into per-1,000 figures. The meter is per-request; only the display unit is scaled.

  • Effort bands absorb cost variance without abandoning the unit. You.com prices Research requests at $12 / $50 / $100 / $450 per 1,000 by effort tier, Linkup charges 10x for deep search over standard, and SerpApi prices enterprise overage by speed mode ($7.50 / $15 / $30 per 1,000 on-demand). The request stays the unit; the band prices what’s behind it.

  • Multiplier systems are the scraping category’s answer to the same problem. ZenRows (5x/10x/25x by rendering and proxy class) and ScraperAPI (1 credit standard, 10 with JS rendering or premium proxies, 25 premium+render, 75 ultra-premium+render) charge heavy requests more while keeping one balance. Qodo applies the identical idea to model choice: most LLM requests cost 1 credit, Claude Opus costs 5.

  • Success-only metering is becoming table stakes where failure is common. SerpApi bills only fully successful searches, Oxylabs doesn’t charge for 5xx/6xx scraper attempts, and ZenRows draws balance only on successful results. In scraping, where block rates are a fact of life, charging per delivered result rather than per attempt is now a competitive requirement.

  • Quota gates outnumber published rates. Many vendors use requests as tier boundaries rather than priced units: Mem0’s add/retrieval quotas, Helicone’s 10K free requests, Portkey’s 100K-log allotment, OpenRouter’s 1M free BYOK requests per month (then a 5% fee), and GitHub Copilot’s premium-request allowances that became AI Credits at $0.01 each. The request is the meter even when no per-request price appears on the page.

  • Read and write requests are splitting into separate meters. Mem0 prices add requests and retrieval requests on independent quota ladders, Pinecone separates read units from write units entirely, and turbopuffer prices writes per GB and queries per data scanned. Once a platform’s read and write costs diverge, a single undifferentiated request meter stops working.
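The gap between attempt-based and success-only metering can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's billing logic: a 2xx status is treated as "successful", and the optional `also_billable` set stands in for vendor-specific rules about which other responses still count.

```python
def billable_per_attempt(statuses: list) -> int:
    """Attempt-based metering: every request that reached the endpoint counts."""
    return len(statuses)


def billable_success_only(statuses: list, also_billable=frozenset()) -> int:
    """Success-only metering: count 2xx responses plus any explicitly listed codes."""
    return sum(1 for s in statuses if 200 <= s < 300 or s in also_billable)


# Hypothetical log of HTTP statuses for six scraping attempts.
log = [200, 200, 503, 429, 200, 404]

print(billable_per_attempt(log))
print(billable_success_only(log))
print(billable_success_only(log, also_billable=frozenset({404, 410})))
```

On this made-up log, the same traffic bills as six, three, or four requests depending on the rule, which is why the definition of "successful" deserves a close read.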

Counterexamples & variants

The cautionary tale is MultiOn. Its Agent API defined one request as roughly one action taken on a webpage — an honest unit, but one whose unit economics never stabilized: the price halved from $0.08 to $0.04 within about five weeks of public beta, a Premium tier needed a $50,000/year minimum to make the lower $0.025 rate viable, and an extraction-only Retrieve endpoint had to be split out at $0.01 because lumping it with full agent actions overpriced it 4x. MultiOn pivoted to consumer in December 2024 and the API wound down — a reminder that per-request pricing only works when the vendor actually knows what a request costs to serve.

The second counterexample is the databases that outgrew the unit. Pinecone nominally lists requests as a meter, but its real units are read units and write units: a query consumes roughly 1 RU per GB of namespace size (minimum 0.25 RU), and a write consumes 1 WU per KB (minimum 5 WU per request). Two identical API calls against differently-sized indexes cost wildly different amounts — the “request” survives only as a minimum-charge floor. turbopuffer made the same move explicit by pricing queries per unit of data scanned, then cutting that base rate from $5 to $1 per petabyte in February 2026. When request cost scales with state size rather than call count, vendors migrate to request-size pricing and the flat per-request model quietly dies.
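A rough sketch of the read-unit and write-unit rules as described in this paragraph. It assumes a simple linear rate with a floor; Pinecone's real metering may round or band differently, and the function names are hypothetical.

```python
def query_read_units(namespace_gb: float) -> float:
    """Roughly 1 RU per GB of namespace size, with a 0.25 RU minimum."""
    return max(0.25, namespace_gb * 1.0)


def write_units(payload_kb: float) -> float:
    """1 WU per KB written, with a 5 WU minimum per request."""
    return max(5.0, payload_kb * 1.0)


# The same API call against two differently sized namespaces.
small = query_read_units(0.1)   # hits the minimum-charge floor
large = query_read_units(50.0)  # scales with namespace size
print(small, large, large / small)
```

The ratio between the two queries makes the point: the call count is identical, but the metered cost tracks namespace size.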

The third group is the inference platforms — OpenAI, Groq, Fireworks AI, Google, Baseten, DeepInfra, Replicate — which record requests but run their economics on tokens or compute seconds; the request count is incidental, and those vendors belong to the token-based pricing story. The interesting hybrid is Perplexity AI’s Sonar API, which charges per-token rates plus a per-1,000-request search fee ($5–$12 depending on search-context depth) — the request fee prices the retrieval work that tokens can’t see. And Linkup’s x402 mode is the unit taken to its logical extreme: a flat $0.01 per request paid in USDC on Base, no account, no invoice — per-request pricing as a wire protocol.
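The hybrid shape (tokens plus a per-1,000-request fee) can be sketched as follows. The request fee range comes from the paragraph above; the per-million token prices are placeholders invented for the example, since the source does not quote them, and `hybrid_bill` is a hypothetical name.

```python
from decimal import Decimal


def hybrid_bill(requests: int, input_tokens: int, output_tokens: int,
                request_fee_per_1k: Decimal,
                input_price_per_m: Decimal,
                output_price_per_m: Decimal) -> dict:
    """Split a bill into its per-request fee and its token charges."""
    request_fee = Decimal(requests) / Decimal(1000) * request_fee_per_1k
    token_cost = (Decimal(input_tokens) / Decimal(1_000_000) * input_price_per_m
                  + Decimal(output_tokens) / Decimal(1_000_000) * output_price_per_m)
    return {"request_fee": request_fee, "token_cost": token_cost,
            "total": request_fee + token_cost}


# Placeholder token prices ($1 per million in and out) are NOT real rates.
bill = hybrid_bill(
    requests=10_000, input_tokens=2_000_000, output_tokens=1_000_000,
    request_fee_per_1k=Decimal("5"),
    input_price_per_m=Decimal("1"), output_price_per_m=Decimal("1"),
)
print(bill)
```

With these placeholder inputs the request fee dominates the total, which illustrates how a per-request component can price work the token meter does not capture.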

What this means for buyers vs vendors

For buyers

Start by finding out what one request actually costs for your workload, not the headline rate. The published number is usually the cheapest request on the card: on ZenRows a protected JS-heavy target costs 25x the plain rate, on You.com an exhaustive Research call costs 37x a lite one, and on Pinecone the same query gets more expensive as your index grows. Ask three questions in procurement: which request types and multipliers will my traffic actually hit; are failed requests billed (on SerpApi and Oxylabs they aren’t — on most APIs they are); and is there a budget cap or alert, like Upstash’s hard monthly ceiling with alerts at 70% and 90%. Then model your real mix with the pricing calculator before committing to a tier — quota-gated plans like Mem0’s punish misestimating which of the two request meters (reads or writes) you’ll exhaust first.
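A buyer-side budget guard along the lines of the cap-and-alert pattern mentioned above might look like this. It is a hypothetical sketch, not Upstash's implementation; the 70% and 90% thresholds are the only figures taken from the text.

```python
from decimal import Decimal

ALERT_THRESHOLDS = (Decimal("0.70"), Decimal("0.90"))


def crossed_alerts(spend: Decimal, cap: Decimal,
                   thresholds=ALERT_THRESHOLDS) -> list:
    """Return the alert thresholds that current spend has reached."""
    return [t for t in thresholds if spend >= cap * t]


def within_cap(spend: Decimal, next_request_cost: Decimal, cap: Decimal) -> bool:
    """True if one more request would still fit under the hard monthly cap."""
    return spend + next_request_cost <= cap


cap = Decimal("100")
print(crossed_alerts(Decimal("75"), cap))
print(crossed_alerts(Decimal("95"), cap))
print(within_cap(Decimal("99.99"), Decimal("0.05"), cap))
```

Feeding the guard the real per-request cost of each call (including multipliers and mode bands) rather than the headline rate is what makes the alerts meaningful.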

For vendors

Per-request pricing earns you the fastest quote-to-forecast path in usage-based pricing — a developer can budget in seconds — but only if your cost to serve a request is roughly uniform. The corpus shows three proven ways to keep the unit when it isn’t: effort bands (You.com, Linkup), feature multipliers on a single balance (ZenRows, ScraperAPI), and read/write meter splits (Mem0, Pinecone). If you’re in a category with real failure rates, bill successful requests only — SerpApi and Oxylabs have made that the trust baseline in web data, and charging for blocked attempts now reads as a red flag. Price the unit before you launch it, not after: MultiOn repricing twice in five weeks is what discovering your serving cost in production looks like. The choosing the right usage metric guide covers when the request is the right value metric at all, and tracking and metering usage events covers the pipeline you’ll need to count it defensibly.

| Company | Product | Pricing model | Billing units | Free tier | Verified |
| --- | --- | --- | --- | --- | --- |
| Baseten | ML inference infrastructure — dedicated GPU deployments, Model APIs, and Truss framework | | | Yes | 2026-05-29 |
| Bright Data | Web data platform — proxy networks, scraping APIs, a managed scraping browser, SERP and unlocker APIs, ready-made datasets, and eCommerce insights | | | Yes | 2026-06-04 |
| Browserbase | Browser-agent infrastructure: headless browser sessions, web Search/Fetch APIs, agent identity, runtime, and a model gateway behind one API key | | | Yes | 2026-06-02 |
| Cartesia | Real-time voice AI platform (Sonic TTS, voice cloning, voice agents) | | | Yes | 2026-05-29 |
| Clipdrop | AI image-editing and generation tools (background removal, upscaling, text-to-image), now part of Jasper | | | Yes | 2026-06-05 |
| Cohere | Command, Embed, Rerank APIs | | | Yes | 2026-05-29 |
| DeepInfra | Serverless inference cloud — per-token LLM/embedding APIs, per-image and per-minute media models, per-hour on-demand GPU containers, and reserved DeepCluster GPU clusters | | | No | 2026-06-02 |
| Exa | AI web search API for agents — search, contents, deep research, and monitoring endpoints billed per request | | | Yes | 2026-06-01 |
| Fal | Generative-media inference platform — serverless per-output model APIs plus dedicated GPU compute | | | No | 2026-06-01 |
| Fireworks AI | Generative AI inference platform — serverless per-token, on-demand GPU, fine-tuning, batch API | | | Yes | 2026-05-30 |
| GitHub Copilot | AI pair programmer and coding agent embedded in GitHub, VS Code, and most major IDEs. | | | Yes | 2026-06-02 |
| Gladia | Speech-to-text & audio intelligence API | | | Yes | 2026-06-09 |
| Google | Gemini API & AI Studio | | | Yes | 2026-05-29 |
| Groq | GroqCloud — LPU-based ultra-low-latency inference API for Llama, GPT-OSS, Qwen, Whisper, and Mixtral | | | Yes | 2026-05-29 |
| Helicone | Open-source LLM observability & AI gateway | | | Yes | 2026-06-09 |
| Jina AI | Search Foundation API (Embeddings, Reranker, Reader, DeepSearch, Classifier) | | | Yes | 2026-06-03 |
| Linkup | Web search API for AI agents — Search, Fetch, and async Research endpoints with grounded, structured results | | | Yes | 2026-06-04 |
| Mem0 | Memory layer for AI agents and applications | | | Yes | 2026-06-10 |
| MultiOn | Autonomous web-browsing AI agent API (wound down) | | | No | 2026-06-10 |
| OpenAI | ChatGPT consumer subscriptions + GPT-5.x API with token-based usage billing | | | Yes | 2026-05-30 |
| OpenRouter | Multi-model LLM API routing marketplace | | | Yes | 2026-06-10 |
| Oxylabs | Web data collection: residential, datacenter, ISP & mobile proxies plus Web Scraper API and Web Unblocker | | | Yes | 2026-06-04 |
| Perplexity AI | AI-native answer engine with citations and multi-model search | | | Yes | 2026-05-29 |
| Phind | AI developer search engine and coding assistant (shut down January 2026) | | | Yes | 2026-06-08 |
| Pinecone | Managed vector database (serverless) | | | Yes | 2026-06-09 |
| Portkey | AI gateway & LLMOps governance platform | | | Yes | 2026-06-10 |
| PromptLayer | Prompt management, evaluation, and observability platform for LLM and AI-agent teams | | | Yes | 2026-06-04 |
| Qodo | Qodo (formerly Codium AI) — AI code integrity platform: Qodo Gen (IDE plugin), Qodo Merge (PR review agent), and Qodo Command (CLI / agentic quality workflows) | | | Yes | 2026-06-03 |
| Reka AI | Natively multimodal models (Spark, Edge, Flash, Core) + Research & Vision APIs | | | Yes | 2026-06-11 |
| Replicate | Cloud platform for running, fine-tuning, and deploying AI models via REST API | | | Yes | 2026-05-30 |
| ScraperAPI | Web scraping API that handles proxies, browsers, and CAPTCHAs behind a single endpoint | | | No | 2026-06-04 |
| SerpApi | Real-time search-results API (Google, Bing, and other engines) | | | Yes | 2026-06-04 |
| Tavily | Tavily Search API | | | Yes | 2026-06-03 |
| turbopuffer | Serverless vector and full-text search database on object storage | | | No | 2026-06-04 |
| Twelve Labs | Video understanding foundation models (Marengo for search/embeddings, Pegasus for analysis) delivered as a usage-metered API | | | Yes | 2026-06-02 |
| Upstash | Upstash (Redis, Vector, QStash, Search, Workflow) | | | Yes | 2026-06-03 |
| Vectara | Enterprise RAG-as-a-Service and agent platform for trusted, grounded, auditable AI | | | No | 2026-06-02 |
| Writesonic | GEO / AI-search-visibility and SEO platform that tracks brand mentions across AI answer engines and ships content/citation fixes | | | Yes | 2026-06-07 |
| You.com | Web search, contents, research, and finance-research APIs for AI systems | | | Yes | 2026-06-01 |
| ZenRows | Universal Scraper API, Scraping Browser, and Residential Proxies | | | Yes | 2026-06-04 |

FAQ

What is per-request pricing?

Per-request pricing is a billing unit where the customer is charged for each request served by the platform — an inference call, a search, a scrape, or a browser action. Rates are usually quoted per 1,000 requests because the per-request price is often a fraction of a cent.

How is per-request pricing different from token-based pricing?

Token pricing meters the volume of text processed inside a request, so two calls can cost very different amounts. Per-request pricing charges per call regardless of payload, which is easier to forecast but a looser fit to the vendor's serving cost — which is why many vendors add request-size bands or effort modes on top.

Why do vendors quote per 1,000 requests instead of per request?

Because single-request prices are usually sub-cent. Quoting '$7 per 1,000 requests' (Browserbase Search) or '$2 per 1,000 queries' (Cohere Rerank) keeps the rate card readable while the meter underneath stays strictly per-request.

Which companies use per-request pricing?

It spans search APIs (Linkup, Tavily, You.com, SerpApi), scraping and web data (Bright Data, Oxylabs, ZenRows, ScraperAPI), browser and agent infrastructure (Browserbase, MultiOn), AI memory and observability (Mem0, Helicone, Portkey, PromptLayer), and inference platforms (Cohere, Perplexity, Baseten). 37 in-corpus companies list requests as a billing unit.

Do vendors charge for failed requests?

Increasingly not, in categories where failure is common. SerpApi excludes blocked, errored, and CAPTCHA'd searches; ZenRows bills only successful results; Oxylabs' Web Scraper API doesn't charge for 5xx/6xx attempts. Most inference and search APIs, by contrast, bill every request that reaches the endpoint.

What are request multipliers?

A multiplier charges more request-units for heavier work while keeping one meter. ZenRows charges 5x for JavaScript rendering, 10x for premium proxies, and 25x for both; ScraperAPI's credits run 1x to 75x per request by feature; Qodo charges 1 credit for most LLM requests but 5 for Claude Opus.

Trivia

  • MultiOn — the purest per-request business in the corpus — halved its Agent API price from $0.08 to $0.04 per request within about five weeks of public beta in 2024, then wound the API down entirely after pivoting to consumer in December 2024.

  • You.com's Research API prices the same nominal "request" at $12 (lite), $50 (standard), $100 (deep), or $450 (exhaustive) per 1,000 calls, with a contact-sales Frontier tier listed above $2,000 per 1,000: a 160x+ spread on one endpoint.

  • Mem0 renamed its billing meters from "memories" to "add requests" and "retrieval requests" in 2026 — a read/write split where every tier carries two separate request quotas (e.g. Pro: 500,000 adds but only 50,000 retrievals per month).
