fal pricing

Quick summary
Pricing model: Pure usage-based
Product: Generative-media inference platform (serverless per-output model APIs plus dedicated GPU compute)
Industry: technology
Commits: None
AI Summary
  • fal (fal.ai) is a generative-media inference platform that prices purely on usage: serverless per-output model APIs plus dedicated GPU compute, with no seats, no subscriptions, and no free tier on its public pricing page.
  • fal's GPU compute fleet is billed per hour and per second: H100 80GB at $1.89/hr ($0.0005/s), H200 141GB at $2.10/hr ($0.0006/s), A100 40GB at $0.99/hr ($0.0003/s), and B200 184GB at 'contact us'.
  • fal model APIs are billed by output unit — video models per second or per video (Wan 2.5 at $0.05/s, Kling 2.5 Turbo Pro at $0.07/s, Veo 3 at $0.4/s, Ovi at $0.2/video) and image models per image or per megapixel (Seedream V4 $0.03/image, Flux Kontext Pro $0.04/image, Qwen $0.02/MP).
  • fal's enterprise tier adds private model hosting, custom inference and training kernels, foundational model research, SOC2 certification, SSO, and dedicated serverless infrastructure — all quoted via sales.
  • fal positions itself as cost-efficient inference infrastructure for other AI products, powering generative media for customers like Poe, PlayAI, and Genspark.
Pricing summary
fal 2026 — pure-usage generative-media inference
No seats, no subscription, no free tier: pay per output for model APIs and per hour/second for GPU compute.
Model APIs — Image
from $0.02 /megapixel
Developers calling image models per output
GPU Compute
from $0.99 /hr
Teams deploying their own apps on a GPU fleet
Enterprise
Contact sales
Fortune-500 and high-volume AI products
Rates are verbatim from fal.ai/pricing (USD). GPU prices are 'starting at' for custom deployments; B200 184GB is 'contact us'.

About

fal (legal entity “features and labels”, at fal.ai) is a generative-media inference platform founded in 2021 by Burkay Gur and Gorkem Yurtseven and headquartered in San Francisco. It runs a GPU fleet optimised for fast, low-latency inference of generative models — image, video, audio, and text-to-speech — and exposes them both as ready-to-call model APIs and as dedicated compute customers can deploy their own apps onto.

fal sells inference wholesale to other AI products as much as to individual developers. Its own marketing cites powering roughly 50% of Poe’s image and video generation, low-latency TTS for PlayAI’s voice agents, and multimodal inference for Genspark’s Super Agent. The company positions itself on two axes that competitors struggle to combine: raw speed (custom inference and training kernels) and cost efficiency (H100 GPUs advertised “from as low as $1.89/hr”).

That positioning has attracted aggressive funding. fal raised a $9M seed (a16z), a $14M Series A (Kindred Ventures), a $49M Series B in February 2025 (Notable Capital + a16z), a $125M Series C in July 2025 at a $1.5B valuation (Meritech), and a $140M Series D in December 2025 at a $4.5B valuation led by Sequoia, with NVIDIA’s NVentures participating. The Series D roughly tripled its valuation in five months, on revenue that grew about 1,040% to ~$285M ARR by the end of 2025. The pricing page was rebuilt alongside that growth, shifting from raw per-second compute billing to today’s per-output model APIs.

Its closest comparison set is other generative-media inference providers and serverless GPU clouds — Replicate, Runpod, and the model-hosting tiers of the hyperscalers. Unlike seat-priced creative SaaS, fal’s entire public pricing surface is metered: you pay per output for model APIs and per hour or per second for GPU compute. There is no free tier, no subscription, and no per-user fee on the published pricing page.


Pricing summary: How fal’s pure-usage model-inference pricing works

fal uses a pure usage-based pricing model. Its public prices sit on two self-serve, pay-as-you-go metered surfaces with no seats and no subscription, and a sales-led enterprise tier sits on top:

  1. Serverless & Compute (GPU rental): Deploy your own app on fal’s GPU fleet, billed per hour and per second. H100 80GB is $1.89/hr ($0.0005/s), H200 141GB is $2.10/hr ($0.0006/s), A100 40GB is $0.99/hr ($0.0003/s), and B200 184GB is “contact us.” Rates are “starting at” for custom deployments.
  2. Model APIs (per output): Call hosted models and pay by output unit, the way individual developers prefer to buy. Video models bill per second or per video (Wan 2.5 $0.05/s, Kling 2.5 Turbo Pro $0.07/s, Veo 3 $0.4/s, Ovi $0.2/video). Image models bill per image or per megapixel (Seedream V4 $0.03/image, Flux Kontext Pro $0.04/image, Nanobanana $0.0398/image, Qwen $0.02/MP).
  3. Enterprise: Private model hosting, custom kernels, dedicated serverless infrastructure, SLA, SOC2, and SSO — all “Contact Sales,” no public price.

What makes this different: fal publishes prices normalised to “output per $1” (e.g. 20 seconds of Wan 2.5 video or 33 Seedream V4 images), turning the pricing page itself into a buyer-facing cost comparator rather than a plan picker.


Pricing by product

GPU compute (Serverless & Compute)

Deploy your own app on fal’s GPU fleet. Rates are “starting at” for custom deployments; contact support@fal.ai to get started.

GPU | VRAM | Price per hour | Price per second | Key mechanics
A100 | 40GB | $0.99/hr | $0.0003/s | Lowest-cost tier for steady inference
H100 | 80GB | $1.89/hr | $0.0005/s | Headline “from $1.89/hr” GPU
H200 | 141GB | $2.10/hr | $0.0006/s | Larger VRAM for bigger models
B200 | 184GB | contact us | contact us | Newest Blackwell capacity, sales-gated
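The per-second column is the hourly rate divided by 3,600 and rounded to four decimal places: $1.89 / 3,600 = $0.000525, shown as $0.0005. The sketch below reproduces that conversion and prices a short burst at second granularity. It assumes billing is linear in seconds, uses the unrounded rate, and ignores minimum charges, rounding of billed time, and cold starts, none of which the pricing page describes; `GPU_HOURLY_USD`, `displayed_per_second`, and `job_cost` are illustrative names, not part of any fal SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hourly list prices (USD) from the GPU table; B200 is "contact us" and omitted.
GPU_HOURLY_USD = {
    "A100 40GB": Decimal("0.99"),
    "H100 80GB": Decimal("1.89"),
    "H200 141GB": Decimal("2.10"),
}

SECONDS_PER_HOUR = 3600
FOUR_PLACES = Decimal("0.0001")


def per_second_rate(hourly: Decimal) -> Decimal:
    """Exact per-second equivalent of an hourly rate."""
    return hourly / SECONDS_PER_HOUR


def displayed_per_second(hourly: Decimal) -> Decimal:
    """Per-second rate rounded half-up to four decimals, matching the table's figures."""
    return per_second_rate(hourly).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


def job_cost(hourly: Decimal, seconds: int) -> Decimal:
    """Cost of a job billed per second (assumption: linear billing, no minimum charge)."""
    return (per_second_rate(hourly) * seconds).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for gpu, hourly in GPU_HOURLY_USD.items():
        print(f"{gpu}: ${hourly}/hr = ${displayed_per_second(hourly)}/s")
    print("90 s on an H100:", job_cost(GPU_HOURLY_USD["H100 80GB"], 90))
```

Using the exact rate matters at scale: a 90-second H100 burst comes to $0.0473 at $0.000525/s, versus $0.0450 at the displayed $0.0005/s.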

Model APIs — Video models

Video models are billed by output unit — per second or per video — depending on the model.

Model | Unit | Price | Output per $1 | Key mechanics
Wan 2.5 | second | $0.05 | 20 seconds | Cheapest per-second video listed
Kling 2.5 Turbo Pro | second | $0.07 | 14 seconds | Mid-tier per-second video
Veo 3 | second | $0.4 | 3 seconds | Premium per-second video
Ovi | video | $0.2 | 5 videos | Billed per whole video, not per second

Model APIs — Image models

Image models are billed by image count or by output size in megapixels (MP); fal normalises listed prices to 1MP for comparison.

Model | Unit | Price | Output per $1 | Key mechanics
Qwen | megapixel | $0.02 | 50 megapixels | Cheapest, billed per MP
Seedream V4 | image | $0.03 | 33 images | Per-image billing
Flux Kontext Pro | image | $0.04 | 25 images | Per-image billing
Nanobanana | image | $0.0398 | 25 images | Per-image billing
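Both “Output per $1” columns are the reciprocal of the unit price: $1 / $0.05 per second buys 20 seconds of Wan 2.5, and $1 / $0.03 buys 33.3 Seedream V4 images. The sketch below recomputes them with exact fractions; the listed figures appear rounded to whole units (Veo 3 works out to exactly 2.5 seconds per $1). The per-megapixel helper assumes 1 MP = 1,000,000 pixels, billed linearly with no rounding or minimum, because the pricing page does not spell out how fal counts megapixels. `MODEL_PRICES`, `output_per_dollar`, and `per_mp_cost` are illustrative names.

```python
from fractions import Fraction

# Listed unit prices (USD) from the Video and Image model tables.
MODEL_PRICES = {
    "Wan 2.5": ("second", Fraction("0.05")),
    "Kling 2.5 Turbo Pro": ("second", Fraction("0.07")),
    "Veo 3": ("second", Fraction("0.4")),
    "Ovi": ("video", Fraction("0.2")),
    "Qwen": ("megapixel", Fraction("0.02")),
    "Seedream V4": ("image", Fraction("0.03")),
    "Flux Kontext Pro": ("image", Fraction("0.04")),
    "Nanobanana": ("image", Fraction("0.0398")),
}


def output_per_dollar(price: Fraction) -> Fraction:
    """Units of output that $1 buys: the reciprocal of the unit price."""
    return 1 / price


def megapixels(width: int, height: int) -> Fraction:
    """Assumed definition: 1 MP = 1,000,000 pixels, no rounding."""
    return Fraction(width * height, 1_000_000)


def per_mp_cost(price_per_mp: Fraction, width: int, height: int) -> Fraction:
    """Cost of one image billed per megapixel (assumption: linear in pixel count)."""
    return price_per_mp * megapixels(width, height)


if __name__ == "__main__":
    for model, (unit, price) in MODEL_PRICES.items():
        print(f"{model}: {float(output_per_dollar(price)):.1f} {unit}s per $1")
    # A 1024x1024 Qwen image is about 1.05 MP, so it costs slightly more than $0.02.
    print(float(per_mp_cost(MODEL_PRICES["Qwen"][1], 1024, 1024)))
```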

Enterprise (separate sales-led tier)

Tier | Price | Included | Key mechanics
Enterprise | Contact sales | Private model hosting, custom inference/training kernels, foundational model research, dedicated serverless infra, SLA, SOC2, SSO, user management | Sales-led, quoted

Sales motions across products: self-serve / pay-as-you-go for GPU compute and model APIs; sales-led for enterprise (private hosting, custom kernels, SLA, SOC2, SSO) and for B200 capacity.


Hidden costs: What metered inference actually costs at volume

fal’s per-output rates look tiny in isolation, but generative-media workloads multiply them fast. Two illustrative archetypes:

A consumer AI app generating short videos

A product shipping 50,000 short clips a month (5-second clips at $0.05/s on a Wan-2.5-class model), plus 200,000 preview images at $0.03 each on Seedream V4.

Line item | Monthly cost
50,000 clips × 5 s × $0.05/s | $12,500
200,000 preview images × $0.03 | $6,000
Total | $18,500

At this volume the per-second video charge dominates. Swapping the video model from Wan 2.5 ($0.05/s) to Veo 3 ($0.4/s) would multiply the video line exactly 8×, from $12,500 to $100,000/mo, and lift the total to $106,000.
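The archetype’s arithmetic is simple enough to parameterise, which makes a model swap easy to price before it ships. This minimal sketch assumes every clip is exactly 5 seconds and that billing is linear per second and per image, with no minimums, retries, or failed generations counted; `monthly_bill` is a hypothetical helper, not a fal API. It reproduces the $18,500 total and the Veo 3 scenario.

```python
from decimal import Decimal

# Per-unit list prices (USD) used in the archetype.
WAN_25_PER_SECOND = Decimal("0.05")
VEO_3_PER_SECOND = Decimal("0.4")
SEEDREAM_V4_PER_IMAGE = Decimal("0.03")


def monthly_bill(clips: int, seconds_per_clip: int, video_rate: Decimal,
                 images: int, image_rate: Decimal) -> dict[str, Decimal]:
    """Itemised monthly cost for a clip-plus-preview workload billed per output."""
    video = clips * seconds_per_clip * video_rate
    previews = images * image_rate
    return {"video": video, "images": previews, "total": video + previews}


if __name__ == "__main__":
    for label, rate in (("Wan 2.5", WAN_25_PER_SECOND), ("Veo 3", VEO_3_PER_SECOND)):
        bill = monthly_bill(50_000, 5, rate, 200_000, SEEDREAM_V4_PER_IMAGE)
        print(label, {item: f"${cost:,.2f}" for item, cost in bill.items()})
```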

A team self-hosting on dedicated GPUs

A team running two H100s continuously for a custom pipeline rather than calling model APIs.

Line item | Monthly cost
2 × H100 × $1.89/hr × 730 hrs | $2,759
Overflow burst: 1 × H200 × $2.10/hr × 100 hrs | $210
Total | $2,969

Self-hosting trades the convenience of per-output API pricing for flat GPU-hour cost — cheaper at high, steady utilisation, but you pay for idle time. The crossover depends entirely on how busy the GPUs stay.
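The crossover can be made concrete. Dividing an hourly GPU rate by a per-output API price gives the outputs per hour a dedicated GPU must deliver, at full utilisation, before renting it beats calling the API; idle time raises that bar in proportion. The sketch below assumes the self-hosted pipeline produces outputs equivalent to the API model (same quality, no extra licensing) and ignores engineering and ops time. Any throughput you compare against must come from your own measurements, not from fal, and the function names are hypothetical.

```python
from decimal import Decimal

HOURS_PER_MONTH = 730  # the same approximation used in the table


def monthly_gpu_cost(hourly: Decimal, gpus: int, hours: int = HOURS_PER_MONTH) -> Decimal:
    """Flat cost of renting `gpus` GPUs for `hours` hours, busy or idle."""
    return hourly * gpus * hours


def breakeven_outputs_per_busy_hour(hourly: Decimal, price_per_output: Decimal,
                                    utilisation: Decimal = Decimal(1)) -> Decimal:
    """Outputs per busy GPU-hour needed for self-hosting to match per-output pricing.

    With utilisation u, only a fraction u of each paid hour produces output,
    so the throughput required while busy scales by 1/u.
    """
    return hourly / (price_per_output * utilisation)


if __name__ == "__main__":
    h100, h200 = Decimal("1.89"), Decimal("2.10")
    bill = monthly_gpu_cost(h100, 2) + monthly_gpu_cost(h200, 1, hours=100)
    print(f"Dedicated bill: ${bill:,.2f}")
    # Against Seedream V4 at $0.03/image: 63 images/hour at 100% utilisation,
    # 126 images per busy hour if the GPU sits idle half the time.
    seedream = Decimal("0.03")
    print(breakeven_outputs_per_busy_hour(h100, seedream))
    print(breakeven_outputs_per_busy_hour(h100, seedream, Decimal("0.5")))
```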


Pricing evolution: From GPU rental to normalised per-output model APIs

fal’s pricing model was rebuilt twice in under eighteen months: first from raw per-second unit billing to a budget-based output comparator (2024), then to normalised per-output model APIs (2025). The changes tracked the company’s pivot from a generic serverless-GPU host into a generative-media inference platform, and a run of funding rounds that took it from a $49M Series B to a $4.5B valuation.

Cadence

Quarter | Price changes | Product / SKU additions | Notes
2024 Q1 | 0 | 0 | Raw per-second unit pricing: CPU $0.00003/s, A100 $0.001/s, A10G $0.0002/s, T4 $0.00009/s; “Choose your Machine Type” picker.
2024 Q2 | 1 | 0 | 2024-05: “Choose a budget” comparator UI; GPU rates re-expressed as inferences-per-$20 (SDXL, Whisper); A100 now $0.00111/s, A6000 $0.000575/s.
2025 Q1 | 1 | 2 | 2025-01 H100 80GB ($0.00125/s) added + “Billing Based on Model Output” table (FLUX.1, SD3, Stable Video); 2025-02 $49M Series B banner + hero redesign.
2025 Q2 | 2 | 1 | 2025-04 per-hour GPU display (“H100s from $1.99/hr”); 2025-05 H100 cut to $1.89/h, H200 $2.10/h, A100 $0.99/h, A6000 $0.60/h added.
2025 Q3 | 0 | 1 | 2025-07 modern layout reached (coincident with $125M Series C, $1.5B valuation): “Output-Based Pricing” split into Video/Image Models; B200 “contact us” added.
2026 Q2 | 0 | 0 | 2026-06-01: current capture; section labels now “Serverless & Compute Pricing” and “Model APIs Pricing”; H100/H200/A100 rates unchanged since 2025-05.

Tracked range: 2024-02–2026-06-01. Quarters not listed above were verified stable (0 price changes, 0 SKU additions).

Notable changes

  • 2024-02 — Earliest captured pricing: raw per-second unit rates (CPU/Memory/GPU/Storage) with a machine-type picker; no output-based pricing.
  • 2024-05 — Budget-slider comparator introduced; GPU-second rates re-expressed as “inferences per $20” — fal’s first output-normalisation.
  • 2025-01 — H100 added; per-output “Billing Based on Model Output” table launched alongside the GPU comparator; new diamond wordmark.
  • 2025-02 — $49M Series B (Notable Capital + a16z) announced via on-site banner; pricing hero redesigned.
  • 2025-04 — Per-hour GPU pricing introduced, led by “H100s from as low as $1.99/hr.”
  • 2025-05 — H100 cut to $1.89/h; full per-hour fleet (H200 $2.10/h, A100 $0.99/h, A6000 $0.60/h). The H100, H200, and A100 rates are still live a year later.
  • 2025-07 — Modern “Output-Based Pricing” layout (Video + Image Models, “Output per $1”); B200 184GB added as sales-gated “contact us.” Coincided with the $125M Series C at a $1.5B valuation.

The 2024-to-2025 repricing in detail

fal’s pricing history is really the story of a positioning change. In 2024 the page sold generic serverless GPU compute — you picked a machine type and paid per unit-second of CPU, memory, and GPU, the same way you’d reason about a cloud VM. The mid-2024 “budget slider” was the first hint of the eventual strategy: it re-expressed those raw GPU-second rates as “how many SDXL images can $20 buy,” teaching buyers to think in outputs rather than seconds.

The decisive break came across 2025 Q1–Q3, bracketed by funding. The $49M Series B (announced February 2025; TechCrunch had covered the earlier seed financing in September 2024) financed the pivot to “the future of AI video,” and the pricing page followed: a dedicated “Billing Based on Model Output” table (January 2025) introduced true per-output rates, the GPU fleet moved to a per-hour display (April 2025) with H100 cut from $1.99 to $1.89/hr (May 2025), and by the $125M Series C ($1.5B valuation, July 2025) the page had split cleanly into GPU rental and per-output Model APIs with the “Output per $1” comparator. fal went on to raise a $140M Series D in December 2025 led by Sequoia at a $4.5B valuation, roughly tripling its Series C mark in five months, on revenue that grew ~1,040% to ~$285M ARR by the end of 2025. The current page is the stable end-state of that repricing.


What’s unique: Output-normalised, seatless inference pricing

1. The pricing page is a cost comparator, not a plan picker. fal publishes every model rate alongside a normalised “output per $1” figure — 20 seconds of Wan 2.5 video, 33 Seedream V4 images, 50 megapixels of Qwen. Buyers compare cost-per-output directly across models instead of decoding tiers.

2. Two billing surfaces, one metered philosophy. fal lets you either call hosted models per output (zero ops, pay per second/image) or rent the underlying GPU per hour/second and deploy your own app. The same usage-based logic spans both — there is no seat or subscription anywhere on the public page.

3. Dual time granularity on compute. GPU compute is quoted both per hour and per second (H100 at $1.89/hr or $0.0005/s), letting bursty workloads reason about cost at the second while steady ones budget by the hour.

4. Sales-gating the frontier. The newest Blackwell B200 (184GB) is the only GPU with no public price (“contact us”): fal publishes rates openly for commodity GPUs while reserving pricing for its newest capacity for negotiation.


Strengths & weaknesses

Strengths | Weaknesses
Fully transparent per-output and per-GPU-hour pricing | No free tier to lower trial friction
“Output per $1” normalisation makes model cost comparison easy | Premium models (Veo 3 at $0.4/s) make bills volatile at scale
Per-second granularity aligns cost with actual compute consumed | B200 and enterprise pricing hidden behind “contact us”
Two surfaces (managed APIs vs raw GPU) fit different ops appetites | No published volume-commit discounts on the pricing page
No seats: cost scales with usage, not headcount | Bill is fully variable; no predictable monthly floor

Billing UX: Named controls on the public pricing surface

  • “Output per $1” comparator column — every model-API row shows normalised output (seconds, images, or megapixels) per dollar so buyers can compare cost-efficiency without doing the math.
  • Per-hour and per-second GPU columns — the Serverless & Compute table prices each GPU at both granularities side by side.
  • “Start Building” vs “Contact Sales” CTAs — each pricing block pairs a self-serve entry point with a sales path, separating PLG from sales-led motions inline.
  • Enterprise Contact Form with company-size selector — the enterprise capture exposes a structured intake (Just me → 2,000+ people) that routes high-volume buyers to sales.
  • “contact us” cells for gated SKUs — B200 capacity and enterprise features render as explicit “contact us” placeholders rather than hidden rows.

Strategic wins: Why fal’s seatless, output-normalised pricing works

1. Pricing the unit the customer actually buys

fal charges per generated output — a second of video, an image, a megapixel — which is exactly the unit a generative-media product produces. This is textbook value-metric alignment: the bill moves with the customer’s own output volume, so cost feels fair and scales with their success rather than their headcount.

2. Turning the price table into a comparator

By normalising every rate to “output per $1,” fal removes the cognitive tax of comparing $0.05/second against $0.03/image. This is a subtle usage-based pricing UX win: the pricing page does the buyer’s cost-modelling for them, lowering the barrier to choosing a model.

3. Two surfaces capture two buyer mindsets

Offering both managed per-output APIs and raw per-hour GPU rental lets fal monetise the convenience-seeker and the cost-optimiser without forcing either into the wrong model — a flexible answer to the AI infrastructure cost question of “rent capacity or pay per result.” It is the same dual-surface bet Replicate and Runpod make, but fal leans harder on the per-output side for generative media.


Areas to improve: Predictability and trial gaps in a fully-metered model

1. No free tier or trial credit

With every line item metered and no free allowance, a curious developer must commit a payment method before generating a single output. A small monthly free-output grant (a few free images/seconds) would lower trial friction the way credit grants do in other usage-based pricing models, without materially denting revenue.

2. Fully variable bills invite bill shock

Because there is no floor and premium video models cost 8× the cheapest per second (Veo 3 at $0.4/s versus Wan 2.5 at $0.05/s), a model swap or a traffic spike can multiply a bill overnight: the classic AI cost-unpredictability problem. Published spend caps, budget alerts, or committed-use discounts would give finance teams the predictability the current page lacks.
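Until such controls exist on fal’s side, the guardrail has to live in the caller. A minimal client-side sketch, assuming you can estimate each request’s cost from the listed unit price before sending it: the `SpendGuard` class, its 80% alert threshold, and `BudgetExceeded` are hypothetical and are not a fal feature or SDK call.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a request would push estimated spend past the hard cap."""


class SpendGuard:
    """Hypothetical client-side monthly cap with a soft alert threshold."""

    def __init__(self, hard_cap: Decimal, alert_fraction: Decimal = Decimal("0.8")):
        self.hard_cap = hard_cap
        self.alert_at = hard_cap * alert_fraction
        self.spent = Decimal(0)
        self.alerted = False

    def charge(self, units: int, unit_price: Decimal) -> Decimal:
        """Record a request's estimated cost before sending it; refuse it if over the cap."""
        cost = units * unit_price
        if self.spent + cost > self.hard_cap:
            raise BudgetExceeded(f"request costs ${cost}, only ${self.hard_cap - self.spent} left")
        self.spent += cost
        if not self.alerted and self.spent >= self.alert_at:
            self.alerted = True  # a real system would notify finance here
        return cost


if __name__ == "__main__":
    guard = SpendGuard(hard_cap=Decimal("100"))
    guard.charge(5, Decimal("0.05"))       # a 5-second Wan 2.5 clip: $0.25
    try:
        guard.charge(250, Decimal("0.4"))  # 250 s of Veo 3 = $100, more than the $99.75 left
    except BudgetExceeded as err:
        print("blocked:", err)
```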

3. Too much hidden behind “contact us”

B200 capacity and all enterprise mechanics carry no public price. Publishing at least indicative enterprise bands or committed-volume discounts would extend fal’s strong transparency story from commodity GPUs to its highest-value SKUs.


Key takeaways

  1. Price the output, not the seat. fal meters per generated second, image, and megapixel — the exact units its customers produce — so cost scales with usage instead of headcount. Generative-media tools should consider output as the value metric before defaulting to per-user pricing.
  2. Normalise prices for the buyer. Publishing “output per $1” turns a price list into a comparator and removes the buyer’s math. A small UX choice can materially lower the barrier to purchase.
  3. Offer both managed and raw surfaces. Selling per-output APIs and per-hour GPU rental side by side captures both convenience-buyers and cost-optimisers without forcing a single model on everyone.
  4. Transparency is a feature — but it has limits. fal publishes commodity GPU and model rates openly yet gates frontier capacity (B200) and enterprise behind sales, balancing trust against negotiating leverage.
  5. Fully variable pricing needs guardrails. Without a free tier, spend caps, or commits, a metered model can produce both trial friction and bill shock; the absence of these controls is the main gap in fal’s otherwise clean model.

UBP implications

  1. Output-based metering is the natural fit for generative media. fal shows that per-second and per-image billing maps cleanly onto generative AI’s unit of value, a template other media-inference vendors can copy directly.
  2. Self-published cost normalisation is an emerging UBP best practice. “Output per $1” is a buyer-facing innovation that other usage-based vendors could adopt to make complex per-unit rates legible.
  3. Pure-usage models trade predictability for fairness. fal’s seatless, floorless model is maximally fair but maximally variable; the broader UBP lesson is that metered pricing usually needs commitment options or spend controls layered on to be enterprise-ready.

Bottom line

fal is generative-media inference priced exactly the way its customers produce value: per second of video, per image, per megapixel, and per GPU-hour — with no seats, no subscription, and no free tier. The pricing page doubles as a cost comparator, making fal one of the more transparent usage-based AI infrastructure vendors, even as B200 capacity and enterprise terms stay behind “contact us.”

Pricing timeline: Major events

Each milestone below corresponds to a public pricing change, product launch, or material adjustment.

Serverless & Compute + Model APIs (current)

The live page presents two metered surfaces: Serverless & Compute (GPU per-hour/per-second: H100 $1.89/h, H200 $2.10/h, A100 $0.99/h, B200 'contact us') and Model APIs (Video: Wan 2.5 $0.05/s, Kling 2.5 Turbo Pro $0.07/s, Veo 3 $0.4/s, Ovi $0.2/video; Image: Seedream V4 $0.03/image, Flux Kontext Pro $0.04/image, Nanobanana $0.0398/image, Qwen $0.02/MP). No seats, subscriptions, or free tier.

Captured: 2026-06-01

Modern layout: Output-Based Pricing + B200 'contact us'

Coinciding with the $125M Series C ($1.5B valuation), the page reached its current structure: 'GPU Pricing' (H100 $1.89/h, H200 $2.10/h, A100 $0.99/h, B200 184GB 'contact us') plus 'Output-Based Pricing' split into Video Models and Image Models with an 'Output per $1' comparator column. B200 Blackwell capacity was added as sales-gated.

Captured: 2025-07

H100 cut to $1.89/hr; full per-hour fleet

The H100 hourly rate dropped to $1.89/h ($0.0005/s), with H200 141GB at $2.10/h, A100 40GB at $0.99/h ($0.0003/s), and A6000 48GB at $0.60/h. These H100/H200/A100 rates are the same ones live on the page a year later.

Captured: 2025-05

Per-hour GPU pricing introduced ($1.99/hr H100)

fal switched its GPU fleet to a per-hour display alongside per-second, leading with 'Get H100s from as low as $1.99/hr.' The new 'GPU Pricing' table priced H100 80GB at $1.99/h. This is the move from a budget-slider comparator to the per-hour/per-second table that defines the current page.

Captured: 2025-04

$49M Series B + pricing hero redesign

A site banner announced 'fal Raises $49M Series B to Power the Future of AI Video' (Notable Capital + a16z). The pricing page was redesigned with the 'Fast, reliable, and cost-efficient' hero and simplified nav (Pricing / Enterprise), keeping both the model-output table and the GPU budget comparator (H100 $0.00125/s, A100 $0.00111/s, A6000 $0.000575/s).

Captured: 2025-02

H100 added + 'Billing Based on Model Output' table

fal added the H100 80GB ($0.00125/s) to the top of the GPU fleet and introduced a 'Billing Based on Model Output' table (FLUX.1 [dev]/[schnell]/[pro], Stable Diffusion 3 Medium, Stable Video) — 'models below are billed by model output, instead of compute seconds.' The new diamond 'fal' wordmark also debuted. Per-output model-API billing layered on top of GPU rental.

Captured: 2025-01

'Choose a budget' output comparator UI

fal added a budget slider ($1–$200) that translated GPU-second rates into 'how many inferences per $20' — e.g. SDXL ~10,296 runs, SDXL Lightning ~47,415 runs, Whisper v3 ~3,677 runs. Per-second GPU rates persisted (A100 $0.00111/s, A6000 $0.000575/s, A10G $0.00053/s) but the page became a cost comparator. Birth of fal's output-normalisation.
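The slider’s arithmetic is budget ÷ (GPU-second rate × GPU-seconds per run), rounded down to whole runs. The page published only the resulting counts, not run durations or which GPU served each model. Assuming the A100 machine rate of $0.00111/s, run times of 1.75 s (SDXL), 0.38 s (SDXL Lightning), and 4.9 s (Whisper v3) reproduce all three $20 figures exactly, so the sketch below uses those back-solved values; treat them as a consistency check, not as fal’s benchmarks. `runs_per_budget` and `ASSUMED_SECONDS_PER_RUN` are illustrative names.

```python
from decimal import Decimal

A100_MACHINE_PER_S = Decimal("0.00111")  # 2024 A100 machine rate quoted on the page

# Back-solved assumptions, not published figures: seconds per run that
# reproduce the slider's $20 counts at the A100 machine rate.
ASSUMED_SECONDS_PER_RUN = {
    "SDXL": Decimal("1.75"),
    "SDXL Lightning": Decimal("0.38"),
    "Whisper v3": Decimal("4.9"),
}


def runs_per_budget(budget: Decimal, rate_per_s: Decimal, seconds_per_run: Decimal) -> int:
    """Whole inferences a budget buys when each run bills `seconds_per_run` GPU-seconds."""
    return int(budget / (rate_per_s * seconds_per_run))  # int() truncates, i.e. rounds down


if __name__ == "__main__":
    for model, seconds in ASSUMED_SECONDS_PER_RUN.items():
        print(model, runs_per_budget(Decimal("20"), A100_MACHINE_PER_S, seconds))
```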

Captured: 2024-05

Raw per-second unit pricing

fal's earliest captured pricing billed compute by raw unit-second: CPU $0.00003/s, Memory $0.000004/s, GPU A100 $0.001/s, A10G $0.0002/s, T4 $0.00009/s, plus Storage $1/GB/month. A 'Choose your Machine Type' picker summed units (e.g. A100 machine = $0.00111/s). No output-based pricing or budget comparator yet.

Captured: 2024-02

Trivia
  • fal advertises H100 GPUs 'from as low as $1.89/hr', a per-second-billed rate ($0.0005/s) that undercuts most on-demand hyperscaler H100 list prices.
  • fal has no seats, no monthly plans, and no free tier on its public pricing page: every line item is metered per output or per unit of compute time.
  • fal normalises model-API prices to 'output per $1' on its own pricing page (e.g. 20 seconds of Wan 2.5 video, or 33 Seedream V4 images), turning the price table into a buyer-facing cost comparator.

Questions & answers

Does fal.ai have a free tier or monthly plan?
No. fal's public pricing page lists only usage-based rates — per-output model API calls and per-hour/per-second GPU compute. There are no seats, subscriptions, or free tier advertised.
How much does an H100 GPU cost on fal?
fal lists the H100 (80GB) at $1.89/hr, or $0.0005 per second. The H200 (141GB) is $2.10/hr, the A100 (40GB) is $0.99/hr, and the B200 (184GB) is 'contact us'. Rates are 'starting at' for custom deployments.
How is fal video generation priced?
Video models are billed by output unit — per second or per video. Examples: Wan 2.5 at $0.05/second, Kling 2.5 Turbo Pro at $0.07/second, Veo 3 at $0.4/second, and Ovi at $0.2/video.
How is fal image generation priced?
Image models are billed per image or per megapixel. Examples: Seedream V4 at $0.03/image, Flux Kontext Pro at $0.04/image, Nanobanana at $0.0398/image, and Qwen at $0.02/megapixel.
What does fal enterprise include?
fal enterprise adds private model hosting, custom inference and training kernels, foundational model research, dedicated serverless infrastructure, SOC2 certification, SSO, and user management — all priced via Contact Sales.