Fal Pricing

AI Summary

fal (fal.ai) is a generative-media inference platform that prices purely on usage: serverless per-output model APIs plus dedicated GPU compute, with no seats, no subscriptions, and no free tier on its public pricing page.
fal's GPU compute fleet is billed per hour and per second: H100 80GB at $1.89/hr ($0.0005/s), H200 141GB at $2.10/hr ($0.0006/s), A100 40GB at $0.99/hr ($0.0003/s), and B200 184GB at 'contact us'.
fal model APIs are billed by output unit — video models per second or per video (Wan 2.5 at $0.05/s, Kling 2.5 Turbo Pro at $0.07/s, Veo 3 at $0.4/s, Ovi at $0.2/video) and image models per image or per megapixel (Seedream V4 $0.03/image, Flux Kontext Pro $0.04/image, Qwen $0.02/MP).
fal's enterprise tier adds private model hosting, custom inference and training kernels, foundational model research, SOC2 certification, SSO, and dedicated serverless infrastructure — all quoted via sales.
fal positions itself as cost-efficient inference infrastructure for other AI products, powering generative media for customers like Poe, PlayAI, and Genspark.

Pricing summary

fal 2026 — pure-usage generative-media inference

No seats, no subscription, no free tier: pay per output for model APIs and per hour/second for GPU compute.

Model APIs — Image

from $0.02 /image

Developers calling image models per output

Model APIs — Video

from $0.05 /second

Developers generating video per second or per clip

GPU Compute

from $0.99 /hr

Teams deploying their own apps on a GPU fleet

Enterprise

Contact sales

Fortune-500 and high-volume AI products

Rates are verbatim from fal.ai/pricing (USD). GPU prices are 'starting at' for custom deployments; B200 184GB is 'contact us'.

About

fal (legal entity “features and labels”, at fal.ai) is a generative-media inference platform founded in 2021 by Burkay Gur and Gorkem Yurtseven and headquartered in San Francisco. It runs a GPU fleet optimised for fast, low-latency inference of generative models — image, video, audio, and text-to-speech — and exposes them both as ready-to-call model APIs and as dedicated compute customers can deploy their own apps onto.

fal sells inference wholesale to other AI products as much as to individual developers. Its own marketing cites powering roughly 50% of Poe’s image and video generation, low-latency TTS for PlayAI’s voice agents, and multimodal inference for Genspark’s Super Agent. The company positions on two axes competitors struggle to combine: raw speed (custom inference and training kernels) and cost efficiency (H100 GPUs advertised “from as low as $1.89/hr”).

That positioning has attracted aggressive funding. fal raised a $9M seed (a16z), a $14M Series A (Kindred Ventures), a $49M Series B in February 2025 (Notable Capital + a16z), a $125M Series C in July 2025 at a $1.5B valuation (Meritech), and a $140M Series D in December 2025 at a $4.5B valuation, led by Sequoia with NVIDIA’s NVentures participating. The Series D tripled the valuation in five months, on revenue that grew about 1,040% to ~$285M ARR by the end of 2025. The pricing page was rebuilt alongside that growth, shifting from raw per-second compute billing to today’s per-output model APIs.

Its closest comparison set is other generative-media inference providers and serverless GPU clouds — Replicate, Runpod, and the model-hosting tiers of the hyperscalers. Unlike seat-priced creative SaaS, fal’s entire public pricing surface is metered: you pay per output for model APIs and per hour or per second for GPU compute. There is no free tier, no subscription, and no per-user fee on the published pricing page.

Pricing summary : How fal’s pure-usage model-inference pricing works

fal uses a pure usage-based pricing model — a self-serve, pay-as-you-go motion built on two metered surfaces, with no seats and no subscription:

Serverless & Compute (GPU rental): Deploy your own app on fal’s GPU fleet, billed per hour and per second. H100 80GB is $1.89/hr ($0.0005/s), H200 141GB is $2.10/hr ($0.0006/s), A100 40GB is $0.99/hr ($0.0003/s), and B200 184GB is “contact us.” Rates are “starting at” for custom deployments.
Model APIs (per output): Call hosted models and pay by output unit, the way individual developers prefer to buy. Video models bill per second or per video (Wan 2.5 $0.05/s, Kling 2.5 Turbo Pro $0.07/s, Veo 3 $0.4/s, Ovi $0.2/video). Image models bill per image or per megapixel (Seedream V4 $0.03/image, Flux Kontext Pro $0.04/image, Nanobanana $0.0398/image, Qwen $0.02/MP).
Enterprise: Private model hosting, custom kernels, dedicated serverless infrastructure, SLA, SOC2, and SSO — all “Contact Sales,” no public price.

What makes this different: fal publishes prices normalised to “output per $1” (e.g. 20 seconds of Wan 2.5 video or 33 Seedream V4 images), turning the pricing page itself into a buyer-facing cost comparator rather than a plan picker.

Pricing by product

GPU compute (Serverless & Compute)

Deploy your own app on fal’s GPU fleet. Rates are “starting at” for custom deployments; contact fal to get started.

GPU	VRAM	Price per hour	Price per second	Key mechanics
A100	40GB	$0.99/hr	$0.0003/s	Lowest-cost tier for steady inference
H100	80GB	$1.89/hr	$0.0005/s	Headline “from $1.89/hr” GPU
H200	141GB	$2.10/hr	$0.0006/s	Larger VRAM for bigger models
B200	184GB	contact us	contact us	Newest Blackwell capacity, sales-gated
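The per-second column appears to be the hourly rate divided by 3,600 and rounded to four decimal places. The sketch below reproduces that conversion in Python; the rounding rule is an assumption inferred from the listed figures rather than anything fal documents, and the function and constant names are illustrative.

```python
# Hourly GPU rates in USD, copied from the table above.
HOURLY_RATES = {
    "A100 40GB": 0.99,
    "H100 80GB": 1.89,
    "H200 141GB": 2.10,
}

SECONDS_PER_HOUR = 3600


def per_second(hourly_rate: float, decimals: int = 4) -> float:
    """Convert an hourly rate to a per-second rate, rounded for display.

    The 4-decimal rounding is an assumption; it happens to reproduce
    the per-second figures listed in the table.
    """
    return round(hourly_rate / SECONDS_PER_HOUR, decimals)


for gpu, rate in HOURLY_RATES.items():
    print(f"{gpu}: ${rate:.2f}/hr is about ${per_second(rate):.4f}/s")
```

The rounded figures lose precision: 3,600 seconds at $0.0005 comes to $1.80, not $1.89, so the two columns are not interchangeable for budgeting. The source does not say which figure drives the actual bill.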

Model APIs — Video models

Video models are billed by output unit — per second or per video — depending on the model.

Model	Unit	Price	Output per $1	Key mechanics
Wan 2.5	second	$0.05	20 seconds	Cheapest per-second video listed
Kling 2.5 Turbo Pro	second	$0.07	14 seconds	Mid-tier per-second video
Veo 3	second	$0.4	3 seconds	Premium per-second video; $1 buys 2.5 seconds, listed rounded to 3
Ovi	video	$0.2	5 videos	Billed per whole video, not per second

Model APIs — Image models

Image models are billed by image count or by output size in megapixels (MP); fal normalises listed prices to 1MP for comparison.

Model	Unit	Price	Output per $1	Key mechanics
Qwen	megapixel	$0.02	50 megapixels	Cheapest, billed per MP
Seedream V4	image	$0.03	33 images	Per-image billing
Flux Kontext Pro	image	$0.04	25 images	Per-image billing
Nanobanana	image	$0.0398	25 images	Per-image billing
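The “Output per $1” column in both model tables is simply one dollar divided by the unit price. The Python sketch below recomputes it from the listed prices; the dictionary layout and function name are illustrative. The listed column appears to round to whole units, so Veo 3 works out to 2.5 seconds here against the 3 seconds shown, and Seedream V4 to about 33.3 images.

```python
# (price in USD, billing unit) copied from the two tables above.
PRICES = {
    "Wan 2.5": (0.05, "second"),
    "Kling 2.5 Turbo Pro": (0.07, "second"),
    "Veo 3": (0.40, "second"),
    "Ovi": (0.20, "video"),
    "Qwen": (0.02, "megapixel"),
    "Seedream V4": (0.03, "image"),
    "Flux Kontext Pro": (0.04, "image"),
    "Nanobanana": (0.0398, "image"),
}


def output_per_dollar(unit_price: float, budget: float = 1.0) -> float:
    """Units of output a given budget buys at a flat per-unit price."""
    return budget / unit_price


# Cheapest unit price first. Units differ between models, so the
# comparison is only meaningful between models billed in the same unit.
for model, (price, unit) in sorted(PRICES.items(), key=lambda kv: kv[1][0]):
    print(f"{model:<20} {output_per_dollar(price):6.1f} {unit}s per $1")
```

Passing a different `budget` gives the same comparison at any spend level, for example `output_per_dollar(0.05, budget=100)` for seconds of Wan 2.5 video per $100.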

Enterprise (separate sales-led tier)

Tier	Price	Included	Key mechanics
Enterprise	Contact sales	Private model hosting, custom inference/training kernels, foundational model research, dedicated serverless infra, SLA, SOC2, SSO, user management	Sales-led, quoted

Sales motions across products: self-serve / pay-as-you-go for GPU compute and model APIs; sales-led for enterprise (private hosting, custom kernels, SLA, SOC2, SSO) and for B200 capacity.

Hidden costs : What metered inference actually costs at volume

fal’s per-output rates look tiny in isolation, but generative-media workloads multiply them fast. Two representative archetypes:

A consumer AI app generating short videos

A product shipping 50,000 short clips a month (5-second clips at $0.05/s on a Wan-2.5-class model), plus 200,000 preview images at $0.03 each on Seedream V4.

Line item	Monthly cost
50,000 clips × 5s × $0.05/s	$12,500
200,000 preview images × $0.03	$6,000
Total	$18,500

At this volume the per-second video charge dominates — a single model swap from Wan 2.5 ($0.05/s) to Veo 3 ($0.4/s) would multiply the video line roughly 8×, to ~$100,000/mo.
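The arithmetic behind this archetype can be sketched as a small cost model in Python. The workload numbers and prices come from the table above; the `LineItem` class and helper names are illustrative, and the model assumes flat per-unit pricing with no discounts.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    label: str
    units: float        # billable units per month (seconds, images, ...)
    unit_price: float   # USD per unit, from the price tables

    @property
    def cost(self) -> float:
        return self.units * self.unit_price


def monthly_total(items: list[LineItem]) -> float:
    return sum(item.cost for item in items)


CLIPS_PER_MONTH = 50_000
SECONDS_PER_CLIP = 5
video_seconds = CLIPS_PER_MONTH * SECONDS_PER_CLIP

baseline = [
    LineItem("Wan 2.5 video", video_seconds, 0.05),
    LineItem("Seedream V4 previews", 200_000, 0.03),
]
# Same workload with only the video model swapped to Veo 3.
swapped = [
    LineItem("Veo 3 video", video_seconds, 0.40),
    baseline[1],
]

for name, items in (("baseline", baseline), ("Veo 3 swap", swapped)):
    for item in items:
        print(f"{name:>10} | {item.label:<22} ${item.cost:>10,.0f}")
    print(f"{name:>10} | {'Total':<22} ${monthly_total(items):>10,.0f}")
```

With the swap applied, the video line alone reaches $100,000 and the monthly total comes to about $106,000 once the unchanged $6,000 image line is added.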

A team self-hosting on dedicated GPUs

A team running two H100s continuously for a custom pipeline rather than calling model APIs.

Line item	Monthly cost
2 × H100 × $1.89/hr × 730 hrs	$2,759
Overflow burst: 1 × H200 × $2.10/hr × 100h	$210
Total	$2,969

Self-hosting trades the convenience of per-output API pricing for flat GPU-hour cost — cheaper at high, steady utilisation, but you pay for idle time. The crossover depends entirely on how busy the GPUs stay.
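A rough way to reason about that crossover is sketched below in Python. The rates and hours come from the table above; the function names are illustrative. The break-even figure only says how many API output units would cost the same as the fleet. Whether two H100s can actually produce that much output is a throughput question the source does not answer.

```python
HOURS_PER_MONTH = 730  # always-on month, as used in the table above


def gpu_cost(count: int, hourly_rate: float, hours: float) -> float:
    """Flat cost of renting `count` GPUs for `hours` each."""
    return count * hourly_rate * hours


def effective_hourly(hourly_rate: float, utilisation: float) -> float:
    """Cost per busy GPU-hour when the GPU sits idle part of the time."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return hourly_rate / utilisation


def break_even_units(fleet_cost: float, api_unit_price: float) -> float:
    """API output units whose bill would equal the fleet cost."""
    return fleet_cost / api_unit_price


fleet = gpu_cost(2, 1.89, HOURS_PER_MONTH) + gpu_cost(1, 2.10, 100)
print(f"Fleet cost: ${fleet:,.2f}/month")
print(f"H100 at 50% utilisation: ${effective_hourly(1.89, 0.5):.2f} per busy hour")
print(f"Break-even vs $0.05/s API: {break_even_units(fleet, 0.05):,.0f} seconds")
```

At 50% utilisation the H100 effectively costs twice its list rate per productive hour, which is the idle-time penalty described above.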


Pricing evolution : From GPU rental to normalised per-output model APIs

fal’s pricing model rebuilt itself twice in eighteen months: from raw per-second unit billing (early 2024) to a budget-slider comparator (mid-2024) to per-output model pricing (2025). The changes tracked the company’s pivot from a generic serverless-GPU host into a generative-media inference platform, and the three funding rounds (Series B, C, and D) that took it from a $49M Series B to a $4.5B valuation.

Cadence

Quarter	Price changes	Product / SKU additions	Notes
2024 Q1	0	0	Raw per-second unit pricing: CPU $0.00003/s, A100 $0.001/s, A10G $0.0002/s, T4 $0.00009/s; “Choose your Machine Type” picker.
2024 Q2	1	0	2024-05 — “Choose a budget” comparator UI: GPU rates re-expressed as inferences-per-$20 (SDXL, Whisper); A100 now $0.00111/s, A6000 $0.000575/s.
2025 Q1	1	2	2025-01 H100 80GB ($0.00125/s) added + “Billing Based on Model Output” table (FLUX.1, SD3, Stable Video); 2025-02 $49M Series B banner + hero redesign.
2025 Q2	2	1	2025-04 per-hour GPU display (“H100s from $1.99/hr”); 2025-05 H100 cut to $1.89/h, H200 $2.10/h, A100 $0.99/h, A6000 $0.60/h added.
2025 Q3	0	1	2025-07 modern layout reached (coincident with $125M Series C, $1.5B valuation): “Output-Based Pricing” split into Video/Image Models; B200 “contact us” added.
2026 Q2	0	0	2026-06-01 — current capture: section labels now “Serverless & Compute Pricing” and “Model APIs Pricing”; H100/H200/A100 rates unchanged since 2025-05.

Tracked range: 2024-02–2026-06-01. Quarters not listed above were verified stable (0 price changes, 0 SKU additions).

Notable changes

2024-02 — Earliest captured pricing: raw per-second unit rates (CPU/Memory/GPU/Storage) with a machine-type picker; no output-based pricing.
2024-05 — Budget-slider comparator introduced; GPU-second rates re-expressed as “inferences per $20” — fal’s first output-normalisation.
2025-01 — H100 added; per-output “Billing Based on Model Output” table launched alongside the GPU comparator; new diamond wordmark.
2025-02 — $49M Series B (Notable Capital + a16z) announced via on-site banner; pricing hero redesigned.
2025-04 — Per-hour GPU pricing introduced, led by “H100s from as low as $1.99/hr.”
2025-05 — H100 cut to $1.89/h; full per-hour fleet (H200 $2.10/h, A100 $0.99/h, A6000 $0.60/h) — the rates still live a year later.
2025-07 — Modern “Output-Based Pricing” layout (Video + Image Models, “Output per $1”); B200 184GB added as sales-gated “contact us.” Coincided with the $125M Series C at a $1.5B valuation.
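To compare the H100 rates across these captures, the earlier per-second price can be converted to an hourly equivalent. The Python sketch below does that with the figures listed above; the function names are illustrative. Treat the comparison as indicative only: the January 2025 figure comes from the budget-comparator era and may not be like-for-like with the later “starting at” hourly rates.

```python
SECONDS_PER_HOUR = 3600


def hourly(per_second_rate: float) -> float:
    """Hourly equivalent of a per-second rate."""
    return per_second_rate * SECONDS_PER_HOUR


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a cut)."""
    return (new - old) / old * 100


h100_jan_2025 = hourly(0.00125)  # per-second rate from the 2025-01 capture
print(f"H100 Jan 2025: ${h100_jan_2025:.2f}/hr equivalent")
print(f"Apr to May 2025 cut: {pct_change(1.99, 1.89):.1f}%")
print(f"Jan to May 2025:     {pct_change(h100_jan_2025, 1.89):.1f}%")
```

On these numbers the May 2025 cut from $1.99 to $1.89 is about 5%, while the move from the January per-second rate to $1.89/hr is far larger.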

The 2024-to-2025 repricing in detail

fal’s pricing history is really the story of a positioning change. In 2024 the page sold generic serverless GPU compute — you picked a machine type and paid per unit-second of CPU, memory, and GPU, the same way you’d reason about a cloud VM. The mid-2024 “budget slider” was the first hint of the eventual strategy: it re-expressed those raw GPU-second rates as “how many SDXL images can $20 buy,” teaching buyers to think in outputs rather than seconds.
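The budget slider's conversion can be sketched as budget divided by the cost of one run. The May 2024 capture quotes roughly 10,296 SDXL runs, 47,415 SDXL Lightning runs, and 3,677 Whisper v3 runs per $20, alongside an A100 rate of $0.00111/s. In the Python sketch below, the seconds-per-run values are not in the source: they are assumptions back-solved from those run counts, on the further assumption that the slider used the A100 rate and rounded down.

```python
import math

A100_RATE = 0.00111  # USD per second, from the May 2024 capture

# ASSUMPTION: run times back-solved from the quoted run counts,
# not published figures.
ASSUMED_SECONDS_PER_RUN = {
    "SDXL": 1.75,
    "SDXL Lightning": 0.38,
    "Whisper v3": 4.9,
}


def runs_per_budget(budget: float, rate_per_second: float,
                    seconds_per_run: float) -> int:
    """Whole inference runs a budget buys at a per-second GPU rate."""
    return math.floor(budget / (rate_per_second * seconds_per_run))


for model, seconds in ASSUMED_SECONDS_PER_RUN.items():
    runs = runs_per_budget(20, A100_RATE, seconds)
    print(f"{model}: about {runs:,} runs per $20")
```

Under those assumptions the sketch reproduces the three quoted run counts, which suggests the slider was a thin layer over per-second billing rather than a separate price list.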

The decisive break came across 2025 Q1–Q3, bracketed by funding. The $49M Series B (announced February 2025; TechCrunch had covered the earlier seed round in September 2024) financed the pivot to “the future of AI video,” and the pricing page followed: a dedicated “Billing Based on Model Output” table (Jan 2025) introduced true per-output rates, the GPU fleet moved to a per-hour display (Apr 2025) with H100 cut from $1.99 to $1.89/hr (May 2025), and by the $125M Series C ($1.5B valuation, July 2025) the page had split cleanly into GPU rental and per-output Model APIs with the “Output per $1” comparator. fal went on to raise a $140M Series D in December 2025 led by Sequoia at a $4.5B valuation, tripling its Series C mark in five months, on the back of revenue that grew ~1,040% to ~$285M ARR by the end of 2025. The current page is the stable end-state of that repricing.

What’s unique : Output-normalised, seatless inference pricing

1. The pricing page is a cost comparator, not a plan picker. fal publishes every model rate alongside a normalised “output per $1” figure — 20 seconds of Wan 2.5 video, 33 Seedream V4 images, 50 megapixels of Qwen. Buyers compare cost-per-output directly across models instead of decoding tiers.

2. Two billing surfaces, one metered philosophy. fal lets you either call hosted models per output (zero ops, pay per second/image) or rent the underlying GPU per hour/second and deploy your own app. The same usage-based logic spans both — there is no seat or subscription anywhere on the public page.

3. Dual time granularity on compute. GPU compute is quoted both per hour and per second (H100 at $1.89/hr or $0.0005/s), letting bursty workloads reason about cost at the second while steady ones budget by the hour.

4. Sales-gating the frontier. The newest Blackwell B200 (184GB) is the only GPU with no public price (“contact us”): fal is transparent on commodity GPUs while reserving newest-capacity pricing for negotiation.

Strengths & weaknesses

Strengths	Weaknesses
Fully transparent per-output and per-GPU-hour pricing	No free tier to lower trial friction
“Output per $1” normalisation makes model cost comparison easy	Premium models (Veo 3 at $0.4/s) make bills volatile at scale
Per-second granularity aligns cost with actual compute consumed	B200 and enterprise pricing hidden behind “contact us”
Two surfaces (managed APIs vs raw GPU) fit different ops appetites	No published volume-commit discounts on the pricing page
No seats — cost scales with usage, not headcount	Bill is fully variable; no predictable monthly floor

Billing UX : Named controls on the public pricing surface

“Output per $1” comparator column — every model-API row shows normalised output (seconds, images, or megapixels) per dollar so buyers can compare cost-efficiency without doing the math.
Per-hour and per-second GPU columns — the Serverless & Compute table prices each GPU at both granularities side by side.
“Start Building” vs “Contact Sales” CTAs — each pricing block pairs a self-serve entry point with a sales path, separating PLG from sales-led motions inline.
Enterprise Contact Form with company-size selector — the enterprise capture exposes a structured intake (Just me → 2,000+ people) that routes high-volume buyers to sales.
“contact us” cells for gated SKUs — B200 capacity and enterprise features render as explicit “contact us” placeholders rather than hidden rows.

Strategic wins : Why fal’s seatless, output-normalised pricing works

1. Pricing the unit the customer actually buys

fal charges per generated output — a second of video, an image, a megapixel — which is exactly the unit a generative-media product produces. This is textbook value-metric alignment: the bill moves with the customer’s own output volume, so cost feels fair and scales with their success rather than their headcount.

2. Turning the price table into a comparator

By normalising every rate to “output per $1,” fal removes the cognitive tax of comparing $0.05/second against $0.03/image. This is a subtle usage-based pricing UX win: the pricing page does the buyer’s cost-modelling for them, lowering the barrier to choosing a model.

3. Two surfaces capture two buyer mindsets

Offering both managed per-output APIs and raw per-hour GPU rental lets fal monetise the convenience-seeker and the cost-optimiser without forcing either into the wrong model — a flexible answer to the AI infrastructure cost question of “rent capacity or pay per result.” It is the same dual-surface bet Replicate and Runpod make, but fal leans harder on the per-output side for generative media.

Areas to improve : Predictability and trial gaps in a fully-metered model

1. No free tier or trial credit

With every line item metered and no free allowance, a curious developer must commit a payment method before generating a single output. A small monthly free-output grant (a few free images/seconds) would lower trial friction the way credit grants do in other usage-based pricing models, without materially denting revenue.

2. Fully variable bills invite bill shock

Because there is no floor and premium models cost 8× the cheapest, a model swap or a traffic spike can multiply a bill overnight — the classic AI cost-unpredictability problem. Published spend caps, budget alerts, or committed-use discounts would give finance teams the predictability the current page lacks.
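Until such controls are published, a buyer could approximate them on the client side. The Python sketch below is entirely hypothetical: `BudgetGuard` and `BudgetExceeded` are illustrative names, not part of any fal API, and the guard only tracks charges the caller reports to it. It shows a hard monthly cap plus an alert threshold, using per-second prices from the tables above.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the monthly cap."""


class BudgetGuard:
    """Hypothetical client-side spend tracker (not part of any fal API)."""

    def __init__(self, monthly_cap: float, alert_fraction: float = 0.8):
        self.monthly_cap = monthly_cap
        self.alert_threshold = monthly_cap * alert_fraction
        self.spent = 0.0

    def record(self, units: float, unit_price: float) -> str:
        """Add a charge; return "ok" or "alert", or raise if over the cap."""
        cost = units * unit_price
        if self.spent + cost > self.monthly_cap:
            raise BudgetExceeded(
                f"charge of ${cost:.2f} would exceed cap ${self.monthly_cap:.2f}"
            )
        self.spent += cost
        return "alert" if self.spent >= self.alert_threshold else "ok"


guard = BudgetGuard(monthly_cap=100.0)
print(guard.record(1_000, 0.05))   # 1,000 s of Wan 2.5
print(guard.record(100, 0.40))     # 100 s of Veo 3
try:
    guard.record(100, 0.40)        # another 100 s of Veo 3
except BudgetExceeded as exc:
    print("blocked:", exc)
```

The example makes the model-swap risk concrete: 1,000 seconds of Wan 2.5 uses half the cap, while just 200 seconds of Veo 3 would use most of it.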

3. Too much hidden behind “contact us”

B200 capacity and all enterprise mechanics carry no public price. Publishing at least indicative enterprise bands or committed-volume discounts would extend fal’s strong transparency story from commodity GPUs to its highest-value SKUs.

Monetization stack & signals : how fal builds & buys its revenue engine

Buys 7 · Builds 0 · 11 open roles

The read — where the monetization investment is going

fal buys its meter: a dedicated Payments team wires Orb for usage metering and Stripe for payments rather than building either. A widening commercial org (enterprise PM, GTM data, account managers on Salesforce/Gong) overlays sales-led motion onto the self-serve core.

Stack — build vs buy

Buys (vendor) · 7

Vendor	Function	Source	Quoted evidence
Stripe	Payments	2 job posts, Jun 2026	“integrate with Orb for usage metering and Stripe for payments and invoicing”
Orb	Metering	Job post, Jun 2026	“integrate with Orb for usage metering and Stripe for payments and invoicing”
Salesforce	CRM	2 job posts, Jun 2026	“Technologies You'll Use Salesforce • Slack • Hex • Gong • Apollo”
Gong	Customer success	Job post, Jun 2026	“Technologies You'll Use Salesforce • Slack • Hex • Gong • Apollo”
NetSuite	Revenue recognition	Job post, Jun 2026 (inferred)	“Experience with NetSuite, Carta, Shareworks, or similar tools a plus”
Snowflake + dbt	Data platform	Job post, Jun 2026 (inferred)	“building ingestion pipelines into a warehouse (BigQuery, Snowflake, Redshift) … Strong SQL and working proficiency in dbt”
Amplitude / Looker	Analytics	Job post, Jun 2026 (inferred)	“SQL, Amplitude/Looker, or plain-text CSVs—whatever gets you to the insight fastest”

Open roles in the revenue & lifecycle org — 11

View open roles

Role	Function	Date
Senior Data Scientist, GTM	RevOps	May 13, 2026
AR Specialist	RevOps	May 13, 2026
Technical Accounting and Reporting Manager	Deal desk	seen Mar 19, 2026
Enterprise Product Manager	Monetization	seen Dec 15, 2025
Staff Software Engineer, Payments	Billing engineering	seen Nov 18, 2025
Account Manager, Commercial (North America)	Customer success	seen Oct 17, 2025
Account Manager, Enterprise	Customer success	seen Oct 17, 2025
Staff Software Engineer, Forward Deployed	Customer success	seen Oct 17, 2025
Technical Business Development (Model Labs)	Customer success	seen Oct 17, 2025
Software Engineer, Growth	Growth	seen Jul 21, 2025
Senior Data Scientist, Growth	Growth	seen Jul 21, 2025

Signals reviewed Jun 2026 · derived from public job posts

Job postings fill and close over time — once a posting is filled we keep it as a dated citation (the quoted evidence remains); use View open roles for current listings.

Key takeaways

Price the output, not the seat. fal meters per generated second, image, and megapixel — the exact units its customers produce — so cost scales with usage instead of headcount. Generative-media tools should consider output as the value metric before defaulting to per-user pricing.
Normalise prices for the buyer. Publishing “output per $1” turns a price list into a comparator and removes the buyer’s math. A small UX choice can materially lower the barrier to purchase.
Offer both managed and raw surfaces. Selling per-output APIs and per-hour GPU rental side by side captures both convenience-buyers and cost-optimisers without forcing a single model on everyone.
Transparency is a feature — but it has limits. fal publishes commodity GPU and model rates openly yet gates frontier capacity (B200) and enterprise behind sales, balancing trust against negotiating leverage.
Fully variable pricing needs guardrails. Without a free tier, spend caps, or commits, a metered model can produce both trial friction and bill shock; the absence of these controls is the main gap in fal’s otherwise clean model.

UBP implications

Output-based metering is the natural fit for generative media. fal shows that per-second and per-image billing maps cleanly onto generative AI’s unit of value, a template other media-inference vendors can copy directly.
Self-published cost normalisation is an emerging UBP best practice. “Output per $1” is a buyer-facing innovation that other usage-based vendors could adopt to make complex per-unit rates legible.
Pure-usage models trade predictability for fairness. fal’s seatless, floorless model is maximally fair but maximally variable; the broader UBP lesson is that metered pricing usually needs commitment options or spend controls layered on to be enterprise-ready.

Sources

fal pricing page (accessed 2026-06-01)
fal enterprise page (accessed 2026-06-01)
fal documentation (accessed 2026-06-01)
fal blog (accessed 2026-06-01)

Bottom line

fal is generative-media inference priced exactly the way its customers produce value: per second of video, per image, per megapixel, and per GPU-hour — with no seats, no subscription, and no free tier. The pricing page doubles as a cost comparator, making fal one of the more transparent usage-based AI infrastructure vendors, even as B200 capacity and enterprise terms stay behind “contact us.”


Pricing timeline : Major events on a vertical axis

Each milestone below corresponds to a public pricing change, product launch, or material adjustment, listed newest first.

Serverless & Compute + Model APIs (current)

Jun 2026

The live page presents two metered surfaces — Serverless & Compute (GPU per-hour/per-second: H100 $1.89/h, H200 $2.10/h, A100 $0.99/h, B200 'contact us') and Model APIs (Video: Wan 2.5 $0.05/s, Kling 2.5 Turbo Pro $0.07/s, Veo 3 $0.4/s, Ovi $0.2/video; Image: Seedream V4 $0.03, Flux Kontext Pro $0.04, Nanobanana $0.0398, Qwen $0.02/MP). No seats, subscriptions, or free tier.

captured 2026-06-01

Modern layout: Output-Based Pricing + B200 'contact us'

Jul 2025

Coinciding with the $125M Series C ($1.5B valuation), the page reached its current structure: 'GPU Pricing' (H100 $1.89/h, H200 $2.10/h, A100 $0.99/h, B200 184GB 'contact us') plus 'Output-Based Pricing' split into Video Models and Image Models with an 'Output per $1' comparator column. B200 Blackwell capacity was added as sales-gated.

captured 2025-07-01

H100 cut to $1.89/hr; full per-hour fleet

May 2025

The H100 hourly rate dropped to $1.89/h ($0.0005/s), with H200 141GB at $2.10/h, A100 40GB at $0.99/h ($0.0003/s), and A6000 48GB at $0.60/h. These H100/H200/A100 rates are the same ones live on the page a year later.

captured 2025-05-01

Per-hour GPU pricing introduced ($1.99/hr H100)

Apr 2025

fal switched its GPU fleet to a per-hour display alongside per-second, leading with 'Get H100s from as low as $1.99/hr.' The new 'GPU Pricing' table priced H100 80GB at $1.99/h. This is the move from a budget-slider comparator to the per-hour/per-second table that defines the current page.

captured 2025-04-01

$49M Series B + pricing hero redesign

Feb 2025

A site banner announced 'fal Raises $49M Series B to Power the Future of AI Video' (Notable Capital + a16z). The pricing page was redesigned with the 'Fast, reliable, and cost-efficient' hero and simplified nav (Pricing / Enterprise), keeping both the model-output table and the GPU budget comparator (H100 $0.00125/s, A100 $0.00111/s, A6000 $0.000575/s).

captured 2025-02-01

H100 added + 'Billing Based on Model Output' table

Jan 2025

fal added the H100 80GB ($0.00125/s) to the top of the GPU fleet and introduced a 'Billing Based on Model Output' table (FLUX.1 [dev]/[schnell]/[pro], Stable Diffusion 3 Medium, Stable Video) — 'models below are billed by model output, instead of compute seconds.' The new diamond 'fal' wordmark also debuted. Per-output model-API billing layered on top of GPU rental.

captured 2025-01-01

'Choose a budget' output comparator UI

May 2024

fal added a budget slider ($1–$200) that translated GPU-second rates into 'how many inferences per $20' — e.g. SDXL ~10,296 runs, SDXL Lightning ~47,415 runs, Whisper v3 ~3,677 runs. Per-second GPU rates persisted (A100 $0.00111/s, A6000 $0.000575/s, A10G $0.00053/s) but the page became a cost comparator. Birth of fal's output-normalisation.

captured 2024-05-01

Raw per-second unit pricing

Feb 2024

fal's earliest captured pricing billed compute by raw unit-second: CPU $0.00003/s, Memory $0.000004/s, GPU A100 $0.001/s, A10G $0.0002/s, T4 $0.00009/s, plus Storage $1/GB/month. A 'Choose your Machine Type' picker summed units (e.g. A100 machine = $0.00111/s). No output-based pricing or budget comparator yet.

captured 2024-02-01

Trivia

· fal advertises H100 GPUs 'from as low as $1.89/hr' — a per-second-billed rate ($0.0005/s) that undercuts most on-demand hyperscaler H100 list prices.
· fal has no seats, no monthly plans, and no free tier on its public pricing page — every line item is metered per output or per unit of compute time.
· fal normalises model-API prices to 'output per $1' on its own pricing page (e.g. 20 seconds of Wan 2.5 video, or 33 Seedream V4 images), turning the price table into a buyer-facing cost comparator.

Questions & answers

Does fal.ai have a free tier or monthly plan? No. fal's public pricing page lists only usage-based rates: per-output model API calls and per-hour/per-second GPU compute. There are no seats, subscriptions, or free tier advertised.
How much does an H100 GPU cost on fal? fal lists the H100 (80GB) at $1.89/hr, or $0.0005 per second. The H200 (141GB) is $2.10/hr, the A100 (40GB) is $0.99/hr, and the B200 (184GB) is 'contact us'. Rates are 'starting at' for custom deployments.
How is fal video generation priced? Video models are billed by output unit, per second or per video. Examples: Wan 2.5 at $0.05/second, Kling 2.5 Turbo Pro at $0.07/second, Veo 3 at $0.4/second, and Ovi at $0.2/video.
How is fal image generation priced? Image models are billed per image or per megapixel. Examples: Seedream V4 at $0.03/image, Flux Kontext Pro at $0.04/image, Nanobanana at $0.0398/image, and Qwen at $0.02/megapixel.
What does fal enterprise include? fal enterprise adds private model hosting, custom inference and training kernels, foundational model research, dedicated serverless infrastructure, SOC2 certification, SSO, and user management, all priced via Contact Sales.