The Agentic AI Pricing Puzzle: When Your Software Decides How Much It Costs
Part 4 of the Future Ahead Series: Where AI Is Going and How It Will Transform Billing, Infrastructure, and Pricing Models
When Software Stops Following Instructions and Starts Making Decisions
Imagine you hire a contractor to renovate your kitchen. You agree on a fixed price of fifteen thousand dollars for the job. The contractor shows up, demolishes your old cabinets, installs the new ones, adds the countertops, and hands you a bill for fifteen thousand dollars. Simple, predictable, straightforward. Now imagine instead that you hire a contractor who says, “I’ll figure out what your kitchen needs when I get there, do whatever it takes to make it perfect, and then bill you based on how many decisions I had to make along the way.” You’d probably ask how much those decisions might cost and what the final bill could look like. But the contractor can’t tell you because they won’t know until they start working.
This second scenario feels uncomfortable because you’re giving up control over the cost while simultaneously delegating both the decision-making and the execution to someone else. You’re trusting them to act in your best interest while knowing that their actions directly determine your bill. And yet this is exactly the situation that agentic AI is creating for software companies trying to figure out how to price these systems. When software moves from executing predefined instructions to autonomously deciding what steps to take to accomplish a goal, everything about pricing becomes more complicated. The unpredictability isn’t a bug in the system. It’s a fundamental feature of how these agents work.
Let me explain what I mean by this and why it matters so profoundly for how we think about billing infrastructure and pricing models. We’re not talking about incremental complexity added to existing paradigms. We’re talking about a category of software that breaks most of our assumptions about how pricing works, what customers can reasonably be expected to predict about their bills, and what controls vendors need to build to prevent chaos. This is genuinely new territory, and the companies that figure it out first will have enormous advantages over those that try to force old models onto new realities.
Understanding Agentic AI: More Than Just Fancy Automation
Before we can tackle pricing, we need to be very clear about what agentic AI actually is and how it differs from everything that came before. This distinction matters because people often conflate different types of AI capabilities under the same umbrella, which creates confusion about what’s actually hard to price and what’s just regular usage-based billing with a different label.
Traditional software automation follows explicit instructions that a human programmer encoded. You write a script or configure a workflow that says, “When event X happens, do action Y, then action Z.” The path through the logic is predetermined. The software executes the exact sequence you specified. If the workflow involves ten steps, it will always involve those same ten steps in that same order every single time it runs. This predictability makes billing straightforward because the cost to execute the workflow is essentially fixed once you’ve designed it.
The first generation of AI tools brought probabilistic outputs but still followed predetermined paths. A language model might generate different text each time you ask it the same question, making the output variable, but each call is easy to meter. You send a thousand tokens as input, you receive some number of tokens as output, and your cloud provider or API vendor charges you a known rate per token. The path from input to output is still direct even though the specific content and length of the output vary. Billing remains conceptually similar to traditional software consumption models because the user controls exactly when and how often the AI gets invoked.
Agentic AI represents a qualitatively different paradigm. An agentic system takes a goal and autonomously determines what sequence of steps it needs to take to accomplish that goal. It might decompose a complex request into multiple sub-tasks. It decides which external tools or data sources it needs to consult at each step. It evaluates whether its intermediate outputs are satisfactory or whether it needs to try a different approach. It might loop back and retry failed steps or pursue alternative strategies when its first attempt doesn’t work. Critically, the agent makes all these decisions itself based on the context and the problem it’s trying to solve. Two superficially similar requests might result in completely different execution paths with vastly different costs.
Let me give you a concrete example to make this tangible. Suppose you’re using an AI customer service agent and a customer asks, “Why hasn’t my order shipped yet?” This seems like a simple question, but let’s walk through what an agentic system might do to answer it properly. First, the agent needs to identify which customer is asking and which order they’re referring to. It might query a customer database using the customer’s email address or phone number. Then it needs to look up the order details in an order management system using the order ID. Next it checks the shipping status in a logistics system to understand where the order is in the fulfillment process. It might discover that the order is delayed because one of the items was out of stock. So now it queries an inventory system to see when that item will be back in stock. It might also check company policies to understand what options exist for delayed orders. Should it offer to split the shipment and send what’s available now? Should it offer a discount for the inconvenience? The agent evaluates these options, composes a response that addresses the customer’s question and proactively offers solutions, and then sends the response. Finally, it might update a ticket in the CRM system to document the interaction.
Each of these steps involves API calls, database queries, and LLM invocations. Some steps might use vector databases to search through policy documents or historical interactions. The agent might call an embedding model to convert text into vectors for semantic search. It might use different LLM models for different sub-tasks, perhaps routing simple lookups to a cheap model but using a more expensive reasoning model to evaluate policy options. If any step fails or returns ambiguous information, the agent might retry with a different approach or ask clarifying questions. All of this happens automatically based on the agent’s understanding of what it needs to do to properly answer the customer’s question. The customer asks one question and receives one answer, but behind the scenes, the system might have performed twenty or thirty distinct operations across multiple systems and models to construct that answer.
Now contrast that with a different customer who asks, “What’s your return policy?” To answer this, the agent might simply retrieve the relevant section from a knowledge base document using a single vector database lookup, feed that into an LLM to generate a natural language summary, and send the response. Two API calls total. Same product, same agent, radically different execution paths and costs. And crucially, neither the customer nor the vendor knows in advance which type of query they’re going to get. The agent decides its execution strategy dynamically based on what it encounters as it works through the problem.
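To make the contrast concrete, here is a minimal Python sketch that models each execution path as the list of operations the agent chose to perform and then prices it. The per-operation prices in `UNIT_COST` and the step lists are hypothetical placeholders for illustration only; a real system would meter actual usage rather than assume fixed prices.

```python
from dataclasses import dataclass

# Hypothetical per-operation costs in US dollars (illustrative, not real prices).
UNIT_COST = {
    "db_query": 0.001,
    "api_call": 0.002,
    "vector_search": 0.0005,
    "llm_call": 0.01,
}

@dataclass
class Step:
    kind: str         # one of the keys in UNIT_COST
    description: str  # what the agent was doing at this step

def path_cost(steps: list[Step]) -> float:
    """Sum the cost of every operation the agent decided to perform."""
    return sum(UNIT_COST[s.kind] for s in steps)

# A path the agent might take for "Why hasn't my order shipped yet?"
order_status = [
    Step("llm_call", "interpret the question"),
    Step("db_query", "look up customer by email"),
    Step("api_call", "fetch order from order management system"),
    Step("api_call", "check shipping status in logistics system"),
    Step("api_call", "check inventory for the out-of-stock item"),
    Step("vector_search", "retrieve delayed-order policy"),
    Step("llm_call", "evaluate split shipment vs. discount"),
    Step("llm_call", "compose response"),
    Step("api_call", "update CRM ticket"),
]

# A path for "What's your return policy?"
return_policy = [
    Step("vector_search", "retrieve return policy section"),
    Step("llm_call", "summarize policy"),
]

for name, steps in [("order status", order_status), ("return policy", return_policy)]:
    print(f"{name}: {len(steps)} operations, ${path_cost(steps):.4f}")
```

Even with made-up unit prices, the same agent ends up with an order-of-magnitude cost gap between the two requests. The gap comes from the path the agent chose, not from anything visible in the question itself.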
This unpredictability creates the core pricing challenge. You can’t price these systems based purely on inputs because the same input can trigger vastly different execution paths. You can’t price them based on outputs because a short simple answer might have required extensive backend processing while a long detailed answer might have been retrieved from a document with minimal computation. And you can’t price them based on time because agentic systems work asynchronously, often completing multiple workflows in parallel, making wall-clock time a poor proxy for actual resource consumption.
The Hidden Cost Structure: Why Agents Are Expensive in Surprising Ways
To understand how to price agentic systems, we first need to understand where the costs actually come from. This isn’t just about LLM tokens anymore, though those certainly remain significant. The cost structure of an agentic AI system is fundamentally different from simpler AI applications, and many companies are discovering this the hard way when their bills come in much higher than their simplistic models predicted.
The first major cost component is what we might call cascading model invocations. When an agent uses multiple reasoning steps to solve a problem, each step might require its own LLM call. Modern reasoning models have made this even more pronounced because they generate thousands of internal reasoning tokens before producing their final output. But agents compound this further by chaining together multiple independent model calls across their workflow. An agent might call a model to understand the user’s intent, then call it again to generate a database query, then call it once more to interpret the query results, and finally call it yet again to compose the final response to the user. Each invocation consumes tokens for both input and output, and the intermediate outputs from one step often become inputs to the next step, creating a compounding effect on token consumption.
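The compounding effect shows up clearly when you count tokens across a chain. This sketch assumes, as one common pattern among several, that every step re-sends the system prompt, the original user input, and all earlier step outputs as context. The function name and the token counts are hypothetical.

```python
def chain_token_usage(user_input_tokens, system_prompt_tokens, output_tokens_per_step):
    """Return (total_input, total_output) tokens for a sequential agent chain.

    Assumes each step re-sends the system prompt, the user input, and every
    earlier step's output as context.
    """
    total_input = 0
    total_output = 0
    carried_context = 0  # tokens produced by earlier steps
    for out_tokens in output_tokens_per_step:
        step_input = system_prompt_tokens + user_input_tokens + carried_context
        total_input += step_input
        total_output += out_tokens
        carried_context += out_tokens  # this step's output feeds the next step
    return total_input, total_output

# Hypothetical four-step chain: intent, query generation, interpretation, response.
steps = [50, 120, 300, 250]
inp, out = chain_token_usage(user_input_tokens=40, system_prompt_tokens=500,
                             output_tokens_per_step=steps)
print(f"input tokens: {inp}, output tokens: {out}, total: {inp + out}")
```

A 40-token question ends up costing a few thousand tokens, and most of them are input tokens re-sent as context. Every extra step the agent adds makes all later steps more expensive too.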
Research data from companies running agentic systems at scale shows that complex agent workflows can consume millions of tokens per day even at moderate usage volumes of just a few thousand queries per day. When you’re paying ten to thirty dollars per million tokens for GPT-4 class models, and your agent is decomposing each user request into an average of four or five separate model calls, you can see how costs escalate quickly. A single customer interaction that appears simple on the surface might easily consume five thousand to ten thousand tokens behind the scenes, translating to roughly five to thirty cents in LLM costs alone. Scale that across tens of thousands of interactions daily, and you’re looking at thousands of dollars in daily API charges just for the language model components.
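The arithmetic behind these figures is easy to check. This sketch multiplies the token counts and per-million-token prices mentioned in the paragraph. The volume of 10,000 interactions per day is an assumption at the low end of "tens of thousands".

```python
def llm_cost_usd(tokens, price_per_million):
    """Cost of a token count at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million

INTERACTIONS_PER_DAY = 10_000  # assumed volume

for tokens in (5_000, 10_000):
    for price in (10, 30):
        per_interaction = llm_cost_usd(tokens, price)
        daily = per_interaction * INTERACTIONS_PER_DAY
        print(f"{tokens:>6} tokens @ ${price}/M: ${per_interaction:.2f} each, "
              f"${daily:,.0f}/day")
```

The per-interaction cost spans a sixfold range depending on how many tokens the agent happens to use and which model tier it calls. That spread passes straight through to the daily bill.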
But LLM costs are actually only part of the picture, and often not even the largest part for sophisticated agents. The second major cost component is tool calling and integration overhead. Agents don’t operate in isolation. They need to interact with external systems to retrieve information, perform actions, or verify outcomes. Every time an agent calls a customer database, queries a knowledge base, invokes a third-party API, or writes data back to a CRM, there are costs involved. Some of these costs are direct charges from the systems being integrated. Salesforce might charge you based on API call volume. Your data warehouse might bill based on queries executed. Third-party services almost certainly have their own usage-based pricing. But there are also infrastructure costs for the middleware that brokers these interactions, maintains connection pools, handles retries and error recovery, and ensures data consistency across systems.
Recent industry benchmarks suggest that for a moderately complex agentic deployment involving integrations with five to ten external systems, the infrastructure costs for managing those integrations can run between five hundred and two thousand dollars monthly, separate from any usage-based charges from the systems themselves. This is because you need robust orchestration layers that can handle the unpredictability of agent behavior. Traditional integration platforms were designed for predictable workflows where you know exactly which systems will be called and in what sequence. But agents make dynamic decisions about which tools to use, meaning your integration infrastructure needs to maintain connections to all potentially useful systems, keep security credentials fresh across all of them, and be ready to handle arbitrary sequences of calls that might never have been tested during development. That operational complexity translates to higher infrastructure costs than simpler integration patterns.
The third major cost component, and one that frequently catches companies by surprise, is vector database and retrieval infrastructure. Agentic systems that work with organizational knowledge need to be able to search through documents, past conversations, policy files, and other unstructured data to find relevant information. This capability is typically implemented through retrieval-augmented generation, or RAG, which involves converting text into high-dimensional vectors and storing them in specialized databases optimized for similarity search. Every time an agent needs to search for information, it first converts the search query into a vector using an embedding model, which itself costs money in terms of API calls or compute resources. Then it performs a similarity search across potentially millions of stored vectors, which consumes database compute resources. The database returns the most relevant chunks of text, which then get fed into an LLM along with the original query to generate a contextualized response.
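The retrieval flow described above (embed the query, run a similarity search, then feed the best-matching chunks to an LLM) can be sketched in a few lines of Python. To keep the example self-contained, it replaces a real embedding model with a toy bag-of-words vector and replaces the vector database with an in-memory list. `embed`, `retrieve`, and `build_prompt` are illustrative names, not any vendor's API, and the final LLM call is omitted.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts.
    In a real pipeline this is an embedding API or model call, which is
    where per-token embedding cost is incurred."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Stand-in for a vector database: (chunk text, precomputed vector) pairs.
chunks = [
    "Items may be returned within 30 days of delivery for a full refund.",
    "Orders with out-of-stock items may be split into multiple shipments.",
    "Gift cards cannot be exchanged for cash.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Embed the query, then rank stored chunks by similarity."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    """Combine retrieved context with the question for the downstream LLM call."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Can items be returned for a refund?"))
```

Each call to `retrieve` corresponds to one embedding call plus one similarity search. A production system would also pay for the LLM call that consumes the prompt, so every search adds cost at three separate points.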
The cost of running these RAG pipelines scales with both the size of your knowledge base and the frequency of searches. Vector databases from providers like Pinecone, Weaviate, or Qdrant typically charge based on the number of vectors stored, the number of queries performed, and the computational resources needed to run those queries efficiently. For a mid-sized deployment with a few million vectors representing a moderately large knowledge base, and query volumes in the range of thousands per day as agents search for information to answer questions, companies are reporting monthly vector database costs in the range of one hundred to two thousand dollars depending on performance requirements and data volumes. When you combine this with embedding generation costs, which are typically a small fraction of a cent per thousand tokens embedded but are incurred on every search query and on every document that gets indexed or re-indexed, the total retrieval infrastructure cost can be substantial. And critically, this cost scales with how often your agents need to search for information, which is determined by the types of questions they receive and the complexity of the tasks they’re trying to accomplish, not by factors you control directly.
The fourth cost layer that often gets overlooked is monitoring and governance infrastructure. When you’re running autonomous agents that can take actions on behalf of users or make decisions that affect business outcomes, you need robust systems to observe what they’re doing, validate that they’re behaving correctly, and intervene when they start to drift into problematic behaviors. This requires logging every agent action, maintaining audit trails, running evaluations on agent outputs to ensure quality, and providing dashboards that let operations teams spot anomalies before they escalate into serious problems. The infrastructure to support this observability can be significant. Companies are investing in tooling to trace entire agent workflows from start to finish, capturing not just the final output but every intermediate step, every tool call, every decision point. This data needs to be stored, indexed, and made queryable so that when something goes wrong, engineers can reconstruct exactly what the agent was thinking and what external factors influenced its behavior.
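A minimal sketch of the kind of step-level tracing described here: each intermediate action is recorded with a shared workflow ID, a step number, and its cost, then emitted as JSON Lines, which most log stores can ingest. `TraceEvent` and `WorkflowTracer` are hypothetical names. Real deployments usually build on a tracing framework rather than a hand-rolled class like this one.

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class TraceEvent:
    workflow_id: str
    step: int
    kind: str        # e.g. "llm_call", "tool_call", "decision"
    detail: str
    cost_usd: float
    timestamp: float = field(default_factory=time.time)

class WorkflowTracer:
    """Records every intermediate step of one agent workflow for later replay."""

    def __init__(self) -> None:
        self.workflow_id = str(uuid.uuid4())
        self.events: list[TraceEvent] = []

    def record(self, kind: str, detail: str, cost_usd: float = 0.0) -> None:
        self.events.append(
            TraceEvent(self.workflow_id, len(self.events), kind, detail, cost_usd))

    def total_cost(self) -> float:
        return sum(e.cost_usd for e in self.events)

    def to_jsonl(self) -> str:
        """One JSON object per line, ready to store and index."""
        return "\n".join(json.dumps(asdict(e)) for e in self.events)

tracer = WorkflowTracer()
tracer.record("llm_call", "classify intent", 0.002)
tracer.record("tool_call", "orders.lookup(order_id)", 0.001)
tracer.record("decision", "offer split shipment")
tracer.record("llm_call", "compose reply", 0.004)
print(tracer.to_jsonl())
```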
Industry data suggests that companies serious about agent governance are spending between two hundred and one thousand dollars monthly just on monitoring and evaluation infrastructure separate from their core agent compute costs. This might seem excessive until you consider what’s at stake. An agent that starts consistently giving bad answers, or worse, taking incorrect actions in external systems, can damage customer relationships, create compliance issues, or generate financial losses that far exceed the cost of the infrastructure that would have caught the problem. As one director of AI engineering at an enterprise software company told me, “We spent six months treating monitoring as optional until an agent integration bug resulted in corrupted customer data that took three days and forty thousand dollars to fix. Now we treat observability as foundational, not optional.”
When you add up all these cost components (LLM API calls, tool integration overhead, vector database and RAG infrastructure, and monitoring and governance systems), you start to understand why comprehensive agentic deployments often run between three thousand and thirteen thousand dollars monthly in operational costs even before you reach truly massive scale. And the critical insight is that these costs are largely variable and unpredictable because they scale with agent behavior, not with easily observable metrics like user count or number of workflows deployed. This variability creates the core challenge for billing infrastructure.
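A rough monthly rollup can be kept as a small table of (low, high) ranges. In this sketch, the integration, vector database, and monitoring ranges are the figures cited earlier in this section. The LLM range is not an independent figure: it is simply the remainder implied by the quoted overall range, so treat it as a placeholder for your own metered spend.

```python
# Monthly cost ranges in US dollars as (low, high).
COMPONENTS = {
    "llm_api": (2_200, 8_000),             # placeholder: remainder implied by the overall range
    "tool_integration": (500, 2_000),
    "vector_db_rag": (100, 2_000),
    "monitoring_governance": (200, 1_000),
}

def monthly_range(components):
    """Sum the low and high ends of each component independently."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high

low, high = monthly_range(COMPONENTS)
print(f"estimated monthly operating cost: ${low:,} to ${high:,}")
```

Summing the lows and the highs separately gives a worst-case envelope. Because every component moves with agent behavior, a real month can land anywhere inside it.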
The Measurement Problem: Defining Units of Work for Autonomous Systems
Once you understand the cost structure, the next challenge becomes figuring out how to measure what the agent actually does in a way that’s meaningful for pricing. This is harder than it sounds because agents deliberately abstract away the implementation details from users. The whole point of an agent is that users specify goals and the agent figures out how to accomplish them. But if we’re going to charge based on work performed, we need to define what constitutes a unit of work in a way that makes sense to customers while also roughly correlating with our costs.
The simplest approach, which many companies try first, is to define the unit as a single interaction or conversation. The customer sends a message, the agent responds, that’s one billable event. Zendesk uses this model for its AI agents, charging approximately five to fifty cents per conversation depending on complexity and contract terms. Intercom’s Fin charges ninety-nine cents per successfully resolved conversation. This has the advantage of being very intuitive for customers. They can easily understand what they’re being charged for because it maps directly to the visible exchanges they see in their interface.
But conversation-based pricing has a significant weakness when applied to autonomous agents. Not all conversations are created equal from a cost perspective. A simple question that the agent answers by retrieving a single policy document might cost the vendor two or three cents to process. A complex multi-step inquiry that requires looking up customer data across multiple systems, evaluating business rules, and generating a customized response might cost thirty or forty cents to process. If you charge a flat rate per conversation, you’re either overcharging customers for simple interactions or losing money on complex ones, or, most likely, both simultaneously. This creates perverse incentives where vendors try to deflect complex questions to keep costs down, which undermines the whole value proposition of having an intelligent agent.
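The "both simultaneously" problem is easy to see with a toy month of traffic. This sketch bills every conversation at a flat price and compares that price with each conversation's cost. The $0.15 price and the 800/200 traffic mix are hypothetical; the per-conversation costs follow the ranges given in the paragraph.

```python
def flat_rate_margins(price_per_conversation, conversation_costs):
    """Per-conversation margin when every conversation is billed the same price."""
    return [price_per_conversation - cost for cost in conversation_costs]

# Hypothetical month: 800 simple conversations (~$0.03 each) and
# 200 complex ones (~$0.35 each), all billed at a flat $0.15.
costs = [0.03] * 800 + [0.35] * 200
margins = flat_rate_margins(0.15, costs)

overcharged = sum(1 for m in margins if m > 0)
underwater = sum(1 for m in margins if m < 0)
print(f"conversations billed above cost: {overcharged}")
print(f"conversations billed below cost: {underwater}")
print(f"net margin: ${sum(margins):.2f}")
```

The month can look profitable overall while every single conversation is mispriced in one direction or the other. If the traffic mix shifts toward complex requests, the net margin turns negative.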
Some companies attempt to solve this through tiered conversation pricing where they charge different rates for different types of interactions. Perhaps simple informational queries cost ten cents, moderate complexity tasks cost twenty-five cents, and complex multi-step workflows cost fifty cents or more. The challenge with tiered pricing is classification. How do you determine in real time which tier a given conversation belongs to? If you classify based on the agent’s assessment of complexity before it starts working, you might misclassify and either overcharge or undercharge. If you classify retroactively based on actual costs incurred, customers lose the ability to predict their bills, which creates the same problems as pure variable pricing. And you still have the difficulty that complexity from a cost perspective might not align with complexity from a value perspective, leading to mismatches between what you charge and what customers think is fair.
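The two classification strategies can disagree on the same request. This sketch uses the tier prices from the paragraph and places them next to an up-front classifier (based on the agent's initial plan) and a retroactive classifier (based on actual cost). The step-count and cost thresholds are hypothetical.

```python
TIER_PRICES = {"simple": 0.10, "moderate": 0.25, "complex": 0.50}

def tier_from_prediction(planned_steps: int) -> str:
    """Up-front classification from the agent's initial plan (thresholds assumed)."""
    if planned_steps <= 2:
        return "simple"
    if planned_steps <= 5:
        return "moderate"
    return "complex"

def tier_from_cost(actual_cost_usd: float) -> str:
    """Retroactive classification from cost actually incurred (thresholds assumed)."""
    if actual_cost_usd < 0.05:
        return "simple"
    if actual_cost_usd < 0.20:
        return "moderate"
    return "complex"

# A request the agent planned as 2 steps that ended up costing $0.32
# after retries and extra lookups.
predicted = tier_from_prediction(planned_steps=2)
actual = tier_from_cost(0.32)
print(f"predicted tier: {predicted} (${TIER_PRICES[predicted]:.2f})")
print(f"actual tier:    {actual} (${TIER_PRICES[actual]:.2f})")
```

Billing the predicted tier loses money on this request. Billing the actual tier charges the customer five times what they were led to expect.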
An alternative approach that’s gaining traction is to define the unit of work not as a conversation but as a workflow or process completed. Automation platforms such as n8n use a version of this model, charging per workflow execution rather than per individual step within the workflow (Make.com, by contrast, meters the individual operations a scenario performs). The advantage is that a workflow represents a meaningful unit of value to the customer. Automating an entire lead qualification process end-to-end is worth more than executing a single database lookup, and workflow-based pricing can reflect that. The challenge is that agentic systems don’t have fixed, predefined workflows. The whole point is that the agent dynamically constructs its execution path based on the specific circumstances it encounters. So you’re back to the same classification problem: defining when one workflow ends and another begins when the agent is fluidly chaining together tasks.
ServiceNow has introduced an interesting variation on this theme with what they call assists, which are essentially atomic units of value delivered by their AI agents. An assist might be answering a support question, generating a piece of code, or completing a specific task like creating a ticket or updating a record. Customers purchase blocks of assists, perhaps ten thousand per month in a standard plan, and each time the agent delivers value in one of these predefined categories, it consumes one assist. ServiceNow can adjust the cost basis of different assist types on the backend to reflect their actual costs, so generating complex code might consume three assists while answering a simple question consumes one assist, but customers still have the simplicity of dealing with a single credit currency. This credit-based abstraction gives vendors flexibility to adjust underlying economics while maintaining pricing stability and predictability for customers. And it maps reasonably well to customer value because assists are defined in terms of outcomes delivered rather than implementation details.
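A generic credit ledger captures the mechanics of this abstraction. This is not ServiceNow's implementation. The task types and `CREDIT_WEIGHTS` values are hypothetical, with code generation weighted at three credits to mirror the example in the paragraph. The important property is that the vendor can retune the weights without changing the unit the customer buys.

```python
from dataclasses import dataclass, field

# Hypothetical credit weights per delivered outcome; tunable on the back end.
CREDIT_WEIGHTS = {
    "answer_question": 1,
    "create_ticket": 1,
    "update_record": 1,
    "generate_code": 3,
}

class InsufficientCredits(Exception):
    pass

@dataclass
class CreditLedger:
    balance: int
    history: list = field(default_factory=list)

    def consume(self, task_type: str) -> int:
        """Deduct credits for one delivered outcome and return the new balance."""
        cost = CREDIT_WEIGHTS[task_type]
        if cost > self.balance:
            raise InsufficientCredits(f"{task_type} needs {cost}, only {self.balance} left")
        self.balance -= cost
        self.history.append((task_type, cost))
        return self.balance

ledger = CreditLedger(balance=10_000)  # e.g. a monthly block of 10,000 credits
ledger.consume("answer_question")
ledger.consume("generate_code")
print(ledger.balance)
```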
Microsoft has taken perhaps the most aggressive approach with Microsoft 365 Copilot by simply charging a flat per-user subscription of thirty dollars monthly for unlimited usage within acceptable use policy bounds. This completely sidesteps the measurement problem by making pricing independent of usage volume or complexity. Microsoft is effectively betting that its efficiency in running these models at scale, combined with the fact that not all users will be heavy users, allows it to profitably serve the full range of usage patterns under a flat subscription. For customers, this provides maximum predictability and removes any friction around monitoring usage or worrying about bill shock. The risk for Microsoft is that heavy users can consume resources far beyond what the subscription fee covers, but the company is apparently willing to absorb that variance in exchange for simplicity and rapid adoption.
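The flat-subscription bet only works across a population of users. This sketch computes vendor margin for a hypothetical mix of light, moderate, and heavy users at a $30 per-user fee. The per-user cost figures are invented for illustration and do not describe Microsoft's actual economics.

```python
def subscription_margin(fee_per_user, monthly_cost_per_user):
    """Vendor margin across a user population on a flat per-user fee."""
    revenue = fee_per_user * len(monthly_cost_per_user)
    cost = sum(monthly_cost_per_user)
    return revenue - cost

FEE = 30.0
# Hypothetical usage mix: 70 light, 25 moderate, 5 very heavy users.
users = [4.0] * 70 + [15.0] * 25 + [120.0] * 5

margin = subscription_margin(FEE, users)
heavy_losses = sum(FEE - c for c in users if c > FEE)
print(f"total margin: ${margin:,.2f}; lost on heavy users: ${-heavy_losses:,.2f}")
```

The light users subsidize the heavy ones. The model stays profitable only as long as the share of heavy users stays small.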
The measurement challenge becomes even more acute when you consider multi-agent systems where specialized agents collaborate to accomplish complex goals. Imagine an agent ecosystem where you have a research agent that gathers information, a planning agent that figures out what needs to be done, multiple execution agents that perform different tasks in parallel, and a coordination agent that brings everything together. When all these agents work together to accomplish a single customer goal, what exactly do you bill for? Do you charge for each agent invocation separately? That creates enormous billing complexity and makes costs essentially unpredictable. Do you charge for the final outcome delivered? That risks massive variance in your costs to deliver that outcome depending on how complex the coordination turned out to be. There’s no obviously right answer, which is why most companies deploying multi-agent systems today are opting for hybrid models that combine base subscription fees with usage-based components precisely to hedge against this uncertainty.
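Whatever the billing unit, the vendor first has to attribute each agent's cost to the customer goal that triggered it. This sketch assumes metering events carry a shared goal identifier, which is an assumption about your instrumentation, and rolls them up two ways: total cost per goal, and invocation count per goal (the unit a per-invocation price would bill). All records are hypothetical.

```python
from collections import defaultdict

# Each record: (goal_id, agent_name, cost_usd).
events = [
    ("goal-1", "research", 0.12),
    ("goal-1", "planning", 0.04),
    ("goal-1", "executor-a", 0.09),
    ("goal-1", "executor-b", 0.07),
    ("goal-1", "coordinator", 0.03),
    ("goal-2", "research", 0.02),
    ("goal-2", "coordinator", 0.01),
]

def cost_by_goal(records):
    """Roll every agent's cost up to the goal that triggered it."""
    totals = defaultdict(float)
    for goal_id, _agent, cost in records:
        totals[goal_id] += cost
    return dict(totals)

def invocations_by_goal(records):
    """Count agent invocations per goal."""
    counts = defaultdict(int)
    for goal_id, _agent, _cost in records:
        counts[goal_id] += 1
    return dict(counts)

print(cost_by_goal(events))
print(invocations_by_goal(events))
```

If both goals were billed as one "outcome" at the same price, the vendor's cost would differ by more than ten times between them. Billing per invocation instead exposes the customer to that variance.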
The Control Problem: Preventing Agents from Going Rogue (or Bankrupt)
Beyond measurement lies an even more concerning challenge: how do you prevent autonomous agents from consuming resources in ways that either damage the customer relationship or destroy your margins? When software follows predefined logic, you can test all the code paths and validate that the system behaves correctly and efficiently. But agentic systems make dynamic decisions that can’t be fully predicted or tested in advance. This creates real risk that agents might pursue execution strategies that are dramatically more expensive than necessary or that result in outcomes that aren’t worth their cost.
Let me give you a real example of what can go wrong. A company deployed an AI agent to help with customer research by automatically gathering information from various sources on the internet. The agent was given access to web search APIs and was instructed to be thorough in its research. A user asked the agent to research market trends for a specific niche product category. The agent interpreted “thorough” to mean it should gather information from hundreds of different sources. It started making web search API calls, hundreds of them, each returning multiple results that then needed to be processed by language models to extract relevant insights. Within an hour, the agent had consumed over a million tokens and made over three hundred API calls to external services, racking up nearly two hundred dollars in costs to answer a single research query. The user wasn’t even particularly pleased with the results because the agent had produced an overwhelming volume of information that was difficult to synthesize. The company ended up eating the cost and scrambling to implement guardrails to prevent similar incidents.
This type of runaway agent behavior is more common than you might think in early deployments because agents optimize for task completion, not for cost efficiency, unless you explicitly teach them to consider costs in their decision-making. And even then, the agent’s understanding of cost-effectiveness might not match your understanding. It might decide that making fifty database queries to ensure it has complete information is “worth it” for accuracy even though those queries cost real money and the incremental accuracy gain doesn’t justify the cost. Without proper controls, you can end up in situations where agents routinely take the most expensive path to accomplish their goals simply because that’s the highest probability path to success based on their training.
The first category of controls that companies are implementing involves budget caps and rate limits at multiple levels of granularity. At the coarsest level, you might set a maximum monthly spend per customer that absolutely cannot be exceeded regardless of usage. This protects you from catastrophic outcomes but creates the risk that important agent workflows get cut off mid-month when limits are hit, degrading the customer experience. More sophisticated implementations use multi-tiered limits. There might be a soft limit where warnings are triggered and customers are notified they’re approaching their threshold. This gives them a chance to either increase their budget or optimize their usage before hitting a hard limit that actually blocks further execution. Within the billing period, you might also implement per-workflow budgets that prevent any single agent execution from consuming more than a defined amount of resources. If an agent starts burning through token budgets or making excessive API calls, the workflow gets automatically terminated and escalated for human review.
The implementation challenge with these budget controls is that they need to be enforced at the orchestration layer where the agent makes its tool-calling decisions, not just at the billing layer where you aggregate costs after the fact. This means your agent runtime needs real-time visibility into costs being incurred. Every time an agent considers invoking a tool or calling a model, the orchestration system needs to check current spend against budgets and either allow the action, warn the user, or block the action and terminate the workflow. Building this kind of cost-aware orchestration requires tight integration between your agent framework, your usage metering systems, and your billing infrastructure. Many companies are discovering that their existing architectures don’t support this kind of real-time cost enforcement, requiring significant refactoring.
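The multi-level limits from the previous paragraph can be combined into a single check that the orchestrator calls before every tool or model invocation. This is a minimal sketch. `BudgetGuard`, its fields, and the idea that each action comes with a cost estimate are all assumptions, and a real system would need to persist spend and handle concurrent workflows.

```python
from dataclasses import dataclass

class BudgetExceeded(Exception):
    """Raised when an action would push spend past a hard limit."""

@dataclass
class BudgetGuard:
    monthly_soft_limit: float   # notify the customer above this
    monthly_hard_limit: float   # block further execution above this
    workflow_limit: float       # cap for a single agent execution
    month_spend: float = 0.0
    workflow_spend: float = 0.0

    def start_workflow(self) -> None:
        self.workflow_spend = 0.0

    def authorize(self, estimated_cost: float) -> str:
        """Check an action's estimated cost *before* the agent runs it.

        Returns "allow" or "warn"; raises BudgetExceeded to block the action.
        """
        if self.workflow_spend + estimated_cost > self.workflow_limit:
            raise BudgetExceeded("per-workflow budget exceeded; escalate for review")
        if self.month_spend + estimated_cost > self.monthly_hard_limit:
            raise BudgetExceeded("monthly hard limit reached")
        self.workflow_spend += estimated_cost
        self.month_spend += estimated_cost
        return "warn" if self.month_spend > self.monthly_soft_limit else "allow"

guard = BudgetGuard(monthly_soft_limit=80.0, monthly_hard_limit=100.0,
                    workflow_limit=2.0, month_spend=79.5)
guard.start_workflow()
print(guard.authorize(0.25))   # still under the soft limit
print(guard.authorize(0.50))   # crosses the soft limit
try:
    guard.authorize(1.50)      # would push this workflow past $2.00
except BudgetExceeded as exc:
    print(f"blocked: {exc}")
```

Blocked actions never update the spend counters, so the guard reflects only work that was actually allowed to run.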
The second category of controls involves shaping agent behavior through configuration and training to make more cost-effective decisions. This might include defining preferred execution strategies that agents should try first before falling back to more expensive approaches. For example, you might configure an agent to always check a cache of previous answers before invoking expensive tools. If a similar query was recently answered, the agent can return that result or adapt it to the current context rather than repeating expensive research. You might also provide agents with approximate cost models for different tools and actions, allowing them to make informed tradeoffs between accuracy and cost. An agent might learn that calling a premium reasoning model costs ten times more than calling a standard model, and it should only invoke the expensive model when the task genuinely requires advanced reasoning capabilities.
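Both ideas (try the cache first, and reserve the premium model for tasks that need it) fit in a small wrapper. In this sketch, `CostAwareAgent` and the relative `MODEL_COST` values are hypothetical, the model call is a placeholder string, and the cache uses exact matching after whitespace and case normalization. Semantic matching of "similar" queries is not attempted.

```python
import hashlib

# Hypothetical relative costs: the premium model is ten times the standard one.
MODEL_COST = {"standard": 1.0, "premium": 10.0}

class CostAwareAgent:
    def __init__(self) -> None:
        self.cache: dict[str, str] = {}
        self.spend = 0.0

    @staticmethod
    def _key(query: str) -> str:
        # Exact-match cache key after normalizing case and whitespace.
        return hashlib.sha256(" ".join(query.lower().split()).encode()).hexdigest()

    def answer(self, query: str, needs_reasoning: bool) -> str:
        key = self._key(query)
        if key in self.cache:                      # cheapest path first
            return self.cache[key]
        model = "premium" if needs_reasoning else "standard"
        self.spend += MODEL_COST[model]
        result = f"[{model}] answer to: {query}"   # placeholder for a real model call
        self.cache[key] = result
        return result

agent = CostAwareAgent()
agent.answer("What is the return window?", needs_reasoning=False)
agent.answer("what is the  return window?", needs_reasoning=False)  # cache hit
agent.answer("Should we refund a late, partially shipped order?", needs_reasoning=True)
print(agent.spend)
```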
Some companies are going even further and training specialized routing models whose job is to analyze incoming requests and predict how complex they’ll be to handle, then assign them to appropriately capable and priced agent configurations. Simple questions go to lightweight agents running on cheap models with minimal tool access. Complex queries get routed to more sophisticated agents with access to more expensive resources. This routing layer acts as a cost optimization mechanism that happens transparently to the user. The challenge is building the training data and evaluation frameworks to teach the router to make good decisions. If the router consistently underestimates query complexity and assigns tasks to underpowered agents that fail, you’ve just created a poor user experience to save costs, which is likely to be a Pyrrhic victory. The router needs to be tuned to find the right balance between cost efficiency and reliability.
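The routing layer's interface can be sketched even without a trained model. Here a keyword heuristic stands in for the learned complexity predictor. The signals, threshold, model names, and the two `AgentConfig` tiers are all hypothetical. The `threshold` parameter is where the cost-versus-reliability tradeoff gets tuned.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    name: str
    model: str
    tools: tuple[str, ...]

LIGHT = AgentConfig("light", "small-model", ("knowledge_base",))
FULL = AgentConfig("full", "reasoning-model",
                   ("knowledge_base", "orders", "inventory", "crm"))

# Hypothetical phrases suggesting the request needs account-specific lookups.
COMPLEX_SIGNALS = ("my order", "refund", "shipped", "account", "charge")

def predict_complexity(request: str) -> float:
    """Heuristic stand-in for a trained routing model; returns a score in [0, 1]."""
    text = request.lower()
    hits = sum(signal in text for signal in COMPLEX_SIGNALS)
    return min(1.0, hits / 2)

def route(request: str, threshold: float = 0.5) -> AgentConfig:
    """Lower thresholds send more traffic to FULL (more reliable, costlier);
    higher thresholds save money but risk underpowered agents failing."""
    return FULL if predict_complexity(request) >= threshold else LIGHT

print(route("What's your return policy?").name)
print(route("Why hasn't my order shipped yet?").name)
```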
The third category of controls, which is less technical but equally important, involves contractual protections and fair use policies that set clear expectations with customers about what constitutes reasonable usage. Your terms of service might specify that agentic features are intended for legitimate business use and that excessive usage, abuse, or attempts to extract value in ways that weren’t intended can result in throttling or even termination. For example, you might prohibit using your customer service agent infrastructure to run large-scale market research projects or to stress-test your systems. These policies give you recourse when customers behave in ways that threaten the economics of your service, but they need to be enforced thoughtfully to avoid alienating good-faith customers who happen to have legitimate high-usage scenarios.
Salesforce has implemented an interesting approach to this through what they call trust boundaries in their Agentforce product. Agents operate within defined boundaries regarding what actions they can take, what systems they can access, and what costs they’re allowed to incur. These boundaries can be configured per customer, per use case, or even per individual agent. A customer service agent might have permission to look up order details and issue refunds up to a certain amount, but it can’t access employee data or make changes to product pricing. These trust boundaries serve dual purposes. They prevent security and compliance issues by ensuring agents don’t exceed their authorized scope. But they also serve as cost controls by limiting which expensive operations agents can perform. If an agent can’t access certain premium services or can’t invoke certain costly tools, it literally cannot run up bills in those areas.
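To show how one boundary can serve both purposes, this sketch combines an action allowlist, a business-rule ceiling, and a spend cap in a single per-agent policy object. This is a generic illustration with hypothetical names and limits. It is not Salesforce's Agentforce configuration schema.

```python
from dataclasses import dataclass

@dataclass
class TrustBoundary:
    """Hypothetical per-agent policy: allowed actions, refund ceiling, spend cap."""
    allowed_actions: frozenset
    max_refund_usd: float
    max_spend_usd: float
    spent_usd: float = 0.0

    def check(self, action: str, cost_usd: float, refund_usd: float = 0.0) -> bool:
        if action not in self.allowed_actions:
            return False                      # outside authorized scope
        if refund_usd > self.max_refund_usd:
            return False                      # business-rule limit
        if self.spent_usd + cost_usd > self.max_spend_usd:
            return False                      # cost limit
        self.spent_usd += cost_usd
        return True

support_agent = TrustBoundary(
    allowed_actions=frozenset({"lookup_order", "issue_refund"}),
    max_refund_usd=50.0,
    max_spend_usd=5.0,
)

print(support_agent.check("lookup_order", cost_usd=0.01))
print(support_agent.check("issue_refund", cost_usd=0.02, refund_usd=30.0))
print(support_agent.check("issue_refund", cost_usd=0.02, refund_usd=200.0))
print(support_agent.check("read_employee_records", cost_usd=0.01))
```

Any action outside the allowlist is rejected before it can incur cost. This is the sense in which the agent "literally cannot run up bills" in those areas.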
The control problem ultimately comes down to managing the tension between giving agents enough autonomy to be useful and maintaining enough oversight to prevent chaos. Companies that get this balance right tend to follow a pattern: they start with fairly restrictive controls, gradually relax them as they gain confidence in agent behavior and a better understanding of usage patterns, and keep iterating based on production experience. The companies that struggle are often those that either lock things down so tightly that agents can't do anything useful or grant too much freedom and are then surprised by the consequences. There's no substitute for running real agents in production with real users and real constraints to learn where the boundaries should be.
The Pricing Model Zoo: What Companies Are Actually Doing
Given all these challenges around unpredictability, cost structure, measurement, and control, what pricing models are actually emerging in practice? As of early 2026, we can identify several distinct approaches that different companies are taking, each with its own logic and tradeoffs. Understanding the landscape helps clarify what options are available and what factors should drive your choice.
The agent-as-employee model treats each AI agent as a distinct digital worker with its own role and capacity. You might deploy a customer support agent, a sales research agent, and a data analysis agent, each priced separately at a fixed subscription rate. This mirrors how you'd budget for human employees in those roles. For instance, Nullify charges eight hundred dollars per agent per year for its agents that fix security vulnerabilities. You're essentially renting a specialized AI worker for a defined function. The appeal of this model is its simplicity and predictability for both sides. Customers know exactly what they'll pay because it's a fixed subscription. Vendors can forecast revenue reliably because the agent count is relatively stable. The model works well when agents have clearly defined, bounded responsibilities that map to recognizable job functions. The weakness is that it doesn't reflect usage at all. An agent that processes a thousand tasks per month costs the same as one that processes ten tasks per month. This creates misalignment in both directions: customers can feel they're overpaying for underutilized agents, and vendors can end up underwater on heavily used ones.
The per-action or per-request model charges customers for each discrete action the agent performs. Microsoft's Copilot products are sold under various pricing structures. Security Copilot, for example, is billed in Security Compute Units, a unit of provisioned capacity that automated security workloads consume. Devin, an AI software engineering assistant, charges per Agent Compute Unit for development work performed. This model provides better alignment between costs and usage because customers pay for what the agent actually does. When usage is light, bills are low. When usage increases, bills scale accordingly, but so does the value delivered. The challenges with per-action pricing emerge when you try to define what constitutes an action, because agents take multi-step approaches to accomplish goals. If an agent makes ten API calls and three LLM invocations to answer a single user query, is that one action or thirteen? Most companies using this model abstract it one level up, charging per meaningful outcome achieved rather than per individual operation. That brings us back to the classification problem of determining what level of outcome was achieved.
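The ambiguity is easy to see when the same execution trace is counted three ways. This minimal sketch uses a hypothetical trace format and hypothetical granularity labels. It illustrates the definitional problem only and is not how Microsoft or Devin meters usage.

```python
# Hypothetical trace of one user query: (query_id, operation) pairs.
trace = [("q1", "api_call")] * 10 + [("q1", "llm_call")] * 3

def billable_units(events, granularity, resolved_queries=frozenset()):
    """Count billable units for the same trace under different definitions of an 'action'."""
    queries = {query_id for query_id, _ in events}
    if granularity == "operation":
        return len(events)                      # every API and LLM call bills
    if granularity == "query":
        return len(queries)                     # one unit per user query
    if granularity == "outcome":
        return len(queries & resolved_queries)  # only resolved queries bill
    raise ValueError(f"unknown granularity: {granularity}")

print(billable_units(trace, "operation"))  # 13
print(billable_units(trace, "query"))      # 1
```

The same work yields 13 units, 1 unit, or 0 units depending on whether the query was resolved. That spread is why the definition has to be settled before a per-action rate means anything.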
The per-conversation or per-resolution model, which we've discussed already in the context of Intercom Fin and Zendesk, ties pricing to completed interactions. This maps well to customer service and support use cases where there's a clear beginning and end to each engagement. The customer raises an issue, the agent resolves or escalates it, and that completed interaction is the billable event. Zendesk charges roughly five to fifty cents per conversation depending on complexity. Intercom charges ninety-nine cents per successfully resolved conversation, the key word being "successfully." If the agent can't resolve the issue and needs to escalate to a human, there's no charge. This outcome-based variation of the per-conversation model aligns strongly with customer value and removes the risk of customers paying for failed attempts. The limitation is that this approach only works well for conversational use cases. Agents that perform background tasks, run periodic analyses, or execute complex multi-day workflows don't fit neatly into a conversation framework.
The hybrid model, which might be the most common in practice, combines a base subscription with usage-based components to hedge risk on both sides. Customers pay a monthly platform fee that provides access to the agent infrastructure and includes a certain amount of usage. Perhaps your base plan includes five thousand agent actions, assists, or conversations per month. Beyond that threshold, you pay incremental usage fees at established rates. This gives customers budget predictability up to their typical usage levels while ensuring vendors don't lose money on unexpectedly heavy usage. Relevance AI uses a model like this: customers select a subscription tier that includes base features, a certain number of seats, and a pool of included credits, then pay for any additional credits consumed beyond the included amount. ServiceNow's approach, which bundles assists into tiered plans, is another variation on the hybrid model.
Hybrid models can be tuned along a spectrum from mostly subscription to mostly usage-based depending on your strategic goals. If you’re focused on revenue predictability and want to minimize churn from bill shock, you weight toward higher base fees and lower usage fees. If you’re focused on growth and want to lower barriers to entry, you minimize base fees and put more weight on usage charges. The key insight is that you don’t have to pick one extreme or the other. Most companies are finding that some combination feels right because it acknowledges both the need for predictability and the reality that usage varies significantly.
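The base-plus-overage arithmetic, and the effect of weighting a plan toward either end of the spectrum, fits in a few lines. This is a minimal sketch. Both plans and all their numbers are hypothetical and chosen only to show the tradeoff.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HybridPlan:
    base_fee: float      # monthly platform fee in dollars
    included_units: int  # actions, assists, or conversations covered by the base fee
    overage_rate: float  # dollars per unit beyond the included pool

    def monthly_bill(self, units_used: int) -> float:
        overage_units = max(0, units_used - self.included_units)
        return round(self.base_fee + overage_units * self.overage_rate, 2)

# Hypothetical plans at the two ends of the spectrum.
predictability_weighted = HybridPlan(base_fee=800.0, included_units=8_000, overage_rate=0.05)
growth_weighted = HybridPlan(base_fee=100.0, included_units=1_000, overage_rate=0.10)

for units in (3_000, 12_000):
    print(units, predictability_weighted.monthly_bill(units), growth_weighted.monthly_bill(units))
```

At 3,000 units the growth-weighted plan is cheaper for the customer ($300 versus $800). At 12,000 units it becomes the more expensive one ($1,200 versus $1,000). The predictability-weighted plan holds the bill flat until usage is well above typical levels.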
The outcome-based model takes the logical final step of tying payment directly to business results achieved rather than to technical actions performed. Vantage, which helps companies optimize cloud costs, charges five percent of actual savings delivered. You only pay when Vantage saves you money, and the amount you pay scales with the value received. This creates close alignment, but it requires very clear definitions of what constitutes an outcome and how to measure it. For an agent that autonomously books qualified sales meetings, you might charge per meeting booked. For an agent that processes invoices, you might charge per invoice successfully processed. The measurement and attribution challenges are significant because you need systems to track outcomes reliably and you need agreement with customers on what counts. But when you can make it work, outcome-based pricing tends to be the most defensible from a value perspective because customers are paying for exactly what they want.
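A savings-share fee is simple arithmetic once both parties agree on the baseline. The five percent figure comes from the Vantage example above. The function and parameter names are hypothetical, and this is not Vantage's billing logic.

```python
def savings_share_fee(baseline_cost: float, actual_cost: float, share: float = 0.05) -> float:
    """Outcome-based fee: a share of verified savings, and nothing when there are none.

    baseline_cost is what the customer would have spent without the agent; agreeing
    on that number is the hard measurement-and-attribution problem.
    """
    savings = max(0.0, baseline_cost - actual_cost)
    return round(savings * share, 2)
```

A $100,000 baseline reduced to $80,000 yields a $1,000 fee. A month where costs rise instead yields no fee at all. The code is trivial, which is the point: nearly all of the difficulty lives in establishing `baseline_cost`.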
Looking across this landscape, a clear pattern emerges. Companies are gravitating toward models that provide a baseline of predictability through either subscriptions or included usage pools, while retaining some element of variable pricing that scales with actual value delivered or costs incurred. Pure subscription models that ignore usage entirely are rare except in cases where the vendor is very confident in their cost structure and wants to prioritize growth over margins. Pure usage-based models are also less common than you might expect because both vendors and customers have learned from early experiences with bill shock and revenue volatility that some buffering is valuable. The market is converging on hybrid and credit-based approaches that thread the needle between these extremes.
The Credit System Solution: Why Prepaid Pools Are Winning
Among all the pricing approaches we’ve examined, one pattern is becoming increasingly dominant in agentic AI: prepaid credit systems. And it’s worth understanding why this model is resonating with companies across different use cases and markets, because the reasons reveal important insights about how to design billing infrastructure for unpredictable systems.
The core insight is that credits provide a translation layer that decouples customer-facing pricing from backend costs in a way that gives both parties what they need. From the customer's perspective, they purchase a pool of credits at a fixed price. Say ten thousand credits cost five hundred dollars. They now have a budget that's completely predictable. They can forecast their monthly expense accurately because they know exactly how many credits they're buying. From the vendor's perspective, credits provide flexibility to adjust the credit-to-cost exchange rate on the backend as costs evolve, without renegotiating every customer contract or changing list prices. When model prices drop or when you optimize your agent routing to use cheaper models, you can quietly increase how much value each credit buys, effectively passing some savings to customers while maintaining your nominal pricing. When new capabilities get added that consume more resources, you can adjust the credit cost for those specific actions without touching the overall pricing structure.
Credits also solve the unit of work definition problem elegantly. Instead of arguing about whether a particular agent action is simple or complex and therefore should cost ten cents or thirty cents, you define it in credit terms. A simple lookup might cost one credit. A moderate complexity workflow might cost three credits. A complex multi-step automation might cost ten credits. These credit costs can be calibrated based on actual resource consumption without requiring customers to understand or care about the underlying implementation. The customer just sees that their request consumed a certain number of credits, which maps to a straightforward calculation of monetary cost based on how much they paid per credit in their plan.
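The translation layer can be made concrete by holding the customer's price per credit fixed while the vendor's backend cost changes. In this minimal sketch, the schedule mirrors the one-, three-, and ten-credit examples above. The price of $500 for 10,000 credits follows the earlier illustration. The backend costs are hypothetical.

```python
PRICE_PER_CREDIT = 500 / 10_000  # $0.05: what the customer paid, fixed by contract

# Hypothetical customer-facing schedule, matching the examples above.
credit_schedule = {"simple_lookup": 1, "moderate_workflow": 3, "complex_automation": 10}

def revenue(action: str) -> float:
    """Dollars the customer effectively pays for one action."""
    return credit_schedule[action] * PRICE_PER_CREDIT

def gross_margin(action: str, backend_cost: float) -> float:
    """Share of that revenue left after the vendor's backend cost for the action."""
    return (revenue(action) - backend_cost) / revenue(action)

# A complex automation that costs $0.30 to run earns a 40% margin at 10 credits.
print(round(gross_margin("complex_automation", 0.30), 2))

# Model prices fall and the backend cost drops to $0.15. Cutting the schedule
# to 8 credits passes part of the saving to customers (each credit now buys
# more work) while the margin still improves, and the list price never moves.
credit_schedule["complex_automation"] = 8
print(round(gross_margin("complex_automation", 0.15), 3))
```

Only the schedule changed in that repricing. `PRICE_PER_CREDIT`, the number in the customer's contract, stayed exactly where it was.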
The prepaid nature of credit systems provides built-in budget controls that prevent runaway costs automatically. When customers exhaust their credit pool, agent operations either stop or they trigger a conscious decision about whether to purchase additional credits. There’s no scenario where someone gets a surprise bill at the end of the month that’s ten times what they expected because their agents went on a spending spree. This protection is valuable for both sides. Customers get peace of mind. Vendors avoid the awkward conversations that happen when customers receive bills they weren’t prepared for and dispute the charges or churn in frustration.
Credit systems also create better alignment around optimization. When customers have a fixed pool of credits and they’re trying to accomplish as much as possible within that budget, they’re naturally incentivized to use the agents efficiently. They’ll likely start thinking about which tasks really need autonomous agents versus which could be handled by simpler automation. They might work with you to configure agents to be more efficient in their tool usage. This collaborative dynamic around resource optimization tends to be healthier than the adversarial dynamic that can develop when customers feel like vendors are trying to maximize usage to increase bills. Credits make it feel more like both parties are on the same team trying to get maximum value from a shared resource pool.
From an implementation perspective, credit systems require some specific capabilities in your billing infrastructure that are worth calling out. You need real-time credit balance tracking that’s accessible to both your agent orchestration systems and to customer-facing dashboards. When an agent is about to take an action, the system needs to quickly verify that the customer has sufficient credits and then immediately debit the appropriate amount. This needs to happen synchronously with the action to prevent scenarios where customers overdraw their balance before the system catches up. You need detailed usage breakdowns that show customers exactly how credits were consumed. A line item that just says “used five thousand credits this month” isn’t actionable. Customers need to see that they used two thousand credits on customer service workflows, fifteen hundred credits on data analysis tasks, and so on, ideally with the ability to drill down further into specific workflows or time periods.
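The key property here is that the balance check and the debit happen as one atomic step, with usage recorded by category at the same moment. This is a minimal in-memory sketch. A production system would rely on a transactional database, and the lock only stands in for that guarantee within a single process. The class, method, and category names are hypothetical.

```python
import threading
from collections import defaultdict

class InsufficientCredits(Exception):
    """Raised when an action would overdraw the customer's credit balance."""

class CreditLedger:
    """In-memory sketch of an atomic check-and-debit with a per-category breakdown."""

    def __init__(self, balance: int):
        self._balance = balance
        self._lock = threading.Lock()
        self._usage = defaultdict(int)  # category -> credits consumed

    def debit(self, category: str, credits: int) -> int:
        """Verify and debit in one critical section, so concurrent agents cannot overdraw."""
        with self._lock:
            if credits > self._balance:
                raise InsufficientCredits(f"need {credits}, have {self._balance}")
            self._balance -= credits
            self._usage[category] += credits
            return self._balance

    def breakdown(self) -> dict:
        """Credits consumed per category, for customer-facing usage reports."""
        with self._lock:
            return dict(self._usage)

    @property
    def balance(self) -> int:
        with self._lock:
            return self._balance

ledger = CreditLedger(balance=5_000)
ledger.debit("customer_service", 2_000)
ledger.debit("data_analysis", 1_500)
```

When the pool runs out, `debit` raises instead of letting the action proceed. That is the point at which the agent stops or the customer decides to buy more credits.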
You need support for multiple credit pools or credit types if you’re offering different capabilities with different economics. Perhaps you sell standard credits that work for most actions and premium credits that unlock access to advanced reasoning models or specialized tools. Your billing system needs to track these separately and enforce rules about which credits can be used for which actions. You need flexible credit expiration and rollover policies. Do unused credits expire at the end of each billing period, or do they roll over to the next month? Can customers bank credits for multiple months, or is there a cap? These policy choices affect both revenue recognition and customer satisfaction, so your system needs to support whichever approach you choose.
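Both policy questions, which pool can pay for which action and what happens to unused credits at period end, reduce to small, explicit rules. This sketch uses hypothetical pool names, a hypothetical action-to-pool mapping, and a hypothetical rollover cap. The point is to show where such rules would live, not what they should be.

```python
from dataclasses import dataclass

# Hypothetical mapping of which credit pools may pay for which actions.
POOLS_FOR_ACTION = {
    "standard_action": {"standard", "premium"},
    "advanced_reasoning": {"premium"},
}

def can_pay(action: str, pool: str) -> bool:
    """Enforce that, for example, standard credits cannot unlock advanced reasoning models."""
    return pool in POOLS_FOR_ACTION.get(action, set())

@dataclass(frozen=True)
class RolloverPolicy:
    rollover: bool     # False means unused credits expire at period end
    rollover_cap: int  # maximum unused credits carried into the next period

def next_period_balance(unused: int, new_grant: int, policy: RolloverPolicy) -> int:
    """Opening balance for the next period after expiration or capped rollover."""
    carried = min(unused, policy.rollover_cap) if policy.rollover else 0
    return carried + new_grant
```

With a 2,000-credit cap, a customer who leaves 3,000 credits unused and receives a fresh 10,000 opens the next month with 12,000. Under an expire-at-period-end policy, the same customer opens with exactly 10,000.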
You need the ability to grant credits as part of trials, promotions, or customer success initiatives. If you want to give a new customer ten thousand free credits to try your platform, the billing system should make that straightforward. And finally, you need clear credit-to-dollar reporting for your own finance team. While customers think in credits, your accounting and forecasting still happens in dollars, so you need the infrastructure to convert between these units reliably and to understand the dollar value of outstanding credit liabilities on your balance sheet.
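Converting credits back into dollars is easier if each purchase or grant is tracked as its own lot with the price actually paid. In this simplified sketch, promotional grants carry a price of zero, so they add nothing to the outstanding dollar liability. The lot structure and the free-credits-first consumption order are hypothetical choices. Real revenue recognition (for example, treatment of expired credits) involves more than this.

```python
from dataclasses import dataclass

@dataclass
class CreditLot:
    """One purchase or grant of credits, tracked separately so its dollar value is known."""
    credits_remaining: int
    price_per_credit: float  # 0.0 for trial or promotional grants

def outstanding_liability(lots: list) -> float:
    """Dollar value of unused paid credits (a simplified deferred-revenue view)."""
    return round(sum(lot.credits_remaining * lot.price_per_credit for lot in lots), 2)

def consume(lots: list, credits: int) -> None:
    """Spend free grants first, then the cheapest paid lots (one possible ordering policy)."""
    if credits > sum(lot.credits_remaining for lot in lots):
        raise ValueError("insufficient credits")
    for lot in sorted(lots, key=lambda lot: lot.price_per_credit):
        take = min(credits, lot.credits_remaining)
        lot.credits_remaining -= take
        credits -= take

# A new customer gets 10,000 free trial credits, then buys 20,000 at $0.05.
lots = [CreditLot(10_000, 0.0), CreditLot(20_000, 0.05)]
```

Here the outstanding liability starts at $1,000 and depends only on the paid lot. After 12,000 credits are consumed, the free lot is exhausted and the liability falls to $900.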
Several companies have built their entire monetization strategy around credits and are providing useful examples of how to implement these systems well. ServiceNow's assists function as credits that customers purchase in bulk. Salesforce has introduced Flex Credits as a currency for its Agentforce platform, where different autonomous actions consume different amounts of credits based on their computational intensity. OpenAI itself uses a credit system where customers prepay for API usage and their account gets debited for each API call based on token consumption. The commonality across all these implementations is that they've invested in making the credit economics transparent while hiding the underlying technical complexity. Customers don't need to understand that a complex agentic workflow might involve vector database searches, multiple LLM calls, tool invocations, and orchestration overhead. They just need to understand that it costs fifty credits, and they can decide whether that's worthwhile.
Infrastructure Requirements: Building Billing for the Unpredictable
Let’s get concrete about what billing infrastructure actually needs to exist to support agentic AI pricing effectively. This goes beyond just having a usage metering system. The unique characteristics of agents require some specific capabilities that traditional billing platforms weren’t designed to provide.
The foundation is event streaming and real-time aggregation at massive scale. When an agent performs a workflow, it generates a stream of events representing each action taken. It queries a database, emits an event. It calls an LLM, emits an event. It searches a vector store, emits an event. For a single user interaction that the agent resolves through a multi-step workflow, you might generate twenty or thirty distinct billable events. Multiply that across thousands of concurrent agent sessions, and you’re looking at potentially millions of events per hour that need to be captured, attributed to the right customer, categorized appropriately, and aggregated for billing. Your metering infrastructure needs to handle this volume without falling behind, losing events, or double-counting. This typically requires distributed streaming platforms like Apache Kafka or cloud-native equivalents, paired with stream processing frameworks that can maintain running aggregates in memory and periodically persist them to durable storage.
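Streaming pipelines commonly deliver events at least once, so a retried producer or a restarted consumer can surface the same event twice. The aggregation step therefore has to be idempotent. This in-memory sketch stands in for the keyed state a stream processor would maintain. It is not Kafka client code, and the event fields are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageEvent:
    event_id: str     # unique per event; retries reuse the same ID
    customer_id: str
    event_type: str   # e.g. "db_query", "llm_call", "vector_search"
    quantity: int = 1

class RunningAggregator:
    """Keeps running totals per (customer, event type) and ignores duplicate deliveries."""

    def __init__(self):
        # A real processor would bound this set with a time window instead of growing it forever.
        self._seen = set()
        self._totals = defaultdict(int)

    def process(self, event: UsageEvent) -> bool:
        if event.event_id in self._seen:
            return False  # duplicate delivery: do not double-count
        self._seen.add(event.event_id)
        self._totals[(event.customer_id, event.event_type)] += event.quantity
        return True

    def flush(self) -> dict:
        """Return accumulated totals for durable storage and start a fresh interval."""
        snapshot = dict(self._totals)
        self._totals.clear()
        return snapshot
```

Calling `flush()` on a schedule matches the "maintain running aggregates in memory and periodically persist them" pattern described above. The deduplication by `event_id` addresses the double-counting risk.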
The second requirement is multi-level attribution and cost allocation. When an agent workflow involves calling multiple tools, using multiple models, and performing multiple operations, each with potentially different costs, your billing system needs to be able to attribute every cost component correctly. This isn’t just about knowing which customer triggered the workflow. You also need to know which product feature or which agent configuration was being used, because different features might have different pricing. You need to tag events with enough metadata to support detailed usage analysis later. Which workflows are most expensive? Which agents are most efficient? Which customers are using which capabilities? All of this requires careful event schema design where every billable event carries sufficient context to enable retrospective analysis and aggregation along multiple dimensions.
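If every event carries its full context, then answering "which workflows are most expensive?" or "which customers use which capabilities?" becomes a group-by over whichever fields you choose. The schema below is a hypothetical minimum for illustration, not a recommended standard.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class BillableEvent:
    customer_id: str
    feature: str       # product feature that triggered the work
    agent_config: str  # which agent configuration ran
    workflow: str      # logical workflow name
    resource: str      # "llm_call", "tool_call", ...
    cost_usd: float

def cost_by(events, *dimensions):
    """Aggregate cost along any combination of the events' metadata fields."""
    totals = defaultdict(float)
    for event in events:
        key = tuple(getattr(event, d) for d in dimensions)
        totals[key] += event.cost_usd
    return dict(totals)

events = [
    BillableEvent("acme", "support", "fast", "refund", "llm_call", 0.02),
    BillableEvent("acme", "support", "fast", "refund", "tool_call", 0.01),
    BillableEvent("globex", "analytics", "deep", "report", "llm_call", 0.30),
]
```

`cost_by(events, "customer_id")` gives per-customer totals. `cost_by(events, "agent_config", "resource")` shows which configurations spend their budget on which resources. Neither query requires a schema change, because the context was captured when each event was emitted.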
The third requirement is dynamic pricing rules that can be updated frequently without code deployments. When you’re using credits as an abstraction layer, the exchange rate between credits and underlying costs needs to be adjustable as your own costs change or as you add new capabilities. Your billing system should fetch pricing rules from a configuration service for each rating operation. When an agent completes an action, the rating engine queries the current pricing rules to determine how many credits it costs, rather than having those costs hard-coded in application logic. This means you can update your pricing by changing configuration, and those changes take effect immediately for new usage without requiring application updates. You also need version control for pricing rules so you can understand what pricing was in effect at any point in the past, which is critical for auditing and for handling disputes where customers question why they were charged a particular amount.
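Both properties, rules read at rating time and a queryable history, follow from storing each rule as a dated version rather than overwriting it. This sketch uses an in-memory store as a stand-in for the configuration service. The class, its methods, and the example dates are hypothetical.

```python
import bisect
from datetime import datetime, timezone

class PricingRuleStore:
    """Append-only history of credit prices per action, looked up by effective time."""

    def __init__(self):
        self._history = {}  # action -> sorted list of (effective_from, credits)

    def publish(self, action: str, credits: int, effective_from: datetime) -> None:
        """Add a new version; earlier versions stay available for audits and disputes."""
        bisect.insort(self._history.setdefault(action, []), (effective_from, credits))

    def credits_at(self, action: str, when: datetime) -> int:
        """Return the credit price of an action as it stood at a given moment."""
        versions = self._history.get(action, [])
        times = [effective_from for effective_from, _ in versions]
        idx = bisect.bisect_right(times, when) - 1
        if idx < 0:
            raise LookupError(f"no rule for {action!r} in effect at {when.isoformat()}")
        return versions[idx][1]

store = PricingRuleStore()
store.publish("complex_automation", 10, datetime(2026, 1, 1, tzinfo=timezone.utc))
store.publish("complex_automation", 8, datetime(2026, 3, 1, tzinfo=timezone.utc))
```

Rating new usage calls `credits_at` with the current time and picks up the March price immediately. A dispute about a February charge calls it with the February timestamp and gets the rule that was actually in effect.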
The fourth requirement, which might be the most challenging to implement, is cross-system workflow tracking. Agentic workload billing isn’t just about metering individual events. You often need to group related events into logical workflows or sessions for pricing purposes. If you’re charging per completed workflow rather than per individual action, your billing system needs to understand which actions belong to the same workflow instance. This requires coordination across your agent orchestration platform, your event streaming infrastructure, and your billing database. The typical implementation involves generating a unique workflow identifier when an agent session begins and ensuring that every event emitted during that session carries the workflow ID. Then your billing logic can group events by workflow ID and apply pricing rules at the workflow level. But handling edge cases is tricky. What happens if a workflow spans multiple billing periods? What if a workflow gets interrupted and never completes? Your infrastructure needs defined handling for these scenarios.
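The two edge cases named above, workflows that span billing periods and workflows that never complete, come down to explicit settlement rules at period close. This sketch encodes one hypothetical set of rules. A workflow bills in the period where it completes. An incomplete workflow idle past a timeout is marked abandoned and not billed. Anything else carries over. The event shape and the 24-hour timeout are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def settle_workflows(events, period_end: datetime, abandon_after: timedelta):
    """Group events by workflow ID and decide how each workflow is treated at period close.

    Assumes events cover only workflows not already settled in an earlier period.
    """
    grouped = defaultdict(list)
    for event in events:
        grouped[event["workflow_id"]].append(event)

    billed, pending, abandoned = [], [], []
    for workflow_id, wf_events in grouped.items():
        completed = any(e["type"] == "completed" and e["ts"] <= period_end for e in wf_events)
        idle_for = period_end - max(e["ts"] for e in wf_events)
        if completed:
            billed.append(workflow_id)       # bills now, even if it started last period
        elif idle_for > abandon_after:
            abandoned.append(workflow_id)    # interrupted and never finished: not billed
        else:
            pending.append(workflow_id)      # still running: re-evaluated next period
    return billed, pending, abandoned

events = [
    {"workflow_id": "wf-1", "type": "started", "ts": datetime(2025, 12, 31, 23, 50)},
    {"workflow_id": "wf-1", "type": "completed", "ts": datetime(2026, 1, 1, 0, 10)},
    {"workflow_id": "wf-2", "type": "started", "ts": datetime(2026, 1, 31, 22, 0)},
    {"workflow_id": "wf-3", "type": "started", "ts": datetime(2026, 1, 20, 9, 0)},
]
billed, pending, abandoned = settle_workflows(events, datetime(2026, 1, 31, 23, 59), timedelta(hours=24))
```

Closing January bills `wf-1` even though it started in December. It carries `wf-2` forward because it is still running, and it drops `wf-3` as abandoned. A different business might choose to bill partial work instead. Either way, the policy should be written down and applied in one place.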
The fifth requirement is customer-visible usage analytics and budget controls that update in real-time or near real-time. Customers using agentic features need to see their current usage and projected costs throughout the month, not just at invoice time. This requires exposing usage data from your billing system through APIs and dashboards that customers can access. They should be able to see their credit balance, their usage rate, projections of when they’ll exhaust their credits at current consumption levels, and detailed breakdowns of what’s consuming credits. They should also be able to set their own budget alerts and spending caps. If a customer wants to be notified when they hit fifty percent of their monthly credit budget, the infrastructure should support that. And critically, all of this needs to reflect reality within minutes of usage occurring, not days later after batch processing completes. Real-time visibility is what allows customers to make informed decisions about their usage patterns and prevents the bill shock that damages relationships.
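Two of the customer-facing calculations mentioned here, projecting when credits will run out and firing budget alerts exactly once, are short enough to sketch. The projection below is a naive linear extrapolation of average daily burn. The function names and default thresholds are hypothetical.

```python
from datetime import date, timedelta

def project_exhaustion(remaining: int, used_so_far: int, days_elapsed: int, today: date):
    """Naive linear projection of the date credits run out at the average daily burn."""
    if used_so_far <= 0 or days_elapsed <= 0:
        return None  # no consumption yet, so nothing to project
    daily_burn = used_so_far / days_elapsed
    return today + timedelta(days=int(remaining / daily_burn))

def newly_crossed(budget: int, used_before: int, used_after: int, thresholds=(0.5, 0.8, 1.0)):
    """Thresholds crossed by the latest usage update, so each alert fires only once."""
    return [t for t in thresholds if used_before < t * budget <= used_after]
```

A customer who has used 4,000 credits in the first 10 days, with 6,000 remaining, is burning 400 a day and would run out 15 days later. A usage update from 4,900 to 5,100 against a 10,000-credit budget triggers only the 50% alert. The next update does not trigger it again.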
The sixth infrastructure requirement is support for credit normalization across actions of different complexity and cost. Not all agent actions are created equal from a cost perspective, but you don't want to expose that complexity directly to customers. The billing system needs internal logic to normalize different actions to credit values that roughly reflect relative cost while remaining simple for customers to understand. This normalization layer might involve fairly sophisticated logic that considers multiple cost factors. How many LLM tokens were consumed? Which model was used? How many tool calls were made? How much compute time did the action take? Based on these inputs, the system calculates a credit cost designed to maintain healthy margins while feeling fair to customers. This calculation happens transparently on the backend. Customers just see the credit cost, not the formula that produced it.
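One simple form of that formula sums the cost drivers into an internal dollar cost, then divides by a target cost basis per credit. Rounding up guarantees that no action is rated below its cost. Every rate in this sketch is a hypothetical placeholder, not a real model price.

```python
import math

# Hypothetical internal cost inputs; none of these are exposed to customers.
MODEL_COST_PER_1K_TOKENS = {"small": 0.0005, "large": 0.01}
COST_PER_TOOL_CALL = 0.002
COST_PER_COMPUTE_SECOND = 0.0001
BACKEND_DOLLARS_PER_CREDIT = 0.02  # target cost basis per credit, set to protect margin

def normalize_to_credits(tokens: int, model: str, tool_calls: int, compute_seconds: float) -> int:
    """Turn raw cost drivers into a whole number of credits (minimum 1)."""
    cost = (tokens / 1000) * MODEL_COST_PER_1K_TOKENS[model]
    cost += tool_calls * COST_PER_TOOL_CALL
    cost += compute_seconds * COST_PER_COMPUTE_SECOND
    return max(1, math.ceil(cost / BACKEND_DOLLARS_PER_CREDIT))
```

A short lookup on the small model rounds up to the 1-credit minimum. A 20,000-token run on the large model with five tool calls and 30 seconds of compute comes out to 11 credits. The customer sees those two numbers and never sees the rates behind them.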
Companies that have built this infrastructure report that it’s a significant engineering investment, typically measured in quarters rather than weeks. But they also report that getting it right pays off through improved pricing flexibility, better customer experience, and fewer billing disputes. The companies that try to bolt agent billing onto existing subscription billing systems designed for simple per-seat or per-GB pricing usually run into limitations fairly quickly and end up either restricting what their agents can do to fit the billing constraints or embarking on the infrastructure buildout they should have done upfront.
Looking Forward: The Evolution of Agentic Pricing
As we close this examination of agentic AI pricing, let’s look ahead to where this is going. The current state is messy and experimental because the technology is so new and usage patterns haven’t fully stabilized. But we can identify some clear trends that are likely to shape how agentic pricing evolves over the next few years.
The first trend is increasing sophistication in how outcomes are defined and measured. Right now, most outcome-based pricing for agents focuses on relatively simple outcomes like conversations resolved or tasks completed. As measurement infrastructure matures and as agents handle more complex workflows, we’ll see pricing tied to higher-level business outcomes. Instead of charging per customer service interaction, charge based on customer satisfaction scores or first-contact resolution rates. Instead of charging per sales email sent by an agent, charge based on meetings booked or pipeline generated. This requires building more sophisticated attribution models and evaluation frameworks, but the companies that figure it out will be able to charge premium pricing because they’re aligning directly with what customers actually care about.
The second trend is the emergence of agent performance tiers where different levels of capability or reliability come at different price points. Imagine being able to choose whether a customer service agent should use advanced reasoning models that cost more but have higher resolution rates, or faster, cheaper models that handle most queries adequately but escalate edge cases. You might price these differently, charging more for premium agent configurations and less for standard configurations. This gives customers choice in how they balance cost versus performance, similar to how cloud providers offer different instance types. The infrastructure to support this requires more sophisticated agent routing and configuration management, but it creates valuable segmentation opportunities.
The third trend is better integration between agent orchestration platforms and billing systems. Right now, these are often separate systems that communicate through logs or event streams. We’ll see tighter coupling where the orchestration layer has native understanding of billing rules and can make agent execution decisions that factor in costs. An agent might choose a cheaper execution path when it knows it’s close to budget limits but use more expensive tools when high accuracy is critical and budget is available. This cost-aware agent intelligence will become a differentiator as agents get better at optimizing for both quality and efficiency simultaneously.
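A cost-aware decision of this kind could look like the following in its simplest form. The agent sees its remaining budget and whether the task demands accuracy, then picks an execution path. This is a speculative sketch of the trend, not an existing orchestration API. The path names, credit costs, and accuracy estimates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutionPath:
    name: str
    credits: int
    expected_accuracy: float  # hypothetical estimate from offline evaluation

def choose_path(paths, remaining_credits: int, needs_high_accuracy: bool, reserve: int = 0):
    """Pick an execution path given the remaining budget and the task's accuracy needs.

    High-accuracy tasks take the most accurate path that fits the budget (after
    holding back a reserve); other tasks take the cheapest affordable path.
    Returns None when nothing is affordable.
    """
    affordable = [p for p in paths if p.credits <= remaining_credits - reserve]
    if not affordable:
        return None
    if needs_high_accuracy:
        return max(affordable, key=lambda p: p.expected_accuracy)
    return min(affordable, key=lambda p: p.credits)

paths = [
    ExecutionPath("small-model", credits=2, expected_accuracy=0.85),
    ExecutionPath("reasoning-model", credits=15, expected_accuracy=0.95),
]
```

With 100 credits left, a high-stakes task gets the reasoning model. When only 10 credits remain, the same task falls back to the small model rather than failing outright.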
The fourth trend is the development of industry-specific pricing standards and benchmarks. Right now, every company is figuring out agent pricing independently, leading to huge variation. As the market matures, we’ll likely see convergence around standard pricing models for particular use cases. Customer service agents might typically cost between fifty cents and two dollars per resolution depending on complexity. Sales research agents might cost between five dollars and twenty dollars per completed research project. These benchmarks will make it easier for customers to evaluate offerings and easier for vendors to anchor their pricing in market norms. Deviations from the standards will need to be justified through superior capability or efficiency.
The fifth trend is increasing regulatory and accounting clarity around how to handle autonomous agent costs and revenue. Right now, there’s ambiguity in how to account for prepaid credits, how to recognize revenue for outcome-based pricing where outcomes are uncertain at time of payment, and how to disclose agent-related costs in financial statements. As regulators and accounting standards bodies catch up to the technology, we’ll get clearer guidance that reduces the uncertainty companies currently face in structuring these arrangements. This clarity will make it easier to raise capital and to negotiate enterprise contracts because both sides will have better frameworks for evaluating the economics.
The synthesis of all these trends points toward a future where agentic pricing is more standardized, more outcome-focused, more integrated with the technical systems, and more sophisticated in how it segments customers and use cases. But that future is probably still two to three years away. In the meantime, companies building agentic features face the challenge of pricing something that’s fundamentally new without established playbooks or market benchmarks. The companies that are succeeding are those that embrace experimentation, build flexible infrastructure that allows them to iterate on pricing models quickly, invest heavily in measurement and transparency, and maintain close dialogue with customers about how the pricing feels and whether it aligns with value received.
The core lesson from our deep dive into agentic pricing is this: when your software starts making autonomous decisions about how to accomplish goals, your pricing model needs to accommodate that autonomy rather than trying to constrain it. Forcing agents into pricing models designed for predictable, human-controlled software creates friction for everyone. But building pricing models that acknowledge and embrace the unpredictability of agents, while providing appropriate controls and transparency, can create business models that scale effectively and deliver value for both vendors and customers. The billing infrastructure to support this is non-trivial to build, but it’s becoming a strategic capability that separates leaders from followers in the AI-native software era.
About This Series
The Future Ahead is a series exploring where the AI industry is heading and how it will fundamentally transform billing workflows, billing infrastructure, and pricing models.
Read Previous Articles:
- Part 1: The AI Billing Infrastructure Crisis
- Part 2: The Outcome-Based Pricing Revolution
- Part 3: The Token Cost Deflation Paradox
Next in series: Part 5 - Coming soon