Two AI Systems, Same Output, Completely Different Economics

How your AI architecture determines whether costs scale with you or against you


In Brief

  • Each implementation type — LLM-based, ML-based, or composable — produces a different cost curve. Two technically similar systems can generate fundamentally different financial trajectories over three to five years.

  • Frontier LLMs are the right tool for low-frequency, complex tasks. They are the wrong tool for high-frequency operational decisioning. The proof of concept economics look fine. The production economics tell a different story.

  • Every vendor contract is also a decision about future switching cost. Most businesses negotiate without knowing their own walk-away cost. Their vendors do know it.

  • Talent is the investment case. The location decision for a permanent ML team has a larger impact on the investment case than most technology choices.

In Detail

A payments business I know has been running AI-assisted decisioning for eight months. The platform is live, the team is pleased with the outputs, and the board deck references it as evidence of the business's technical maturity. When the CFO was asked what each decision costs — not the annual platform fee, but the unit cost per decision — the team couldn't answer. When asked how that cost changes at three times current volume, no one in the room could answer either.

This is not unusual. It is, in my experience, close to the norm.

AI investment decisions are being made as technology choices when they are, structurally, capital allocation decisions. The implementation architecture you select determines not just what you spend in year one but how your cost base behaves across a three-to-five-year horizon. Two technically similar systems — both described as "AI-powered," both capable of producing the same output — can follow fundamentally different financial trajectories depending on how they are built. The businesses that understand this going in make better decisions and negotiate better contracts. The ones that don't are managing surprises they could have modelled.

What You Approved Is a Cost Curve, Not a Cost Figure

Most AI investment cases are presented as an annual spend: platform licence, implementation, headcount, support. The finance function approves a number. What it actually approves is a cost curve — a mathematical relationship between volume, time, and expenditure that will play out over years regardless of whether it was ever written down.

The shift from fixed-cost legacy infrastructure to consumption-based AI pricing is not inherently good or bad. Legacy infrastructure was expensive to build and cheap to operate at scale. Modern AI services often invert that profile: low entry cost, variable operating cost that scales with usage. For businesses with high transaction volumes or high decision frequencies, the difference between these two structures is not academic. It determines whether the AI investment makes economic sense at scale or only makes sense in a slide deck.
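The difference between the two structures can be made concrete with a toy model. The figures below (a £250,000 flat platform fee, £0.05 per decision) are illustrative assumptions, not quotes from any vendor; the point is the crossover volume at which consumption pricing overtakes a fixed platform.

```python
# Illustrative sketch: flat platform fee vs consumption pricing.
# Both figures are hypothetical assumptions, not vendor quotes.

FIXED_ANNUAL = 250_000   # assumed flat platform cost per year (GBP)
PER_DECISION = 0.05      # assumed consumption price per decision (GBP)

def annual_cost(decisions_per_year: int) -> dict:
    """Annual spend under each pricing structure at a given volume."""
    return {
        "fixed": FIXED_ANNUAL,
        "consumption": decisions_per_year * PER_DECISION,
    }

# Volume at which consumption pricing overtakes the fixed platform
crossover = FIXED_ANNUAL / PER_DECISION   # 5,000,000 decisions per year

for volume in (1_000_000, 5_000_000, 15_000_000):
    costs = annual_cost(volume)
    print(volume, costs["fixed"], costs["consumption"])
```

Below the crossover, consumption pricing is the cheaper structure; above it, every unit of growth widens the gap. That is the behaviour a finance function approves, knowingly or not, when it approves the contract.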

There is a second problem that compounds quietly: vendor dependency. A platform that costs £80,000 to migrate away from in year one — because the integration is shallow, the data schemas are simple, and the team that built it is still employed — can cost £450,000 or more to migrate in year five. Integrations have deepened. Data has accumulated in proprietary formats. The institutional knowledge of why specific architectural decisions were made has drifted or left. The vendor knows this. Their pricing at renewal reflects it. The buyer, in most cases, has not modelled it.

This is an information asymmetry that consistently benefits the vendor. It is also entirely avoidable.

The LLM Trap

Using a frontier large language model for high-frequency operational decisions is an architectural choice whose unit economics most businesses have not modelled before committing to it.

Take a payments business making 5,000 operational decisions per day: fraud scoring, routing optimisation, compliance flagging, counterparty risk assessment. At current frontier LLM pricing — factoring in input tokens, output tokens, and the prompt engineering overhead required to get consistent structured outputs — the annual cost of running those decisions through a general-purpose LLM sits somewhere between £800,000 and £1.4 million, depending on model choice and prompt design. The same volume of decisions handled by a well-trained proprietary ML model, running on standard cloud compute, costs between £40,000 and £90,000 annually once the model is in production.

That is not a marginal difference. It is an order of magnitude.
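The per-decision arithmetic behind that comparison is worth running explicitly. The sketch below uses only the figures already quoted: 5,000 decisions a day and the two annual cost ranges.

```python
# Back-of-envelope unit economics from the figures in the text:
# 5,000 decisions/day; annual cost ranges as quoted above (GBP).

DECISIONS_PER_YEAR = 5_000 * 365   # 1,825,000 decisions per year

llm_annual = (800_000, 1_400_000)  # frontier LLM, GBP per year
ml_annual = (40_000, 90_000)       # proprietary ML model, GBP per year

def per_decision(annual_range: tuple) -> tuple:
    """Implied cost per decision for a (low, high) annual range."""
    lo, hi = annual_range
    return lo / DECISIONS_PER_YEAR, hi / DECISIONS_PER_YEAR

llm_unit = per_decision(llm_annual)  # roughly £0.44 to £0.77 per decision
ml_unit = per_decision(ml_annual)    # roughly £0.02 to £0.05 per decision
```

Per decision, the LLM route lands somewhere between 44p and 77p; the proprietary model between 2p and 5p. At pilot volumes neither number looks alarming, which is exactly why the comparison gets skipped.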

This is not a criticism of LLMs. For low-frequency, high-complexity tasks — synthesising regulatory guidance across multiple jurisdictions, generating structured analysis from unstructured documents, drafting board-level scenario narratives — frontier models are the right tool. The economics work at low volume. The capability per call justifies the cost. But operational decision-making at transaction frequency is not a low-volume, high-complexity problem. It is a high-volume, repeatable problem, and it should be treated as an engineering and economics question, not a technology preference.

The pattern that generates the most expensive surprises is this: a team uses an LLM during a proof of concept because it is fast to integrate and produces impressive demos. The business case gets approved on the back of the proof of concept. The LLM architecture then drifts into production. Nobody models what the unit economics look like at ten times the pilot volume because the decision has already been made. By the time the cost curve becomes visible on a management account, the switching cost of moving to a proprietary model has grown substantially.

The three questions that determine architecture are: what is the decision frequency, what is the acceptable cost per decision, and what is the required latency? Technology preference is not in that list.
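Those three questions can be expressed as a simple screen. The thresholds below are hypothetical placeholders, not recommendations; the structure of the test, not the numbers, is the point.

```python
# A sketch of the three-question architecture screen. All thresholds
# are hypothetical assumptions; the right values depend on the business.

def recommend_architecture(decisions_per_day: int,
                           max_cost_per_decision_gbp: float,
                           max_latency_ms: int) -> str:
    HIGH_FREQUENCY = 1_000   # assumed: above this, unit cost dominates
    LLM_UNIT_COST = 0.50     # assumed frontier-LLM cost per call (GBP)
    LLM_LATENCY_MS = 2_000   # assumed typical frontier-LLM response time

    # If any of the three constraints rules the LLM out, build proprietary.
    if (decisions_per_day >= HIGH_FREQUENCY
            or max_cost_per_decision_gbp < LLM_UNIT_COST
            or max_latency_ms < LLM_LATENCY_MS):
        return "proprietary ML model"
    return "frontier LLM acceptable"

print(recommend_architecture(5_000, 0.05, 200))   # proprietary ML model
print(recommend_architecture(20, 5.00, 30_000))   # frontier LLM acceptable
```

The fraud-scoring example from earlier fails all three constraints at once; a monthly regulatory-synthesis task fails none of them. The screen makes the architecture question answerable before the proof of concept, not after.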

The Composable Path

For most payments and fintech businesses in 2026, the pragmatic answer is not to build from scratch and not to outsource entirely. It is a hybrid: buy the infrastructure layer, build the proprietary decisioning layer on top.

The infrastructure layer covers things like data streaming, event processing, and model serving — the plumbing that gets data to the right place at the right time. This is not where competitive advantage lives. It is expensive to build well, well-solved commercially, and not worth internal engineering resource unless you have a genuinely unusual scale problem. Buy it.

The proprietary decisioning layer is the models and logic that determine what happens when the data arrives. This is where competitive advantage lives. A fraud model trained on your transaction data, your customer behaviour, your counterparty network will outperform a generic model. It is also an asset — it improves over time, it is not transferable to a competitor, and it does not require renegotiating a vendor contract to iterate on. Build this.

The cost structure this produces is predominantly fixed, with a contained and predictable variable component. The infrastructure layer scales well — ten times the transaction volume on the same platform costs marginally more, not ten times more. The proprietary model layer, once in production, costs roughly the same to run regardless of volume, with compute costs scaling modestly. This is a cost structure that rewards growth rather than penalising it.

The composable approach also preserves optionality. When the infrastructure vendor needs to change — because a better alternative emerges, or the current one reprices aggressively at renewal — the proprietary decisioning layer is architecturally separate. The migration cost is bounded. That boundary is worth money, and it should be calculated at the point of the original vendor decision.

Lock-In Is a Financial Position

Switching cost is not a risk to note in the risk section of an investment case and then ignore. It is a financial position that should be quantified upfront and factored into the vendor decision.

The methodology is straightforward. At the point of initial deployment, estimate the cost of migrating away from each vendor component in year one, year three, and year five. Year one: the integration is shallow, the team is intact, the data formats are understood. Year three: integrations have deepened into adjacent systems, the original implementation team has partially turned over, bespoke configurations have accumulated. Year five: the platform is embedded into operational workflows, proprietary data formats and APIs are referenced across multiple systems, migration requires a dedicated programme with its own budget.

This schedule should sit alongside the standard total cost of ownership in the investment case. Not as a risk disclosure — as a financial output that informs the vendor decision. A vendor whose architecture has high migration cost accumulation over time is a more expensive vendor than their annual licence fee suggests. A vendor whose architecture keeps switching costs low is providing implicit financial value beyond their feature set.
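A minimal version of that schedule is a few lines of arithmetic. The sketch reuses the £80,000 year-one and £450,000 year-five migration figures from earlier; the £120,000 licence fee and the year-three estimate are hypothetical assumptions for illustration.

```python
# Switching-cost schedule alongside cumulative licence spend.
# Licence fee and the year-3 figure are hypothetical assumptions;
# year-1 and year-5 migration costs echo the example in the text.

ANNUAL_LICENCE = 120_000   # assumed vendor fee (GBP per year)
migration_cost = {1: 80_000, 3: 220_000, 5: 450_000}   # estimated exit cost

def effective_tco(exit_year: int) -> int:
    """Cumulative licence spend plus the cost of migrating in that year."""
    return ANNUAL_LICENCE * exit_year + migration_cost[exit_year]

for year in (1, 3, 5):
    print(year, effective_tco(year))
# Year 5 (£1.05m) is roughly five times year 1 (£200k). The year-5
# figure is the walk-away number the renewal negotiation anchors on.
```

Produced at the point of the original vendor decision, this table costs an afternoon. Reconstructed at renewal, under time pressure and with the implementation team gone, it costs leverage.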

The negotiation implication is direct. A business that has modelled its own switching cost trajectory knows its walk-away cost at renewal. It knows, in advance, at what point the vendor's renewal pricing makes migration economically rational. That knowledge changes the renewal conversation fundamentally. The business is no longer renegotiating from a position of dependency it hasn't quantified. It is negotiating from a position of a known financial alternative.

Talent Is the Investment Case

The single most important variable in an AI investment case for an operational ML capability is not the technology platform, the cloud provider, or the LLM vendor. It is the permanent team required to build and run it.

A live ML decisioning function requires, at minimum: data engineers to manage the feature pipelines, ML engineers to build and maintain models, and some form of model risk oversight to ensure the outputs are behaving as expected. In a regulated payments or fintech context, that oversight function is not optional — it is the thing that keeps the system's outputs defensible under scrutiny from a regulator or an auditor.

This team, once assembled and operational, typically represents 50 to 55 percent of the total annual operating cost of the capability. The technology costs — cloud compute, platform licences, tooling — are the remaining 45 to 50 percent. If the finance function is reviewing an AI investment case and the talent cost is not the dominant line, the model is wrong.

The location decision for this team is, consequently, the most consequential financial variable in the investment case. A senior ML engineer based in London costs, fully loaded, approximately £140,000 to £180,000 per year. An equivalent capability in a competitive nearshore market — Poland, Portugal, Romania — costs £60,000 to £90,000. Offshore in South Asia, £25,000 to £45,000. Across a team of five to seven people, that differential can represent £300,000 to £600,000 of annual operating cost. That number dwarfs most platform licensing decisions.

This does not mean offshore is always correct. Coordination overhead, quality of output in complex problem domains, regulatory constraints around where data can be processed, and the knowledge continuity risk of high-turnover markets all factor into the decision. The point is that the decision should be made on those terms — explicitly, with numbers — rather than defaulting to London because that is where the headquarters is and where everyone the leadership team knows happens to work.

The payments business from the opening of this piece will, eventually, get to unit cost visibility. When they do, they will find that their architecture has committed them to an economics regime they did not consciously choose. The platform they selected for its feature set and implementation speed will turn out to have a switching cost profile that makes it effectively permanent. The frontier model they are using for a decisioning problem that does not require frontier capability will cost them significantly more at three times volume than their current run rate suggests.

None of this is inevitable. It is the result of treating an AI investment decision as a technology evaluation rather than a capital allocation decision with a cost curve attached. The finance function's role in this is not to slow down AI adoption or to demand that every proof of concept comes with a five-year switching cost schedule. It is to ensure that when an architecture gets approved, the business understands what it is actually approving — and that someone, somewhere, has modelled what the economics look like when it works.

Strategic Finance for Payments and Fintech Leaders

London - Barcelona


© 2026 ScalePoint t/a Scalepoint Partners Ltd.