Your model bill is a black box. Here's how to open it.
AI spend moved from per-seat licenses to consumption in about two years — and consumption broke the way finance sees software cost. This is what a FinOps-for-AI practice actually needs, and a playbook you can start this quarter.
Two years ago, "AI cost" meant a row of $20–60/month seats for a chat assistant. Today it's a consumption line that scales with tokens, and it's the fastest-growing item in many engineering budgets. Managing AI spend went from a niche concern to near-universal: in the 2026 State of FinOps survey, ~98% of organizations now manage AI spend, up from ~31% two years earlier, the fastest cost-discipline adoption the survey has recorded. Meanwhile the majority of enterprises are still increasing their AI budgets, not trimming them.
So the bill is big, growing, and — for most teams — unexplained. The reason is structural.
Why consumption broke visibility
Per-seat software is easy to govern: you can count seats, map them to teams, and forecast next year by headcount. Consumption AI doesn't work that way. Cost scales with how much each workload runs, not how many people are licensed — and a single engineer with an agentic coding tool can out-spend a whole department of seats in a week.
Three things make it opaque:
- Bills aggregate at the account level. Your Anthropic or OpenAI invoice is one number per key or org — no team, ticket, feature, or cost-center attached.
- Spend is spread across providers and tools. Anthropic, OpenAI, Bedrock, Azure OpenAI, Cursor, Copilot — each with its own console and its own units. No native view unifies them.
- Token accounting is subtle. Cached reads bill at a fraction of base input; collapse them into one "input tokens" number, as naive trackers do, and you can overstate a cached workload by 5–10×. If the starting number is wrong, everything downstream is.
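To make the token-accounting point concrete, here is a minimal sketch comparing a naive input cost with a cache-aware one. The base price and the 0.10 cache-read multiplier are illustrative assumptions, not any provider's rate card; substitute the rates from your own invoice.

```python
from dataclasses import dataclass

@dataclass
class Usage:
    uncached_input_tokens: int
    cached_read_tokens: int

# Hypothetical rates, for illustration only.
BASE_INPUT_USD_PER_MTOK = 3.00   # $ per million uncached input tokens (assumed)
CACHE_READ_MULTIPLIER = 0.10     # cached reads billed at 10% of base (assumed)

def naive_input_cost(u: Usage) -> float:
    """Bills every input token at the base rate, as a naive tracker would."""
    total_tokens = u.uncached_input_tokens + u.cached_read_tokens
    return total_tokens / 1_000_000 * BASE_INPUT_USD_PER_MTOK

def cache_aware_input_cost(u: Usage) -> float:
    """Bills cached reads at the discounted rate and the rest at base."""
    uncached = u.uncached_input_tokens / 1_000_000 * BASE_INPUT_USD_PER_MTOK
    cached = (u.cached_read_tokens / 1_000_000
              * BASE_INPUT_USD_PER_MTOK * CACHE_READ_MULTIPLIER)
    return uncached + cached

# A heavily cached workload: 95% of input tokens are cache reads.
usage = Usage(uncached_input_tokens=1_000_000, cached_read_tokens=19_000_000)
naive = naive_input_cost(usage)
actual = cache_aware_input_cost(usage)
overstatement = naive / actual  # how many times too high the naive figure is
```

Under these assumed rates, the naive figure for this workload comes out several times higher than the cache-aware one, which is the kind of gap that makes a tracker fail to reconcile with the invoice.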
The question finance actually asks — "what is AI costing us per team, per feature, per engineer?" — has no answer in a provider console or a generic cost dashboard.
What a FinOps-for-AI practice actually needs
Cloud FinOps matured around a loop: inform → optimize → operate. AI spend needs the same discipline, adapted to consumption. Four capabilities, in order:
1. Attribute — to the work, not just the team
You can't govern what you can't allocate. The bar isn't "spend by team" (useful, but shallow); it's spend mapped to the work item — the ticket, the feature, the engineer — by joining provider usage to your work tracker (Jira, Linear, GitHub). That join is the hard part, and it's what turns "the AI bill went up" into "the checkout refactor cost $X."
Insist on honesty about confidence. A per-ticket number that's secretly a guess will burn your credibility with finance the first time it's wrong. Good attribution carries a fidelity tier on every dollar — "this one's exact, this one's team-level ±15%."
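One way to picture attribution with a fidelity tier is the sketch below. The record fields, tier names, and ticket IDs are all hypothetical; a real join would pull usage from provider exports and tickets from your tracker's API.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class UsageRecord(NamedTuple):
    cost_usd: float
    team: Optional[str]       # hypothetical: derived from API key ownership
    ticket_id: Optional[str]  # hypothetical: parsed from a request tag or branch

def attribute(records, known_tickets):
    """Return {(target, tier): dollars}. The tier says how the dollar was mapped."""
    totals = defaultdict(float)
    for r in records:
        if r.ticket_id in known_tickets:
            key = (r.ticket_id, "exact")        # joined to a real work item
        elif r.team is not None:
            key = (r.team, "team-level")        # fell back to the owning team
        else:
            key = ("unallocated", "none")       # reported, never hidden
        totals[key] += r.cost_usd
    return dict(totals)

records = [
    UsageRecord(12.50, "payments", "PAY-101"),
    UsageRecord(7.25, "payments", None),
    UsageRecord(3.00, "payments", "PAY-999"),   # tag not found in the tracker
    UsageRecord(1.75, None, None),
]
result = attribute(records, known_tickets={"PAY-101", "PAY-102"})
```

The design choice worth copying is that every dollar lands somewhere, including an explicit unallocated bucket, so the tiers always sum back to the invoice.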
2. Forecast — and prove the error on your own data
Allocating the past is exact arithmetic. Forecasting the future is a range — and the only honest way to earn trust in a forecast is to back-test it on your own delivered work (leave-one-out: predict a completed item as if you hadn't seen it, measure the miss). A forecast with a measured error bar beats a confident point estimate every time, and it's what lets finance budget a quarter instead of reacting to an invoice.
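A leave-one-out back-test can be sketched in a few lines. The forecaster here (median dollars per story point) and the sample data are deliberately toy assumptions; the point is the loop that hides each delivered item, predicts it from the rest, and records the miss.

```python
from statistics import median

# Hypothetical delivered items: (story_points, actual AI cost in $).
delivered = [(3, 30.0), (5, 55.0), (2, 18.0), (8, 96.0), (5, 45.0)]

def predict(points, history):
    """Toy forecaster: median $/point across the history, times the item's points."""
    rate = median(cost / pts for pts, cost in history)
    return rate * points

def leave_one_out_errors(items):
    """Absolute percentage error for each item, predicted without seeing it."""
    errors = []
    for i, (points, actual) in enumerate(items):
        history = items[:i] + items[i + 1:]   # everything except item i
        predicted = predict(points, history)
        errors.append(abs(predicted - actual) / actual)
    return errors

errors = leave_one_out_errors(delivered)
mape = sum(errors) / len(errors)  # mean absolute percentage error
```

The `mape` value is the error bar you quote alongside the forecast. Swap in whatever forecaster you actually use; the back-test loop stays the same.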
3. Govern — budgets that flag the breach before month-end
Attribution plus forecast lets you put a budget on a body of work and pace it in real time: project the end-of-period spend from the current run-rate, and surface the breach weeks before the invoice, with a date — not after the fact. That's the difference between a report you read and a system you act on.
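The pacing arithmetic is simple enough to sketch. This version assumes a flat daily run-rate, which is the crudest possible projection; the function name and the sample numbers are hypothetical.

```python
import math
from datetime import date, timedelta

def pace(budget, spend_to_date, period_start, period_end, today):
    """Linear run-rate projection.

    Returns (projected_total, breach_date), where breach_date is None
    if the projection stays within budget.
    """
    days_elapsed = (today - period_start).days + 1
    days_in_period = (period_end - period_start).days + 1
    daily_rate = spend_to_date / days_elapsed
    projected_total = daily_rate * days_in_period
    if daily_rate == 0 or projected_total <= budget:
        return projected_total, None
    # 1-based day of the period on which cumulative spend first reaches the budget.
    breach_day = math.ceil(budget / daily_rate)
    return projected_total, period_start + timedelta(days=breach_day - 1)

projected, breach = pace(
    budget=9_000.0,
    spend_to_date=4_000.0,
    period_start=date(2025, 6, 1),
    period_end=date(2025, 6, 30),
    today=date(2025, 6, 10),
)
```

With these sample inputs the projection exceeds the budget and `breach` is a date inside the period, well before month-end. A production version would likely weight recent days more heavily than a flat average.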
4. Optimize how you pay — not just what you run
Most teams pay for every token at the on-demand rack rate. But AI vendors price like cloud: on-demand, committed-spend discounts, and provisioned throughput (PTUs, reserved capacity). Commit to your steady baseline and you cut the unit price; over-commit and you forfeit the unused part. The right commitment depends on knowing which of your workloads are actually steady, which again comes back to attribution. There are also workload-level levers: prompt caching for repeated context, batch APIs for latency-tolerant jobs, and right-sizing the model to the task.
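For provisioned throughput, the break-even utilization is U* = (provisioned $/hr) ÷ (on-demand $/token × tokens/sec × 3600), where tokens/sec is the throughput the provisioned capacity can sustain. Below it, stay on-demand; above it, reserve. You can sketch your own committed-spend savings with our savings calculator.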
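The U* rule above translates directly into code. The prices and throughput below are made-up inputs, and the single blended on-demand rate is a simplification (real pricing usually differs for input and output tokens).

```python
def break_even_utilization(provisioned_usd_per_hour,
                           on_demand_usd_per_token,
                           capacity_tokens_per_sec):
    """U* = provisioned $/hr divided by the on-demand cost of one hour at full capacity."""
    on_demand_usd_per_hour_at_capacity = (
        on_demand_usd_per_token * capacity_tokens_per_sec * 3600
    )
    return provisioned_usd_per_hour / on_demand_usd_per_hour_at_capacity

def cheaper_option(avg_utilization, u_star):
    """Apply the rule: below U* stay on-demand, above it reserve."""
    return "reserve" if avg_utilization > u_star else "on-demand"

# Hypothetical inputs, for illustration only.
u_star = break_even_utilization(
    provisioned_usd_per_hour=54.0,
    on_demand_usd_per_token=10.0 / 1_000_000,  # $10 per million tokens (assumed)
    capacity_tokens_per_sec=2_500,
)

steady_workload = cheaper_option(avg_utilization=0.80, u_star=u_star)
bursty_workload = cheaper_option(avg_utilization=0.40, u_star=u_star)
```

The utilization you feed in should be the sustained average for the workload over a representative window, which is why this step only makes sense once traffic is attributed.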
A playbook for this quarter
- Get one honest number first. Reconcile your AI spend to the invoice, cache-aware. Don't build on an inflated token count.
- Join spend to work for one team. Pick your heaviest AI consumer and attribute their spend to tickets/features. The first time someone sees "this feature cost $X," the program sells itself.
- Forecast next quarter and back-test it. Quote the measured error, not a vendor benchmark.
- Put a budget on one program and turn on pacing, so you get a projected-breach date.
- Only then, optimize how you pay. Decide on-demand vs. committed at the portfolio level after you have at least 30 days of attributed traffic, not on day one.
The honest part
There's no credible industry benchmark for "how much FinOps-for-AI saves," and you should be suspicious of anyone who quotes one. The number that matters is the one measured on your own traffic — your attribution coverage, your forecast error, your realized commitment savings. Treat any tool that asserts savings it can't measure with the skepticism it deserves.
That's the whole posture behind Outlay: attribute AI spend to the work, forecast it on your own delivered history, govern it with budgets and pacing, and advise on how to pay — all read-only and metadata-only, so prompts and keys never leave your environment. If you want to see it on your own numbers, start a read-only pilot or estimate your savings.
Open the box on your own spend.
A read-only, metadata-only pilot maps your real AI spend to the work, back-tests a forecast, and sizes how to pay — in about two weeks.