Here is a thought experiment.
You are an engineering executive. Your CFO walks into your office and asks: “What did we spend on AI workloads last month, broken down by team?” Your board chair calls and asks: “If our largest customer asked which third-party AI services we are using and where their data goes, what would you tell them?” An auditor asks: “Can you show me the deployment history for your fraud detection model over the last twelve months?”
Three questions. None of them unreasonable. All of them within the scope of normal executive oversight of a platform that runs AI workloads.
Most CTOs in mid-sized and large organisations today cannot answer any of them in under a week. The gap between what is being asked and what is answerable is widening, and the cost of being on the wrong side of that gap is escalating.
This post is about the inventory you should have, and the eight questions you should be able to answer in thirty seconds about your AI platform.
The Eight Questions
These are the questions that are starting to be asked in board meetings, audits, customer procurement reviews, and regulatory inquiries. They are not exhaustive. They are the ones that come up first, and the ones whose answers signal whether the rest of the platform’s AI posture is likely to hold up under scrutiny.
1. How many production AI workloads do you have?
Not “approximately.” Not “I’d have to check.” The number, with the workloads named.
If the answer is fuzzy, the platform does not have an inventory. Which means everything downstream of an inventory - cost control, ownership, governance, audit - is also fuzzy.
2. Who owns each one?
For each production AI workload, there is a named team and a named senior individual who is accountable when it fails. Not “the AI team.” A specific team with a clearly bounded responsibility.
This question separates platforms with mature operating models from platforms with collective ambiguity. Most AI workloads start life in an organisationally uncertain space - the model team built it, the platform team operates it, the application team integrates with it. Without a deliberate ownership decision, no one is accountable, and the first incident is when the gap becomes visible.
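To make questions 1 and 2 concrete, here is a minimal sketch of what an inventory entry could look like. The field names and example workloads are hypothetical, not a prescribed schema; the point is that "how many, and who owns each one" becomes a lookup rather than an investigation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Workload:
    """One production AI workload. All field names are illustrative."""
    name: str
    owning_team: Optional[str]         # a specific team, not "the AI team"
    accountable_person: Optional[str]  # the named senior individual


def ownership_gaps(inventory: list[Workload]) -> list[str]:
    """Return the names of workloads missing a team or an accountable person."""
    return [
        w.name
        for w in inventory
        if not w.owning_team or not w.accountable_person
    ]


# Hypothetical entries, invented for illustration.
inventory = [
    Workload("fraud-detection", "payments-risk", "A. Example"),
    Workload("support-summariser", "customer-support", None),
    Workload("search-reranker", None, None),
]

print(len(inventory), "production AI workloads")
print("ownership gaps:", ownership_gaps(inventory))
```

A list like this does not need tooling behind it on day one. A spreadsheet with the same three columns answers the same two questions.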
3. What does each one cost, and which team’s budget does it sit in?
Per-workload cost attribution, including the GPU and inference compute, the supporting services, and any third-party API charges. Mapped to a team or product line.
If the answer is “it’s all in the central cloud budget,” the organisation does not have AI cost discipline. Which means the cost will keep growing, the conversation with finance will keep getting harder, and the decisions about which AI investments to continue will be made on the basis of vibes rather than data.
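As a rough illustration of per-team attribution, the sketch below rolls up billing line items by a team tag. The line-item shape, the category names, and the figures are all assumptions made for the example; a real billing export will look different. What matters is that untagged spend is surfaced explicitly instead of disappearing into a central budget.

```python
from collections import defaultdict

# Hypothetical billing line items: (workload, team tag or None, category, cost).
line_items = [
    ("fraud-detection", "payments-risk", "gpu_inference", 1200.0),
    ("fraud-detection", "payments-risk", "third_party_api", 300.0),
    ("support-summariser", "customer-support", "third_party_api", 450.0),
    ("search-reranker", None, "gpu_inference", 800.0),  # missing team tag
]


def cost_by_team(items):
    """Sum cost per team; untagged items land in an UNATTRIBUTED bucket."""
    totals = defaultdict(float)
    for _workload, team, _category, cost in items:
        totals[team or "UNATTRIBUTED"] += cost
    return dict(totals)


totals = cost_by_team(line_items)
for team, cost in sorted(totals.items()):
    print(f"{team}: {cost:.2f}")
```

The size of the UNATTRIBUTED bucket is a reasonable first measure of how far the organisation is from being able to answer this question.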
4. Can a new model get to production without platform team involvement?
Yes or no. If yes, what is the path, and how is it governed? If no, what is the bottleneck, and what is the typical lead time?
The answer here predicts whether the AI investment will scale. If every AI workload requires platform team handholding, the team is the bottleneck, and the throughput of AI delivery is capped at the platform team’s capacity to support it.
5. What is your AI gateway, or do you not have one?
An AI gateway is the request routing, logging, throttling, and policy enforcement layer in front of model endpoints. It is to AI services what an API gateway is to traditional services.
If the answer is “we don’t have one,” it means traffic to AI workloads is not being centrally observed, governed, or controlled. Everything that depends on that layer - cost visibility, audit, security, fallback patterns, vendor switching - is weaker than it needs to be.
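For readers who want to see the shape of the idea, here is a deliberately tiny, in-process sketch of the four gateway responsibilities named above: routing, logging, throttling, and policy enforcement. The class, its fields, and the stub backend are hypothetical. A real gateway is a network service and would reset its counters per time window, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Gateway:
    """Illustrative only: every request passes through one choke point."""
    backends: dict[str, Callable[[str], str]]  # routing: model name -> endpoint stub
    allowed: dict[str, set[str]]               # policy: team -> models it may call
    max_requests: int                          # throttling: per-team request cap
    counts: dict[str, int] = field(default_factory=dict)
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def handle(self, team: str, model: str, prompt: str) -> str:
        if model not in self.allowed.get(team, set()):
            outcome = "denied"
        elif self.counts.get(team, 0) >= self.max_requests:
            outcome = "throttled"
        else:
            self.counts[team] = self.counts.get(team, 0) + 1
            outcome = "ok"
        # Logging: every attempt is recorded, including refused ones.
        self.log.append((team, model, outcome))
        if outcome != "ok":
            raise PermissionError(f"{team} -> {model}: {outcome}")
        return self.backends[model](prompt)


# Hypothetical wiring with a stub in place of a real model endpoint.
gw = Gateway(
    backends={"summariser-v2": lambda p: f"summary of: {p}"},
    allowed={"customer-support": {"summariser-v2"}},
    max_requests=2,
)
print(gw.handle("customer-support", "summariser-v2", "ticket 1"))
print(gw.log)
```

Because every call goes through `handle`, the log doubles as the raw material for cost attribution and audit, and swapping a vendor means changing one entry in `backends`.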
6. What is your incident response model for AI workloads?
When an AI workload fails, what happens? Who gets paged? What runbooks exist? What is the rollback story?
AI workloads fail in ways that traditional services do not - model artefacts that did not download, GPU memory pressure, model outputs that are technically successful but wrong. The platform’s existing incident response patterns do not cover most of these. The question is whether the platform has been deliberately extended for AI-specific failures, or whether the first incident will be a discovery exercise.
7. What audit trail do you have for model deployments?
For every model in production, you should be able to answer: what version is running, who deployed it, when, what change preceded the deployment, and what evidence exists that it was tested.
If the answer is “we have git history for the deployment manifests,” that is the start of an audit trail but not a complete one. The model itself, the training data that produced it, the evaluation that justified it, and the approval that authorised production deployment are also part of what auditors are starting to ask about.
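One way to see the difference between "we have git history" and a complete trail is to write the record down and check which fields are empty. The record below is a sketch; the field names and reference values are hypothetical and stand in for whatever identifiers your own systems use.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class DeploymentRecord:
    """Evidence for one model deployment. Field names are illustrative."""
    model_version: Optional[str]
    deployed_by: Optional[str]
    deployed_at: Optional[str]         # timestamp of the deployment
    change_ref: Optional[str]          # the change that preceded the deployment
    evaluation_ref: Optional[str]      # evidence that the model was tested
    training_data_ref: Optional[str]   # the data that produced the model
    approval_ref: Optional[str]        # who authorised production deployment


def missing_evidence(record: DeploymentRecord) -> list[str]:
    """Return the names of fields with no evidence attached."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# What git history for the deployment manifests typically gives you.
git_only = DeploymentRecord(
    model_version="fraud-model:hypothetical-tag",
    deployed_by="a.example",
    deployed_at="2024-01-01T00:00:00Z",
    change_ref="hypothetical-commit-ref",
    evaluation_ref=None,
    training_data_ref=None,
    approval_ref=None,
)

print(missing_evidence(git_only))
```

The three fields that come back empty in this example are the ones the paragraph above describes auditors starting to ask about.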
8. What is your rollback story for a bad model?
If a model is producing wrong outputs in production, how quickly can you revert? Hours? Minutes? Have you tested the rollback path?
The honest answer for most organisations is “we have never tested it.” Which means the rollback path exists in theory but not in operational reality. The first production incident is the test, and the cost of failing the test is higher than the cost of running a deliberate drill.
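A drill can be as simple as "roll back, confirm what is live, record how long it took." The sketch below models that with an in-memory stand-in for a model endpoint. `ModelSlot` and `rollback_drill` are hypothetical names, and a real drill would exercise your actual deployment tooling, so the timing here only shows where the measurement goes.

```python
import time


class ModelSlot:
    """Tracks which model version a (hypothetical) endpoint is serving."""

    def __init__(self, initial_version: str):
        # Append-only, so the history also serves as a deployment record.
        self.history = [initial_version]

    @property
    def live(self) -> str:
        return self.history[-1]

    def deploy(self, version: str) -> None:
        self.history.append(version)

    def rollback(self) -> str:
        """Redeploy the version that was live before the current one."""
        if len(self.history) < 2:
            raise RuntimeError("no previous version to roll back to")
        self.history.append(self.history[-2])
        return self.live


def rollback_drill(slot: ModelSlot, expected_version: str) -> float:
    """Roll back, verify the live version, and return elapsed seconds."""
    start = time.perf_counter()
    restored = slot.rollback()
    elapsed = time.perf_counter() - start
    if restored != expected_version:
        raise RuntimeError(f"expected {expected_version}, got {restored}")
    return elapsed


slot = ModelSlot("v1")
slot.deploy("v2")  # suppose v2 turns out to produce wrong outputs
elapsed = rollback_drill(slot, expected_version="v1")
print(f"live: {slot.live}, rollback took {elapsed:.6f}s")
```

The number the drill returns is the answer to "hours or minutes?", and it is only trustworthy if it was measured before the incident.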
What the Answers Reveal
The eight questions are not equally weighted. They group into three categories, and the gaps in each category signal different problems.
Questions 1, 3, and 7 are about visibility. If you cannot answer these, the platform does not have the inventory and instrumentation to support oversight of AI workloads. Everything else flows from this.
Questions 2, 4, and 5 are about operating model. If you cannot answer these, the platform is operating AI workloads through ad hoc patterns, with unclear ownership and no governance surface. The first major incident or regulatory question will expose the gap.
Questions 6 and 8 are about resilience. If you cannot answer these, the platform has not been tested against the failure modes AI workloads actually produce. The first production failure will be more expensive than it needed to be.
A CTO who can answer questions in one category but not the others has a partial AI platform. A CTO who can answer all eight has the executive visibility to operate AI workloads at scale.
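The grouping above is easy to turn into a quick self-assessment. This sketch uses the three categories exactly as listed; the function name and the sample answers are made up for the example.

```python
# Question numbers grouped as described above.
CATEGORIES = {
    "visibility": (1, 3, 7),
    "operating model": (2, 4, 5),
    "resilience": (6, 8),
}


def gaps_by_category(answerable: set[int]) -> dict[str, list[int]]:
    """For each category, list the questions that cannot yet be answered."""
    return {
        name: [q for q in questions if q not in answerable]
        for name, questions in CATEGORIES.items()
    }


# Hypothetical organisation that can answer questions 1, 2, 3, and 7.
result = gaps_by_category({1, 2, 3, 7})
for name, gaps in result.items():
    status = "covered" if not gaps else f"gaps in questions {gaps}"
    print(f"{name}: {status}")
```

In this made-up case visibility is covered while the operating model and resilience are not, which is the "partial AI platform" position described above.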
What to Do if You Cannot Answer Most of Them
This is the more common position. The honest response is to treat the inventory work as a focused engineering project, not as something to be added to the platform team’s existing backlog.
A useful sequence:
- Week one: produce the inventory. Every production AI workload, every model, every endpoint, every external AI service in use. The list does not have to be perfect; it has to exist.
- Weeks two to four: assign ownership to each entry. A team and a senior accountable individual. This conversation surfaces the ambiguities that have been deferred since the workloads were deployed.
- Weeks four to eight: build cost attribution. Per-workload, per-team, per-environment. Most of the work is in cleaning up the existing infrastructure tagging, not in new tooling.
- Months two to four: build the audit and rollback evidence. For each workload, what was deployed, when, by whom, and how to revert.
- Months three to six: introduce the missing platform capabilities. AI gateway, incident runbooks, ownership-driven on-call rotations.
This is roughly six months of focused work for a mature platform team, more for an immature one. It is cheaper than doing the same work under pressure after a regulatory question, a major incident, or a board-level cost surprise.
The Executive Conversation
This work changes how you answer the question CTOs are being asked about AI with increasing frequency: “Are we ready?”
The honest answer for most organisations today is some version of “we have AI workloads in production, but the platform that supports them was not designed for AI as a first-class concern.” That is fine, and it is recoverable. What is not recoverable is the credibility lost when a CTO is unable to answer basic questions about their AI platform in front of a board, an auditor, or a customer.
The eight questions are a self-assessment, not an indictment. The point is not that every organisation should be able to answer them today. The point is that they are the questions you will be asked, and the work to be able to answer them is concrete, scoped, and worth doing on a timeline you choose rather than a timeline imposed by external pressure.
The Takeaway
If you cannot answer these eight questions in thirty seconds about your AI platform, you have a visibility gap that is going to become a credibility gap, on a timeline measured in months rather than years.
The work to close the gap is not large in engineering terms - inventory, ownership, attribution, audit trail, incident response, rollback testing. It is, however, work that has to be done deliberately. AI workloads landing on platforms that were not designed for them tend not to produce this visibility as a side effect.
The CTOs who can answer all eight questions next year are the ones who started the inventory work this year. The ones who cannot are the ones who waited for the questions to be asked before they began.
If you want help working out which of the eight you can actually answer right now, we can do that together.