Every telecom operator I have worked with has an AI cost reduction story. Most of them sound the same.
The program launched with ambition. The pilot delivered promising numbers. The board approved the budget. Then somewhere between the pilot and production, the numbers stopped moving the way the vendor promised they would.
I have been on both sides of this. Building the programs. Watching them stall. Occasionally getting them right.
Here is what I have actually seen work, and what keeps getting in the way.
It starts with data. It always starts with data.
Before we wrote a single AI prompt or deployed a single model, we spent weeks trying to understand what data we had, where it lived, and whether it was consistent enough to be useful.
It was not.
CDRs from different network elements in different formats. Network performance data sitting in siloed OSS systems that had never been integrated. Customer records spread across CRM, billing, and provisioning platforms with different identifiers for the same subscriber. Years of data entered by different teams with different definitions of the same field.
The vendor deck had one slide on data readiness. We needed three months.
If you are planning a telecom AI programme right now, double your data preparation timeline. Whatever you think it will take, double it. The operators who skip this step spend more time debugging data quality issues in production than they ever spent on model development. This is where most programmes quietly die before they start.
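To make the identifier problem concrete, here is a minimal sketch of reconciling subscriber records across two systems. Everything in it is an assumption for illustration: the field names (`crm_id`, `account_no`, `msisdn`), the sample records, the choice of MSISDN as the join key, and the UK country code. A real reconciliation would have to handle ported numbers, multi-SIM accounts, and historical records.

```python
from collections import defaultdict

# Hypothetical extracts: each system has its own identifier for a subscriber
# and its own way of writing the phone number.
crm = [
    {"crm_id": "C-001", "msisdn": "+44 7700 900123"},
    {"crm_id": "C-002", "msisdn": "07700900456"},
]
billing = [
    {"account_no": "B-9001", "msisdn": "447700900123"},
    {"account_no": "B-9002", "msisdn": "447700900789"},
]


def normalise_msisdn(raw, country_code="44"):
    """Reduce a number to digits with a country code (assumed default: UK)."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if digits.startswith("00"):      # international dialling prefix
        digits = digits[2:]
    elif digits.startswith("0"):     # national format, add the country code
        digits = country_code + digits[1:]
    return digits


def reconcile(sources):
    """Group each system's identifier under the normalised MSISDN."""
    merged = defaultdict(dict)
    for system, (records, id_field) in sources.items():
        for rec in records:
            merged[normalise_msisdn(rec["msisdn"])][system] = rec[id_field]
    return dict(merged)


def missing_systems(merged, systems):
    """List, per subscriber, the systems that hold no matching record."""
    gaps = {}
    for msisdn, ids in merged.items():
        absent = sorted(set(systems) - set(ids))
        if absent:
            gaps[msisdn] = absent
    return gaps


merged = reconcile({"crm": (crm, "crm_id"), "billing": (billing, "account_no")})
gaps = missing_systems(merged, ["crm", "billing"])
```

The useful output is usually `gaps` rather than `merged`: the list of subscribers that exist in one system but not another is a rough measure of how far the data is from being ready.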
Too much expectation of AI, and too much freedom given to it
When we deployed agentic AI for network operations, the early results were exciting. The system was handling more fault triage volume than we ever expected: flagging anomalies, correlating alerts, routing tickets. Then SLA compliance started slipping in specific fault categories.
The problem was not the model. We had not been specific enough about where the AI decides alone, where it recommends and a human confirms, and where a human always takes the call.
In a network operations environment that distinction is not theoretical. An agent autonomously rerouting traffic or escalating a P1 fault without the right human in the loop creates risk that no efficiency gain justifies.
That sounds obvious written down. In practice, in a fast-moving deployment with commercial pressure, those boundaries get blurry. We had to stop, map every decision type explicitly across the fault management workflow, and rebuild the governance layer before continuing.
That governance work — not the AI work — is what eventually delivered the outcome.
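One way to make those boundaries explicit is a decision-authority map that the agent must consult before acting. The sketch below is illustrative only: the decision type names and the level assigned to each are hypothetical, not the mapping we actually used. The one design choice worth copying is that any decision type missing from the map falls back to the most restrictive level.

```python
from enum import Enum


class Authority(Enum):
    AI_DECIDES = "ai_decides"        # the agent acts alone
    AI_RECOMMENDS = "ai_recommends"  # the agent proposes, a human confirms
    HUMAN_ONLY = "human_only"        # a human always takes the call


# Hypothetical decision types for a fault management workflow.
DECISION_POLICY = {
    "acknowledge_self_clearing_alert": Authority.AI_DECIDES,
    "correlate_alerts": Authority.AI_DECIDES,
    "route_ticket": Authority.AI_RECOMMENDS,
    "reroute_traffic": Authority.AI_RECOMMENDS,
    "escalate_p1_fault": Authority.HUMAN_ONLY,
}


def authorise(decision_type, human_approved=False):
    """Return True only if the agent may execute this action right now."""
    # Anything not mapped explicitly is treated as human-only.
    level = DECISION_POLICY.get(decision_type, Authority.HUMAN_ONLY)
    if level is Authority.AI_DECIDES:
        return True
    if level is Authority.AI_RECOMMENDS:
        return human_approved
    return False  # HUMAN_ONLY: the agent never executes, even with approval


can_reroute = authorise("reroute_traffic")                       # no approval yet
can_reroute_approved = authorise("reroute_traffic", human_approved=True)
```

Keeping the map as plain data rather than logic buried in prompts means the operations team can review and change it without touching the model.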
Vendor silos
Telecom operations does not run on one platform. It never has. OSS from one vendor. BSS from another. Network management from a third. Cloud infrastructure from a hyperscaler. Analytics platform from somewhere else.
Getting those technology partners to share data, align on integration standards, and work within a single AI layer was harder than any model problem we solved.
Every vendor had a different API. Different data formats. Different commercial incentives to keep you dependent on their stack. One vendor’s definition of a resolved fault ticket was different from another’s. One platform’s timestamp format was incompatible with the next. These sound like small things. They compound into serious problems when an AI agent is trying to make decisions across all of them simultaneously.
Nobody mentioned this in the original pitch. We figured it out the hard way, and it cost us months.
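As an illustration of the normalisation layer this implies, here is a minimal sketch that maps two vendors' ticket payloads onto one schema before an agent sees them. The vendor names, status values, timestamp formats, and field names are all hypothetical; real OSS and BSS payloads will differ.

```python
from datetime import datetime, timezone

# Hypothetical status vocabularies: the same word can mean different things.
STATUS_MAP = {
    ("vendor_a", "RESOLVED"): "resolved",
    ("vendor_a", "CLEARED"): "pending_confirmation",
    ("vendor_b", "closed"): "resolved",
    ("vendor_b", "resolved"): "pending_confirmation",
}


def parse_timestamp(vendor, raw):
    """Convert a vendor timestamp to a timezone-aware UTC datetime."""
    if vendor == "vendor_a":   # assumed: ISO 8601 string with a UTC offset
        return datetime.fromisoformat(raw).astimezone(timezone.utc)
    if vendor == "vendor_b":   # assumed: epoch milliseconds as a string
        return datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)
    raise ValueError(f"No timestamp rule for vendor {vendor!r}")


def normalise_ticket(vendor, ticket):
    """Map a vendor ticket onto one shared schema, failing loudly on gaps."""
    key = (vendor, ticket["status"])
    if key not in STATUS_MAP:
        # Refuse to guess: an unmapped status is a governance question.
        raise ValueError(f"Unmapped status {ticket['status']!r} for {vendor}")
    return {
        "ticket_id": f"{vendor}:{ticket['id']}",
        "status": STATUS_MAP[key],
        "updated_utc": parse_timestamp(vendor, ticket["updated"]),
    }


a = normalise_ticket(
    "vendor_a",
    {"id": "INC-17", "status": "RESOLVED", "updated": "2024-03-01T10:15:00+01:00"},
)
b = normalise_ticket(
    "vendor_b",
    {"id": 88231, "status": "resolved", "updated": "1709284500000"},
)
```

In this made-up example both tickets carry the same instant in time, and vendor B's "resolved" does not mean what vendor A's "RESOLVED" means. An agent comparing the raw payloads would get both facts wrong.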
The cost reduction came from eliminating work, not eliminating people
This is the one I want every telecom leader to hear clearly.
We did not start by asking how many people we could remove from the NOC or the customer operations centre. We started by asking which work was consuming the most human time for the least human value.
First-line fault triage, where a NOC engineer spent 40% of their shift acknowledging alerts that cleared automatically within minutes. Billing query handling, where agents answered the same five question types repeatedly from a knowledge base any AI could navigate. Routine escalations, where the escalation path was defined, the information was available, and the only reason a human was involved was that no system was smart enough to handle it end to end.
AI absorbed that work. The engineers and agents who were doing it moved to complex fault diagnosis, exception handling, and process improvement work that actually needed a thinking human. The cost reduction followed from eliminating the low-value work, not from eliminating the people doing it.
Telecom leaders who go into AI programmes with headcount reduction as the primary target usually end up with neither the savings nor the operational outcomes. The technology becomes a threat rather than a tool, adoption suffers, and the programme stalls.
What actually made the difference
1- Someone owned the cost line commercially. Not a project manager. Not an IT lead. Someone whose performance review included the operational cost number we were trying to move. That person asks different questions in vendor meetings. They make faster decisions when pilots need to scale or stop. They do not let the program drift.
2- We proved it in 90 days on one workflow before expanding. One process, first-line network fault triage. Clear baseline of how long it took and what it cost. Clear measurement of what changed after AI deployment. An undeniable result that the finance team could verify independently. That proof point funded every subsequent phase. Without it we would still be in a pilot.
3- We had someone who spoke both languages. Not purely technical. Not purely operational. Someone who could sit with the NOC team, understand what was actually breaking down in the fault management workflow, and translate that into something an AI system could act on. In telecom specifically, where it takes domain knowledge to understand what a CDR anomaly actually means for revenue or what a specific alarm pattern means for network health, that translation skill is rare and genuinely valuable. Most organisations do not have this person. Finding or developing them is more important than any model selection decision.
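The baseline-versus-after comparison from point 2 does not need to be sophisticated; it needs to be something finance can recompute. A minimal sketch follows, with placeholder numbers that are not our results and an assumed flat cost per minute of engineer time.

```python
from statistics import mean


def summarise(handling_minutes, cost_per_minute):
    """Summarise one measurement period for a single workflow."""
    avg = mean(handling_minutes)
    return {
        "tickets": len(handling_minutes),
        "avg_minutes": avg,
        "cost_per_ticket": avg * cost_per_minute,
    }


def percent_change(before, after):
    """Signed percentage change from before to after (negative = reduction)."""
    return (after - before) * 100 / before


# Placeholder handling times in minutes, for illustration only.
baseline = summarise([20, 30, 25, 25], cost_per_minute=2.0)
post_ai = summarise([18, 22, 20, 20], cost_per_minute=2.0)

change = percent_change(baseline["cost_per_ticket"], post_ai["cost_per_ticket"])
```

The important part is that the baseline is captured before deployment, with the same definition of handling time used in both periods. Otherwise the comparison cannot be verified independently.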
The summary
The outcome was real. The path was messier than any vendor deck showed us.
The data work, the governance decisions, the vendor integration battles, the 90-day proof points: that is where the actual transformation lives in telecom AI. Not in the technology. In the decisions around it.
Telecom operators who understand this going in will move faster, waste less, and recover cost more reliably than the ones chasing the headline number from a vendor slide.
The 35% is achievable. Just not the way anyone presents it.

Thanks for sharing this. I can completely relate, as I have gone through a similar situation. In the beginning, vendors promise the stars, but after a few months, reality hits. Instead of building value, it starts to feel like our infrastructure is just their experimentation ground. I’ve been trying to navigate this for the past year, and honestly, we’re still stuck with only 1–2 basic use cases actually working.
Manas, “experimentation ground” is exactly the right phrase. The vendor carries no risk. You carry all of it.
One honest question worth asking: are those 1–2 use cases genuinely in production with measurable outcomes? That answer usually tells you whether you have a foundation or need to reset.
What domain are they in: network, customer care, or billing? Happy to share what has worked from this point.