Cyber Resilience Usually Fails in the Gaps Between Teams

Apr 17

A lot of organisations still talk about cyber resilience as though it is something they can buy, but it isn’t. You can buy backup infrastructure. You can buy detection tooling. You can buy identity controls, orchestration platforms, forensic capability and every shiny dashboard the market wants to sell you. Even with all those things in place, if IT Operations, Security Operations and the line-of-business do not know exactly how they work together during a destructive cyberattack, then your resilience is far weaker than you think.

In most organisations, cyber resilience does not fail because of a total absence of technology; it fails because the operating model falls apart under pressure. The real weakness is usually not the control you forgot to buy. It is the handoff you forgot to define.

The problem is not the technology. It is the seams.

Destructive cyberattacks do not care about your org chart. The attacker does not stop at the edge of Security Operations and politely wait for IT Operations to take over. They move across identity, endpoints, infrastructure, service management, hypervisors, backup platforms, cloud platforms and business processes as one connected attack surface, yet many organisations still respond as though security investigates, IT restores and the business waits for updates. That isn’t cyber resilience; it’s a relay race with a high probability of dropping the baton.

Security Operations may understand the adversary, the attack path and the persistence mechanisms; IT Operations may control the systems, the infrastructure dependencies and the recovery processes; but the business is the only part of the organisation that can tell you what actually matters, what can wait and what level of degradation is tolerable.

Cyber resilience lives in the space where IT Operations, Security Operations and the business actually meet. If that space has not been built before the crisis, it will be built during the crisis: badly, politically and with dangerous gaps. In my experience leading a team that has worked with hundreds of organisations through ransomware encryption and nation-state or proxy-led wiper attacks, that failure to define the overlap in advance is one of the biggest reasons recovery takes far longer than it should.

Most organisations are still confusing ownership with coordination

One of the laziest habits in cyber resiliency I’ve seen is the constant search for a single clean-cut owner.

Who owns cyber resilience?
Who owns incident response?
Who owns cyber recovery?

Those questions are not pointless, but they are often asked in the wrong way. The obsession with ownership can become a substitute for building an effective operating model. Cyber resilience is not improved by deciding that one team “owns” the problem and everybody else becomes a dependency. That’s just a neat way of hiding fragility behind governance.

Resilience is an emergent organisational property: it depends on coordination, decision rights, sequencing and shared understanding under pressure. You do not get that by declaring one function accountable and hoping the rest line up when things go wrong. That is a bit like saying one crew member “owns” keeping your ship afloat while everyone else carries on with whatever their day job is. It sounds neat and tidy until you hit an iceberg.

The handoff problem is where good intentions go to die

In real incidents, failure usually comes from the gaps between teams rather than incompetence within them. Security Operations identifies suspicious activity and starts building a picture of the intrusion; IT Operations is under pressure to contain disruption and recover services; the business starts demanding timelines; senior executives want confidence; regulators may want answers; and customers want your products and services. That is where the cracks appear.

Guidance gets passed verbally, assumption replaces evidence, recovery begins before remediation, systems come back online before identity can be trusted, teams optimise for speed because the pressure is unbearable, then act surprised when the attacker reappears and there is further impact.

This is where many organisations fool themselves. They think they are failing because the attack was sophisticated. How many breach notifications have you received as a customer that open with that very claim? Often they fail because their internal coordination was amateur.

There is an almost reassuring predictability to the breach notifications we all receive when a vendor we buy from loses our data. Whatever actually happened, the customer letter nearly always insists the attack was “sophisticated”. It is as if the legal and PR teams have a template that automatically translates “we had poor control over identity, logging, segmentation, response and recovery readiness” into “a digital Moriarty used complex techniques”. After being involved as a retained incident responder in thousands of attacks, I can say that many of them are not particularly sophisticated. The more common issue is that organisations still approach incident response and recovery in an overly simplistic way, reducing a complex trust-restoration problem to a basic restore-and-reboot exercise.

Organisations I work with are not amateur in terms of effort, but often they can be amateur in terms of their operating discipline.

If the business is only getting updates, your model is already broken

One of the most common design flaws in cyber resilience is treating the line-of-business as an audience rather than a participant.

The business is not there merely to hear bad news and complain about downtime. It is at the table because it is the only group that can define what matters most in operational terms.

Which services are truly critical?
Which services support revenue, safety, regulatory obligations, contractual commitments or customer trust?
What is the Minimum Viable Company?
Which services must return first, and under what conditions can they return safely?

Technical teams alone cannot answer those questions properly. Left to themselves, they often prioritise what is easiest to restore, what is most familiar or what is generating the loudest internal noise. That is how organisations end up restoring lots of infrastructure while still failing to restore business capability. A lot of apparent recovery progress is theatre: systems are up, but the organisation is not resilient, it is just busy.
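To make the sequencing question concrete, here is a minimal sketch of how a recovery order might be derived from business priority while still respecting technical dependencies. The service names, priority scores and dependency map are all hypothetical, and the approach (a priority-aware topological sort) is just one reasonable way to express the idea, not a prescribed method.

```python
import heapq
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical inventory. Priority is set by the business (1 = most critical);
# depends_on is supplied by IT Operations.
services = {
    "identity":      {"priority": 2, "depends_on": []},
    "database":      {"priority": 2, "depends_on": ["identity"]},
    "payments":      {"priority": 1, "depends_on": ["identity", "database"]},
    "intranet-wiki": {"priority": 3, "depends_on": ["identity"]},
}


def recovery_order(services):
    """Return a restore order that honours dependencies first and,
    among services that are ready, picks the most business-critical."""
    sorter = TopologicalSorter(
        {name: svc["depends_on"] for name, svc in services.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready, order = [], []
    while sorter.is_active():
        # Queue every service whose dependencies are now all restored.
        for name in sorter.get_ready():
            heapq.heappush(ready, (services[name]["priority"], name))
        # Restore the most critical ready service next.
        _, name = heapq.heappop(ready)
        order.append(name)
        sorter.done(name)
    return order


print(recovery_order(services))
# ['identity', 'database', 'payments', 'intranet-wiki']
```

Note that the wiki is technically ready as soon as identity is back, and it is probably the easiest thing to restore. It still goes last, because the business has said payments matters more.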

Speed is useless if you recover back into compromise

This is another uncomfortable truth: fast recovery is not the same as good recovery. If Security Operations, IT Operations and the business are not aligned on trust criteria, recovery gates and sequencing, then pressure from the business can easily drag the organisation into unsafe decisions that result in further downtime.

Bring back core services too early and you risk reintroducing persistence. Bring back identity before it is trustworthy and you contaminate the rest of the recovery. Restore data without understanding tampering, staged persistence or attack-path dependencies and you may simply be rebuilding the same problem at speed. Bring back evaded security controls and you’ll be blind to the inevitable reinfection.

This is why shared responsibility matters so much: it creates a structured way to argue about the right things before a destructive cyberattack forces those arguments into the open.

Security Operations should be able to say, with authority, that the investigation is not yet mature enough to support recovery. IT Operations should be able to explain the practical implications of rebuild versus restore. The business should be able to define what level of risk and degradation is actually tolerable. Without that model in place, decisions do not become simpler and faster, they just become more political.
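One way to picture a recovery gate is as a check that refuses to open until all three functions have explicitly signed off. The sketch below is illustrative only: the class, the function names and the example service are assumptions of mine, not a reference to any particular product or framework.

```python
from dataclasses import dataclass, field

# Hypothetical names for the three functions that must agree.
REQUIRED_SIGNOFFS = ("security_operations", "it_operations", "business")


@dataclass
class RecoveryGate:
    """Tracks sign-off for one service waiting to return to production."""
    service: str
    signoffs: dict = field(default_factory=dict)  # function -> stated reason

    def approve(self, function: str, reason: str) -> None:
        if function not in REQUIRED_SIGNOFFS:
            raise ValueError(f"unknown function: {function}")
        self.signoffs[function] = reason

    def missing(self) -> list:
        """Functions that have not yet signed off, in a fixed order."""
        return [f for f in REQUIRED_SIGNOFFS if f not in self.signoffs]

    def is_open(self) -> bool:
        return not self.missing()


gate = RecoveryGate("payments-api")
gate.approve("it_operations", "rebuilt from a clean image")
gate.approve("business", "degraded mode acceptable for 48 hours")

print(gate.is_open())  # False
print(gate.missing())  # ['security_operations']
```

The point is not the code. It is that the gate stays shut while Security Operations still has doubts, however loudly the other two functions want the service back.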

Most “mature” organisations are still running on heroics

I split my time between leading a team that helps customers respond to and recover from destructive cyberattacks, and leading a consulting practice focused on improving their readiness before those attacks happen. A big part of that work is assessing baseline operational cyber resilience across governance, people, process and technology. I have had the privilege of doing that with some of the largest organisations in the world, including some with enormous security teams and very substantial budgets. My takeaway: a lot of cyber resilience programmes look better on paper than they do in practice.

They appear mature because they have Gartner “top-right” products, governance forums and named owners, but when a real incident hits, the whole thing depends on a handful of experienced people performing heroics to work around the mess.

They’re the individuals who know which systems really matter.
They’re the individuals who know which dependencies are undocumented.
They’re the individuals who know which team actually makes the decision, regardless of what the RACI says.
They’re the individuals who know who to phone when the formal process becomes useless.

That is not maturity; that’s survivable dysfunction. A proper shared responsibility model turns tribal knowledge into operational capability. It defines handoffs, decision rights, escalation paths, dependencies and criteria for moving from one phase of the incident to the next. It makes resilience less dependent on memory, personality and “cybersecurity improv”. That is when cyber resilience starts becoming real.

The biggest cyber resilience weakness in many firms is ambiguity

Organisations often say they want flexibility, but what they really have is ambiguity.

Ambiguity about who decides when containment is sufficient.
Ambiguity about who authorises recovery.
Ambiguity about whether the business can overrule risk concerns.
Ambiguity about which team owns validating that a restored service is actually trusted and functioning appropriately.
Ambiguity about whether lessons learned are supposed to improve the system or simply close the paperwork.

Ambiguity can feel manageable on calm days; during a destructive cyberattack it becomes fuel for delay, friction, duplicated effort and bad decisions. A shared responsibility model is not bureaucracy for its own sake. It is what stops confusion becoming impact.
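Ambiguity of this kind can be surfaced before a crisis with something as simple as a decision-rights register. The sketch below assumes a hypothetical register in which each decision should have exactly one named decider; the decisions and role titles are invented for illustration, and your own model may legitimately call for joint decisions.

```python
# Hypothetical register: decision -> roles that believe the call is theirs.
decision_rights = {
    "declare containment sufficient": ["Security Operations lead"],
    "authorise recovery": ["IT Operations lead", "CISO"],
    "accept business risk of early return": [],
    "validate restored service is trusted": ["Security Operations lead"],
}


def ambiguous_decisions(register):
    """Return the decisions that have no decider, or more than one."""
    return {
        decision: deciders
        for decision, deciders in register.items()
        if len(deciders) != 1
    }


for decision, deciders in ambiguous_decisions(decision_rights).items():
    status = "nobody decides" if not deciders else "contested: " + ", ".join(deciders)
    print(f"{decision}: {status}")
# authorise recovery: contested: IT Operations lead, CISO
# accept business risk of early return: nobody decides
```

Every line this prints is an argument you would otherwise have for the first time in the middle of an incident.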

You cannot buy your way around a broken operating model

This is why I remain sceptical when cyber resilience conversations rush too quickly towards products. That may sound slightly odd coming from someone who works for a data management vendor. Don’t get me wrong: good technology does matter. A strong data management platform can make a material difference to workload coverage, speed of action and the efficiency and effectiveness of both response and recovery activity. What it cannot do is fix dysfunctional collaboration. No product can create a meaningful shared responsibility model between Security Operations, IT Operations and the business. That is one of the reasons the company I work for invests in proactive consulting services to help organisations address exactly that problem. You can have an excellent data management solution, but if there is no organisational discipline around how it will be used during a crisis, recovery to a trusted state will take far longer than it should.

This is usually the bit that gets neglected because it is harder than buying software and far less glamorous than talking about AI, automation or the latest detection capability. When a real cyber crisis lands, glamour has very little survival value: clarity does, coordination does, defined decision rights do, practised handoffs do. That is what real cyber resilience looks like.

Final thought

If you want to understand why some of the organisations my team has supported have absorbed destructive attacks far better than others, do not just look at the tooling. After all, they were using the same data management platform.

Look instead at how well they worked together and how prepared they were before the crisis began. Look at whether Security Operations, IT Operations and the line-of-business share a common understanding of priorities, trust criteria, decision rights and recovery sequencing.

Cyber resilience is rarely destroyed by the absence of effort. It is destroyed by confusion at the exact moment the organisation can least afford it. In many of the organisations whose operational capability I have assessed, the biggest cyber resilience gap is not in the technology stack; it is in the space between the teams.


Jimmy Blake