How does a statistical error from the Second World War apply to the billion-dollar problem of protecting data in the age of large language models (LLMs)?
Let me tell you a story.
At the height of the war, the Allied air forces faced a terrifying problem. Their planes were getting shot down in startling numbers. So they asked a team of statisticians and engineers for help. When the returning bombers landed, the team carefully inspected every single plane, logging where the bullet holes were.
They found that the holes clustered on the wings, the fuselage, and the tail.
The initial conclusion was unanimous: “We need to put extra armor on the areas with the most bullet holes! That’s where the planes are getting hit.”
But then a brilliant statistician named Abraham Wald stepped in. He looked at the data and said, “Gentlemen, you have it completely backward.”
The AI security parallel
This is where your IT department comes in. You’ve deployed an enterprise LLM: that is your returning plane, the one you can inspect. And because you’re a responsible company, you’ve put digital armor around it:
- You’ve got PII filters.
- You’ve got strict data governance policies.
- You’ve got monitoring systems that log every violation.
When you run an audit, what do you find? A few bullet holes: minor policy infractions, perhaps a piece of PII that your filters caught, sanitised and logged. You patch those minor vulnerabilities on the wings and the tail, and you conclude, “Our system is secure. We’ve mitigated the risk.”
This is survivorship bias at work.
The real danger: The missing data
Wald’s genius was realising that the data they were looking at (the bullet holes on the returning planes) only told them where a plane could be hit and still survive.
The areas with NO bullet holes (the engines, the cockpit, the fuel tanks) were the places where a hit was catastrophic. Those planes didn’t come back, so their data never entered the study.
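To make Wald’s reasoning concrete, here is a minimal Python simulation. The zone names, the per-hit lethality figures and the number of hits per plane are illustrative assumptions, not historical data. Every plane is hit uniformly at random, yet the inspectors only ever count holes on the planes that come back.

```python
import random
from collections import Counter

# Illustrative assumption: probability that a single hit in each zone brings the plane down.
LETHALITY = {
    "wings": 0.05,
    "fuselage": 0.05,
    "tail": 0.05,
    "engine": 0.6,
    "cockpit": 0.6,
    "fuel_tank": 0.6,
}


def simulate(n_planes=10_000, hits_per_plane=3, seed=1):
    """Hit every plane uniformly at random; count holes on all planes and on survivors only."""
    rng = random.Random(seed)
    zones = list(LETHALITY)
    all_hits = Counter()       # what actually happened (invisible to the inspectors)
    survivor_hits = Counter()  # what the inspectors could count on returning planes
    for _ in range(n_planes):
        hits = [rng.choice(zones) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane returns only if it survives every hit it took.
        survived = all(rng.random() >= LETHALITY[zone] for zone in hits)
        if survived:
            survivor_hits.update(hits)
    return all_hits, survivor_hits


all_hits, survivor_hits = simulate()
total_all = sum(all_hits.values())
total_seen = sum(survivor_hits.values())
for zone in LETHALITY:
    print(f"{zone:<10} {all_hits[zone] / total_all:6.1%} of all hits, "
          f"{survivor_hits[zone] / total_seen:6.1%} of holes on returning planes")
```

With these assumed numbers, every zone is hit about equally often overall, but the engine, cockpit and fuel tank show up far less often among the holes on returning planes. The gap is created entirely by which planes make it home, not by where the bullets land.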
In your organisation, those missing planes represent your unmonitored risks:
- The public LLM: An employee uses a free, public-facing LLM for a quick query, accidentally pasting in a document full of customer PII (names, Social Security numbers).
- Shadow AI: A development team spins up a new LLM instance in an unapproved cloud environment, using proprietary source code for fine-tuning.
These are the critical points (the engine and the cockpit) that you are not monitoring. The PII leak isn’t logged in your enterprise system; the catastrophic IP loss happens completely outside your view. Your audit shows a perfect security posture, but only because the real failures are happening on the planes that never return to your hangar.
Optimise for what you don’t see
Survivorship bias makes you optimise for the hits you survive, not against the ones that bring you down.
To truly secure your data in the age of AI, you must stop focusing exclusively on the visible “bullet holes” in your sanctioned systems. Dedicate resources to hunting for the critical gaps: unmonitored services, AI embedded in SaaS products, shadow AI, and the personal public-LLM accounts your users bring from home. That is where the invisible, catastrophic breaches are happening, flying under your radar.
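One practical starting point for that hunt is to look for traffic your LLM monitoring was never designed to see. The sketch that follows scans web-proxy log lines for requests to public LLM services that are not on a sanctioned list. It is a minimal sketch under stated assumptions: the log format, the sanctioned hostname `llm.internal.example.com`, the function name `find_unsanctioned_llm_use` and the sample lines are all hypothetical, and the short host list is illustrative, not a vetted or complete blocklist.

```python
from collections import Counter

# Illustrative, incomplete list of public LLM hosts (an assumption; maintain your own).
PUBLIC_LLM_HOSTS = {"chatgpt.com", "chat.openai.com", "gemini.google.com", "claude.ai"}

# Hypothetical hostname of the sanctioned enterprise LLM.
SANCTIONED_HOSTS = {"llm.internal.example.com"}


def find_unsanctioned_llm_use(log_lines):
    """Count requests per (user, host) to public LLM hosts that are not sanctioned.

    Assumes a hypothetical log format of whitespace-separated fields:
    '<timestamp> <user> <host> <bytes_out>'.
    """
    findings = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines rather than failing the whole scan
        user, host = fields[1], fields[2].lower()
        if host in SANCTIONED_HOSTS:
            continue
        # Match a known public LLM host itself or any of its subdomains.
        if any(host == h or host.endswith("." + h) for h in PUBLIC_LLM_HOSTS):
            findings[(user, host)] += 1
    return findings


# Made-up sample lines in the assumed format.
sample_log = [
    "2024-05-01T09:12:03Z alice llm.internal.example.com 1820",
    "2024-05-01T09:15:44Z bob chatgpt.com 48213",
    "2024-05-01T09:16:10Z bob chatgpt.com 51022",
    "2024-05-01T09:20:31Z carol gemini.google.com 7310",
]
for (user, host), count in find_unsanctioned_llm_use(sample_log).most_common():
    print(f"{user} -> {host}: {count} request(s)")
```

A match is a lead to investigate, not proof of a leak. Note also what this scan cannot see: AI embedded in SaaS tools and personal devices off the corporate network never reach this log at all, which is exactly the blind spot described above.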
We must become the Abraham Walds of AI security and start designing our defenses around what we don’t see.
Find out more about how to secure your organisation’s AI.
*Image shared under the Creative Commons Attribution-Share Alike 4.0 International license.*