Hello,
Today I am doing a quick post on the recent CrowdStrike incident, which is estimated to have disabled 8.5 million computers and caused more than $5.4B in damages since it happened last week.
Now, a common question is whether CrowdStrike will be liable for damages. The answer is most certainly yes. There is actually a very similar case that was brought to court a few years ago, over the OVH incident in France. While it applies to France, which is the jurisdiction I am most familiar with, the same principles apply in many other jurisdictions.
One quick note to clear up a common misconception before we begin. Most contracts have boilerplate terms purporting to waive liability, and it is commonly assumed that they do; they do not. These terms carry little weight in most jurisdictions outside of the US and, either way, it is not possible to waive liability in many circumstances (e.g. anything involving gross negligence, criminal activity or breaking the law itself).
About OVH
OVH is a French datacenter and cloud provider, reportedly the largest hosting provider in Europe. They are best known for providing physical servers and virtual machines, as well as a variety of cloud services.
A fire broke out on 10 March 2021 at their SBG site in Strasbourg. It burned down two datacenters, SBG1 and SBG2, with little or nothing recovered, and rendered two more, SBG3 and SBG4, inoperable for a while.
What is interesting is the aftermath. Multiple sites were destroyed, causing irrecoverable loss of service and loss of data for their customers. Several customers pursued OVH in court for damages, and the customers won.
I found there were a few interesting points raised and discussed by the court:
- (Skipping the elements about the fire itself, to focus on the service and tech)
- There was complete loss of service during and after the event
- There was complete irrecoverable loss of data after the event
- OVH provided a backup service for their machines and services
- There was complete irrecoverable loss of the backups after the event
- There were multiple datacenters in nearby locations, as is standard practice to provide some resiliency: SBG1, SBG2, SBG3 and SBG4
- Multiple datacenters burned down at once.
- The multiple datacenters were in fact in the same place, a few steps apart. That was considered unexpected and not reasonable by the court.
- The backups were stored in the same datacenter, or in another datacenter that happened to be in the same place. The court did not consider that reasonable.
- OVH tried to argue that customers should have followed the good practice of keeping multiple backups in separate locations. The court acknowledged that this is indeed good practice.
- However, the court determined that OVH was the backup provider, and that it was therefore OVH's role to provide backups to a reasonable standard and observe good practices, including storing a copy of the backups elsewhere (a sketch of the idea follows this list).
- The court ruled the OVH backup service was not operated to a reasonable standard and failed at its purpose.
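For techies, the practice the court had in mind is simply that a copy of a backup must live somewhere physically separate from the system it protects. Here is a minimal sketch of the idea in Python; the paths and site names are made up for illustration and this is in no way OVH's actual setup:

```python
import shutil
from pathlib import Path

# Made-up locations for illustration only. The whole point is that the two
# destinations sit in physically separate sites, not two rooms of one building.
LOCAL_BACKUP = Path("/mnt/backup-site-a")    # same site as the servers
OFFSITE_BACKUP = Path("/mnt/backup-site-b")  # a geographically distinct site

def store_backup(archive: Path) -> None:
    """Keep one copy locally for fast restores and one copy off-site, so that
    a single fire, flood or outage cannot destroy both."""
    for destination in (LOCAL_BACKUP, OFFSITE_BACKUP):
        destination.mkdir(parents=True, exist_ok=True)
        shutil.copy2(archive, destination / archive.name)

# Example usage (assuming the archive file exists):
# store_backup(Path("customer-db-2021-03-09.tar.gz"))
```

Nothing exotic: the court essentially found that a professional backup provider is expected to do the moral equivalent of that second copy, and OVH did not.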
I find this interesting for techies: the court will judge your tech and what can really be considered best practice. It's like the ultimate code review 😀
To summarize how things work: harm done + intent to cause harm or negligence = potential for damages
There was significant harm caused to customers, as entire businesses were shut down, often indefinitely, with complete data loss and no possibility of recovery. There were multiple instances of negligence, mistakes or questionable practices in how OVH was operating the service, which led to the issue. It's a solid case. Multiple customers opened a case against OVH and won, and there may be more still being processed.
That brings us to CrowdStrike. The similarities are striking!
About CrowdStrike
CrowdStrike is anti-virus software that is installed on computers; this category of product is often called an EDR (Endpoint Detection and Response) these days. It's mostly installed on corporate devices in large companies, as they are required to have a security solution.
CrowdStrike runs on startup of the computer. It embeds itself deeply into the operating system (Windows or Linux) at the kernel level, so that it runs as early as possible, before other things start. It monitors what runs, and it can block and report anything it deems suspicious.
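For readers less familiar with EDRs, here is a deliberately oversimplified, user-space sketch of the concept in Python. The rule list and example processes are invented for illustration; real products like CrowdStrike hook the kernel and use far more sophisticated detection logic:

```python
# Oversimplified illustration of what an EDR conceptually does:
# watch process activity, block what matches a rule, report the rest.
SUSPICIOUS_PATTERNS = ["mimikatz", "ransomware", "credential-dump"]  # invented examples

def handle_process_event(process_name: str, command_line: str) -> str:
    """Decide what to do with a newly started process."""
    haystack = f"{process_name} {command_line}".lower()
    if any(pattern in haystack for pattern in SUSPICIOUS_PATTERNS):
        return "BLOCK_AND_REPORT"  # kill the process and alert the security team
    return "ALLOW"                 # let it run, possibly logging telemetry

print(handle_process_event("powershell.exe", "Invoke-Mimikatz"))   # BLOCK_AND_REPORT
print(handle_process_event("excel.exe", "quarterly-report.xlsx"))  # ALLOW
```

The reason this logic runs in the kernel, before everything else, is also what makes it dangerous: a component with the power to block any process can just as easily take the whole machine down when it misbehaves.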
On 19 July 2024, CrowdStrike pushed an update to their software. The update contained a bug that crashed any computer it was deployed on. Millions of computers across the world received the update simultaneously and were rendered non-functional.
I think there are multiple interesting points to raise and discuss:
- CrowdStrike runs at startup in a highly privileged mode (kernel driver on Windows) and it starts first.
- It can prevent any other software, or the system itself, from running, whether intentionally to block a threat or accidentally, due to a bug or misjudging a non-threat.
- It is deployed to millions of corporate devices in industries like banking, travel and retail. It is largely targeted at critical industries and critical devices holding confidential information.
- It is a highly critical application operating in sensitive environments, which requires extra care to develop and to test.
- On the day of the incident, CrowdStrike pushed the update to millions of critical devices at once.
- Good practice requires staging software upgrades (a sketch of what that looks like follows this list).
- How was it possible for CrowdStrike to ship a (broken) update to millions of devices in the span of minutes? Was there no testing and no staged rollout?
- From discussions online, customers in hospitals have complained about this issue before and requested that CrowdStrike allow some control over updates. One customer reported being rejected with a 50-page memo from CrowdStrike saying they refuse to stage anything.
- Does CrowdStrike not have any ability to stage a rollout? They have repeatedly suggested that they do not, and/or refuse to. That may put them in breach of requirements in the regulated industries they sell to.
- CrowdStrike is expected to be heavily tested to not disrupt the (critical) devices it is deployed to.
- The update crashed any computer it was deployed to (BSOD).
- How was it possible for an ostensibly broken update to not be detected before it was pushed to the outside world?
- Does CrowdStrike do any testing whatsoever? Obviously they didn’t or the incident wouldn’t have happened.
- It is not an isolated incident. The same thing happened a few weeks earlier with the CrowdStrike agent on Linux, nuking the system, and there may have been other occurrences before that.
- After the bad update was pushed, it took nearly two hours for CrowdStrike to realize there was a problem and stop the update.
- Developers working on critical software are required to monitor a deployment after deploying, to verify it’s working as expected and not causing issues.
- What was CrowdStrike doing after deploying? Were they monitoring the deployment? How could they not notice that the update destroyed every machine it was deployed to?
- All computers were rendered inoperable by CrowdStrike, unable to boot.
- For affected companies, that left all their employees with a dead computer, unable to do anything.
- It wasn’t possible for users to “access” the computer to raise a ticket or troubleshoot it.
- It was a complete loss of service with no way to recover.
- One way to fix the computer was for the IT team to be given the computer and completely reinstall (reimage) it.
- Another way, found later in the day, was for an administrator to access the computer physically AND boot it into safe mode or recovery mode, then delete the offending CrowdStrike driver file.
- This remediation can only be done with physical access to the affected computer AND by an administrator who has a special password (or USB key with the password) to start a laptop into recovery mode.
- It will take weeks for affected companies to physically get their hands on every device: user laptops, desktops and servers. That can be thousands to hundreds of thousands of devices to get to.
- It will take longer for devices that are enclosed or difficult to access, like screen terminals in an airport, medical devices and machinery in a hospital, elevator panels.
- It may be impossible to restore the device if the device is locked down somehow (physically blocked or recovery password unknown).
- Employees who require a computer to work are unable to work during all that time.
- It is not possible to provide a spare computer to affected users, as the spares were affected by the issue too.
- CrowdStrike is security software that is meant to keep computers running and protected from threats.
- CrowdStrike destroyed the computers it was supposed to protect; it failed at its purpose.
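Since staged rollouts and post-deployment monitoring come up several times in the list above, here is a minimal sketch of what that discipline usually looks like. Every ring name, size and threshold is invented for illustration; this is not CrowdStrike's pipeline (the whole point is that they apparently do not have one), just the general shape of good practice:

```python
import time

# Invented rollout rings: start with the vendor's own machines, widen gradually.
ROLLOUT_RINGS = [
    ("internal", 0.001),  # the vendor's own fleet first ("eat your own dog food")
    ("canary",   0.01),   # a small slice of customers who opted in early
    ("early",    0.10),
    ("broad",    1.00),   # everyone else, only if the earlier rings stayed healthy
]

MAX_CRASH_RATE = 0.001    # invented threshold: halt if more than 0.1% of devices crash

def deploy_to_ring(update_id: str, ring: str, fraction: float) -> None:
    """Placeholder for the real delivery system pushing the update to one ring."""
    print(f"Deploying {update_id} to ring '{ring}' ({fraction:.1%} of the fleet)")

def crash_rate_for_ring(ring: str) -> float:
    """Placeholder: in reality this would query crash and telemetry dashboards."""
    return 0.0

def staged_rollout(update_id: str) -> None:
    for ring, fraction in ROLLOUT_RINGS:
        deploy_to_ring(update_id, ring, fraction)
        time.sleep(1)  # stand-in for a real soak period (hours or days)
        rate = crash_rate_for_ring(ring)
        if rate > MAX_CRASH_RATE:
            print(f"Crash rate {rate:.2%} in ring '{ring}', halting rollout of {update_id}")
            return     # stop before reaching the next, larger ring
    print(f"{update_id} rolled out to the full fleet")

staged_rollout("content-update-2024-07-19")
```

With even the first ring in place, and someone actually watching it, an update that blue-screens every machine it touches dies on the vendor's own laptops instead of on millions of customer devices.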
There is significant harm caused to customers. Businesses were partially or completely shut down, for days or weeks. There are multiple instances of negligence, mistakes and questionable practices in how CrowdStrike was operating the service, which led to the issue. And it was not an isolated incident, as people have reported the same thing happening just a few weeks before, on a smaller scale.
That should leave CrowdStrike wide open to countless claims for damages.
Customers operating in regulated industries like healthcare, finance, aerospace and transportation are actually required to test, stage and track changes. CrowdStrike claims a dozen certifications and standards that require them to follow particular development practices and carry out various levels of testing, but they clearly did not. The simple fact that CrowdStrike does not do any of that, and actively refuses to, puts them in breach of compliance, which in turn puts customers themselves in breach of compliance by using CrowdStrike. Altogether, there may be sufficient grounds for any customer who wishes to unilaterally terminate their CrowdStrike contract.
Appendix
As additional evidence, we can quote an employee working on BitLocker discussing their testing and rollout methodology. BitLocker is a tool to securely encrypt the disk on your computer; it prevents anybody else from reading any data if the computer is lost or stolen. It's similarly critical: it's the first thing to run on startup and nothing can run without it (no data can be loaded from disk). Any mistake or bug in BitLocker would render the computer unable to operate and all data on it unreadable (full irrecoverable data loss, exactly like the OVH incident). They have many layers of testing, starting with testing on their own computers, then their own team, then other teams. Do CrowdStrike employees even have CrowdStrike running on their machines?

As additional evidence, we can quote another employee, working for an unidentified company, who claims their company was hit by a previous CrowdStrike issue and formally requested that CrowdStrike allow staged rollouts; CrowdStrike refused and sent back a 50-page memo categorically rejecting the idea. If that is true, this memo may now constitute critical evidence against CrowdStrike.
