Key takeaways from the recent Microsoft Outage and CrowdStrike’s Vulnerability Protection update
Truth, Trust and Proactive Business Risk Understanding & Action
Summary
- 19 July 2024 saw mass disruption around the world to public and private organisations
- This was followed by another Microsoft outage on 30 July, lasting around 10 hours and blocking access to Microsoft 365
- CrowdStrike confirmed ‘it was a single content update for Windows hosts’
- An estimated 8.5 million devices around the world were affected, according to Microsoft
- A time for reflection and to consider the risk appetite of your business
- Clause 6B of the Microsoft Services Agreement states ‘Microsoft is not liable for any disruption or loss you may suffer as a result’
- We deliver business-critical solutions across numerous areas
Background
As we know, a large-scale loss of IT infrastructure causes many businesses and critical systems to grind to a halt; even the most advanced technology stacks can sometimes come up short. We have just witnessed this in the perfect storm of two unrelated internet infrastructure issues that collided on Friday 19 July 2024. The outcome was mass disruption around the world to public and private entities alike, creating huge customer service confusion, revenue loss, and uncapped reputational damage. A further incident was reported on 30 July, triggered by a distributed denial-of-service cyberattack. It affected Azure and Microsoft 365 products such as Office and Outlook, and lasted nearly 10 hours. Companies affected by the new outage include U.K. bank NatWest, according to the BBC.
On Thursday night Microsoft’s Azure Central US region experienced a widespread outage. This was followed on Friday morning by a CrowdStrike configuration file update to its Falcon sensor that caused the dreaded Blue Screen of Death (BSoD) on an estimated 8.5 million devices around the world. By Microsoft’s own estimate that is, thankfully, less than 1% of all Windows machines. The fix? Either manually deleting the offending file on every affected device or carrying out full system restorations from backups taken before 4:00am UTC. One can only imagine the scale of the time and resource impact on businesses, and who is responsible for picking up the tab?
Source: Down Detector
CrowdStrike confirmed that ‘it was a single content update for Windows hosts’. It went on to say that Mac and Linux hosts were not impacted, declared that this was not a security incident or cyberattack, and stated that a fix had been deployed.
CrowdStrike confirmed it was a single configuration file pushed as an update to Falcon. The update was specifically aimed at changing how Falcon inspects “named pipes” in Windows, a feature that allows software to send data between processes on the same machine or on other computers on the local network. CrowdStrike says the configuration file update was intended to let Falcon catch a new method that hackers were using for communication between their malware on victim machines and command-and-control servers. “The configuration update triggered a logic error that resulted in an operating system crash,” according to its post.
This raises several concerns for UK businesses that many leaders may not have recognised until now:
- The shift to single cloud vendors over multi-vendor on-prem strategies raises systemic failure risks, as reliance on one provider means any outage can disrupt all applications and data hosted on that cloud
- It is extremely difficult to identify business risks, and who has ultimate liability, in the interconnected world we all now live in
- Public cloud services are not sufficient if you have critical systems (defined here as requiring more than 3 nines of availability) that have to keep working when all else fails, and the providers offer no redress, accountability or liability for any of your business loss and damage
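To put the ‘3 nines’ threshold in concrete terms, the short Python sketch below converts a number of nines into a yearly downtime budget. It is an illustration only: the function name `allowed_downtime_hours` is our own, and it assumes a 365-day year with availability measured over the whole year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def allowed_downtime_hours(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (3 means 99.9%)."""
    unavailability = 10 ** -nines  # 3 nines leaves 0.1% of the year as downtime
    return HOURS_PER_YEAR * unavailability


if __name__ == "__main__":
    for nines in (2, 3, 4):
        hours = allowed_downtime_hours(nines)
        print(f"{nines} nines: {hours:.2f} hours of downtime per year")
```

Under these assumptions, 3 nines allows about 8.76 hours of downtime a year, so a single outage of nearly 10 hours, such as the one on 30 July, would use up more than the whole annual budget on its own.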
Often in the tech world, systems run in the background with very little questioning as to their purpose and function. Perhaps now is a time for reflection and to consider the risk appetite of your business. In many businesses the ‘IT team’ is responsible for everything from infrastructure and systems to networks and end devices, and is often underfunded and under-resourced. With the rapid rise in cyber-attacks over the last 12 months, this serves as a stark reminder of the evolving threat landscape.
You need to understand the risks to your business when you are fully reliant on a single provider for all your IT infrastructure, which in this case was Microsoft.
Many businesses may not realise that Clause 6B of the Microsoft Services Agreement states ‘Microsoft is not liable for any disruption or loss you may suffer as a result. In the event of an outage, you may not be able to retrieve your content or data that you have stored’, with its recommendation being ‘We recommend that you regularly backup your content and data that you store on the services or store using third-party apps and services’.
Key Takeaways: now the dust has settled
While it would be difficult to summarise the true impact to all businesses from this outage and anti-virus update, several key points resonated with our team:
- Critical Systems: Your critical core applications and data are the lifeblood of your business. In the same way you accounted for on-premises outages, you have to do the same when you transition to the cloud. In reality, when you move to the cloud your risk profile increases, because when global events occur you are more likely to become collateral damage. You may have to consider repatriation or a secondary location for these systems. There are no shortcuts: ensure you can mitigate an outage at your cloud provider.
- Risk Appetite: Defending against business threats starts with understanding them. That means recognising the value of your business systems and data and taking proactive steps to enhance your data and cyber security posture. This may mean deciding not to be reliant on ‘one supplier’ for all of your critical infrastructure. Our team can assess the level of migration needed across all parts of a business where IT plays a part.
- Understanding the impact is one thing; understanding where the accountability and redress lie is another. Public cloud providers do not accept liability for any loss you incur due to an outage of their platforms. The best you can hope for is service credits in the month the outage happened. That is all.
- Data Sovereignty Management: Implementing robust data management and governance, with role-based access controls and robust procedures, serves as the cornerstone for efficiency, connectivity, and user experience, and is essential for future-proofing your business.
The Claritas Way
Discover how Claritas Solutions can re-engineer your systems and infrastructure to empower control and ensure UK Data Sovereignty and Systems.
Claritas Solutions gives UK-centric organisations like Vodafone, Civica, and the Home Office the foundation needed to solve their most complex data challenges. We build environments that enable a unified, enterprise-wide view of data governance and foster a holistic understanding of the entire IT ecosystem. We deliver business-critical solutions across numerous areas including:
• Backup as a Service (BaaS)
• Disaster Recovery as a Service (DRaaS)
• Private UK cloud hosting
When you’re ready – Explore how Claritas tackles business challenges head-on
We’d love to hear your sovereignty and data challenges and show you how Claritas can help. For a personalised consultation, click the button below.
Request a meeting