
"Microsoft Blue Screen" shocks the world, raising concerns about the security risks posed by technology giants

2024-08-06


Source: People's Post and Telecommunications News


On July 19 local time, some applications and services of Microsoft Corporation of the United States became unavailable, suffering access delays, incomplete functionality, or complete inaccessibility. Large numbers of user computers displayed a blue screen, grounding flights in many countries and regions and bringing industries such as healthcare, banking, and hospitality to a standstill. The industry has called the incident "the largest IT outage in history."

Microsoft officially confirmed that the failure originated in the Central US regional data center of its Azure cloud service. The cause was a serious error in an update to "Falcon," the security software of its cybersecurity provider CrowdStrike: the faulty update was incompatible with the Windows operating system and triggered Windows' self-protection mechanism. Because CrowdStrike pushed the update simultaneously worldwide, computers around the globe crashed to a blue screen. This hit not only ordinary users but also brought down many cloud services running on Windows Server, further widening the impact.

The incident affected more than 20 countries, including the United States, the United Kingdom, Australia, Germany, and France. Microsoft estimated that about 8.5 million Windows devices worldwide were affected. Critical infrastructure and businesses in many countries suffered large-scale service interruptions. As the "Microsoft Blue Screen" incident spread, thousands of flights were canceled, some financial transactions were interrupted, medical services in many cities were delayed, and production lines at large multinationals such as Tesla were halted. Airports, railway stations, ports, and companies in media, telecommunications, banking, and other sectors in the Netherlands, Spain, Poland, Belgium, the Czech Republic, Japan, Singapore, Australia, New Zealand, Israel, South Africa, and other countries also reported "technical problems" that disrupted operations.

According to data from Qi'anxin, a Chinese cybersecurity company, CrowdStrike's main customers in China are branches and joint ventures of foreign companies in Beijing, Shanghai, Guangzhou, and Shenzhen; about 100 such companies have installed the software, on roughly 10,000 machines. The most visibly affected were Universal Studios Beijing and Shanghai Disneyland, where visitors were unable to check out, and Hong Kong Airport, where passengers could not use self-service boarding. Because Chinese government bodies, enterprises, and institutions have adopted independently controllable operating systems, major government agencies and state-owned enterprises were almost entirely unaffected. Overall, because a high proportion of Chinese computers run domestic security software and depend little on CrowdStrike, China was only mildly affected by the incident.

Microsoft's failure to manage CrowdStrike as a supplier was the main cause of the outage. The faulty CrowdStrike software runs at the kernel level of the operating system, which is more privileged than ordinary applications, making it a critical application. Such critical applications must be kept secure and compatible with the operating system at all times. Their updates generally require the operating system's highest level of authorization and must pass, in sequence, through internal testing, compatibility testing, security testing, performance testing, user acceptance testing, gray (staged) release, formal release, and ongoing monitoring and support. In this incident, the reporting and response mechanism between CrowdStrike and Microsoft clearly failed. Either CrowdStrike may have pushed the update without Microsoft's authorization for an operating-system-level update, or Microsoft may have taken no further action after CrowdStrike reported it.

Insufficient redundancy in some organizations made recovery difficult. In security risk reviews, organizations commonly back up the data stored on their computer systems, but most do not require backups of the operating system itself. In this incident, some organizations had no operating system backups, so after the failure they had to rebuild their business processes from scratch. CrowdStrike's remediation also made recovery harder: it required someone with administrator privileges to fix each computer manually, one after another, and organizations that rely on remote work found recovery even more laborious. As a result, many IT systems around the world were still down three days after the incident.

The ecosystem monopoly of technology giants raises security concerns. According to StatCounter, a well-known US web traffic analytics provider, Microsoft Windows held 72.72% of the desktop operating system market as of the end of December 2023. Meanwhile, 271 of the world's top 500 companies are CrowdStrike customers. To further consolidate their monopoly positions, technology giants use the supply chain to bind suppliers tightly to their own standards and ecosystems. Once users enter such an ecosystem, they grow accustomed to how it works and become highly dependent on it, while switching to another system entails high costs and complex technical obstacles. Over time this breeds complacency: key industries and infrastructure operators fail to draw up emergency backup plans, so when a security problem arises, it directly threatens the safe operation of the entire technology ecosystem.

(Author’s unit: China Unicom Research Institute)