20-07-2024, 11:18 AM
(20-07-2024, 11:01 AM)ppppenguin Wrote: Is there anything special about Windows such that it lacks the ability to recover with some grace from what was a very deep level foul-up? Are Unix-like kernels (in other words Linux, Mac, Android) more robust? Presumably if you prod even the most robust OS in the wrong place it can become FUBAR.
The recovery is actually quite simple - mostly it's just boot into safe mode + networking (though the "networking" bit isn't always necessary), then delete the stuffed file, then reboot.
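For anyone scripting the "delete the stuffed file" step once a box is up in safe mode, here is a minimal sketch. The default pattern `C-00000291*.sys` is what the published workaround named, as far as I recall, so treat it as an assumption and check the vendor guidance. The function name `remove_bad_channel_files` and the `dry_run` flag are my own inventions for illustration, and the driver directory is passed in rather than hard-coded.

```python
from pathlib import Path


def remove_bad_channel_files(driver_dir, pattern="C-00000291*.sys", dry_run=True):
    """List (dry_run=True) or delete (dry_run=False) files matching pattern.

    driver_dir and pattern are assumptions supplied by the caller; nothing
    here checks that they point at the real sensor directory.
    Returns the sorted names of the matching files.
    """
    matched = []
    for path in sorted(Path(driver_dir).glob(pattern)):
        if not dry_run:
            path.unlink()  # actually remove the file
        matched.append(path.name)
    return matched
```

Run it with `dry_run=True` first to see what would go, then again with `dry_run=False`, then reboot normally.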
The problem is that this generally requires someone physically at the machine, and there are many thousands affected. Serious servers have a remote console - effectively a management console that is embedded in the server hardware and is quite separate from the server s/w. You can use this remote virtual console to manage the hosts just as if you were physically there with a serial console cable and a laptop. HP calls this capability "iLO" (Integrated Lights-Out), Dell calls it "iDRAC" (Integrated Dell Remote Access Controller), IBM calls it "HMC" (Hardware Management Console) etc. It's generally a cost option and not cheap, as you have to buy the capability for the hosts plus have a separate "management" network infrastructure to access them, so there are costs in hardware, licensing and system management overhead, though these costs are mitigated by being able to deal with a large percentage of issues without having to physically go to the host. My teams used to manage 100s of servers this way, many in secure datacentres on the other side of the planet. Without this capability, if we had an issue with a machine in NY, someone would have had to get on a 'plane, get a taxi to Secaucus, get access to the secure underground datacentre there that we used and plug their laptop in. Alternatively you can use "hot hands", which is a local technician in the datacentre who will do what you tell them, for a HUGE cost!
There is no single defined Unix kernel - Unix-like systems use several different kernels, including micro-kernels, which are generally safer than monolithic ones as the drivers etc. are kept apart in their own processes, so one crashing doesn't necessarily impact the system as a whole. Unfortunately micro-kernel designs are generally significantly slower than monolithic ones due to the extra overhead of more complex inter-process signalling and context switching, hence are rarely used in "heavy lifting" systems.
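The isolation idea can be shown with ordinary user-space processes, though this is only an analogy and not how any real micro-kernel is built. In the sketch below a pretend "driver" runs in its own Python process and a non-zero exit code stands in for a crash; the supervising process simply sees the failure and carries on. `run_isolated` is a hypothetical helper name.

```python
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a separate process and return its exit code.

    The child has its own address space, so whatever goes wrong in it
    cannot take the calling process down with it.
    """
    result = subprocess.run([sys.executable, "-c", snippet])
    return result.returncode


def supervise(drivers):
    """Start each pretend driver in isolation; report which ones failed."""
    failed = []
    for name, snippet in drivers.items():
        if run_isolated(snippet) != 0:
            failed.append(name)  # note the failure and keep going
    return failed
```

Calling `supervise({"good": "pass", "bad": "raise SystemExit(3)"})` should return `["bad"]` with the supervisor still running. The price is the one mentioned above: every interaction with the "driver" is a process boundary crossing rather than a plain function call.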
The issue here was that in order to catch boot-time hackery, the CrowdStrike Falcon system loads very early in the boot cycle, before any networking etc., and then crashes the system before it can automatically download the fixed patch, i.e. it's in a death loop that has to be broken manually by physical presence at the host or, if available, remotely by a virtual console.
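The death loop is easier to see as a toy model. This is purely illustrative: the stage names, their order and the assumption that safe mode skips both the early third-party driver and its auto-update are mine, not a description of the real Windows boot sequence.

```python
# Simplified, assumed boot order: the sensor driver loads before networking.
BOOT_ORDER = ["kernel", "early_drivers", "networking", "auto_update"]

# Assumption: safe mode loads neither the third-party driver nor its updater.
SAFE_MODE_SKIPS = {"early_drivers", "auto_update"}


def boot(bad_file_present, safe_mode=False):
    """Return the stages reached, ending in "CRASH" if the boot fails."""
    reached = []
    for stage in BOOT_ORDER:
        if safe_mode and stage in SAFE_MODE_SKIPS:
            continue
        if stage == "early_drivers" and bad_file_present:
            reached.append("CRASH")  # dies before networking ever starts
            return reached
        reached.append(stage)
    return reached
```

With the bad file in place, `boot(True)` gives `["kernel", "CRASH"]` on every attempt, so `auto_update` is never reached and the fix can never arrive by itself. `boot(True, safe_mode=True)` gives `["kernel", "networking"]`, which is the window in which someone has to delete the file by hand, after which `boot(False)` runs all four stages.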
sıʌǝɹq ɐʇıʌ `ɐƃuol sɹɐ
ʞɔıu