Normally I’d rather dig into a problem and figure out what is going on than just reboot… but since June is right around the corner and my last post was in February, I’m in need of a quick writing fix.
Now that I have rebooted, I don’t have a lot of evidence to troubleshoot why there were issues. So, there’s only one way to go: forward!
Today’s topic is patching software from the early 90s.
On March 14th, 2017, Microsoft released this critical security bulletin: https://technet.microsoft.com/en-us/library/security/ms17-010.aspx
The WannaCry ransomware uses this vulnerability to take over machines and spread itself. SMB1 was spawned from an effort to turn the DOS local file system into a network file system. Microsoft’s implementation piled on a bunch of features and became cool right around the time these pants hit center stage.
SMB not a secure server service? Where have I heard this before? MS08-067 was an epic vulnerability that was exploited for years after the patch was released. I remembered that bulletin number off the top of my head, and I’m sure MS17-010 will get baked in there too. After all, MS17-010 is in Metasploit now.
So this many years later, why is this an issue? Mostly backwards compatibility: SMB2 and SMB3 are not vulnerable, but the server service happily falls back to SMB1 if you let it. We have these absolutely infuriating printers that only support SMB1, so we can’t disable SMB1 on the clients. I also blame cloud and devops: new server admins are exposing “things” to the internet for no reason other than convenience and ignorance. While I’m pointing fingers, let’s roast Microsoft for creating massive downloads and overly complicated, slow patching systems. Not only are they slow, they are also fragile. Users are not innocent either; they must click all the attachments in their email, spawning malware in privileged network locations. Also, users don’t want to agree to lengthy monthly maintenance windows.
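On servers where nothing legacy actually needs SMB1, you can shut the fallback off outright rather than trusting clients to negotiate up. A minimal sketch for Windows 8 / Server 2012 and later (run elevated, and confirm nothing is still speaking SMB1 first):

```powershell
# Check whether the server service currently has SMB1 enabled
Get-SmbServerConfiguration | Select-Object EnableSMB1Protocol

# Disable SMB1 on the server side; SMB2/3 clients are unaffected
Set-SmbServerConfiguration -EnableSMB1Protocol $false -Force
```

On older systems the equivalent is setting the `SMB1` value under `HKLM:\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters` to 0 and rebooting.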
Now that we know it is everyone else’s fault, what can I do better? In my position, persuasion is a powerful tool. Automatic maintenance and regular automatic testing are the only ways to scale: more cattle for servers, fewer pets that require special needs. Persuading people that 24/7 uptime is not necessary and that swifter automatic patching is worth it will take some work. Convincing application developers to handle boot order, rather than relying on servers booting up in a specific sequence, would make automation easier and fix a bunch of other potential issues in the process. So would creating simpler architectures instead of unnecessarily complex microservices, and controlling server sprawl with proper documentation and life-cycle management.
So how did I handle this specific critical vulnerability? It was a team effort for sure. The work really starts as the servers are being built: requestors either accept a default Windows patching window or ask for a custom one. There were automated tests in place that allowed us to proceed with confidence. We used our monitoring systems to identify low OS drive space, and I cross-checked against the vCenter database before the package was pushed out. There was a validation script on GitHub that I used to monitor progress: https://github.com/kieranwalsh/PowerShell/blob/master/Get-WannaCryPatchState/Get-WannaCryPatchState.ps1 Finally, we went through manually and repaired the unpatched servers or migrated their workloads to newly built servers.
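The linked script is far more thorough, but the core idea is just comparing installed hotfixes against the KB numbers from the bulletin. A rough sketch of that loop, assuming a hypothetical `servers.txt` of hostnames — and note the KB list below is illustrative, not exhaustive, since the relevant KB varies by OS version and servicing branch (the bulletin has the full mapping):

```powershell
# Illustrative subset of KBs that deliver the MS17-010 fix on various OS versions
$ms17010Kbs = 'KB4012212', 'KB4012213', 'KB4012215', 'KB4012216', 'KB4012598'

foreach ($server in Get-Content .\servers.txt) {
    try {
        # Get-HotFix queries installed updates over WMI
        $installed = Get-HotFix -ComputerName $server -ErrorAction Stop |
            Where-Object { $ms17010Kbs -contains $_.HotFixID }
        if ($installed) {
            Write-Output "$server : patched ($($installed.HotFixID -join ', '))"
        } else {
            Write-Output "$server : NOT patched"
        }
    } catch {
        # Unreachable hosts need manual follow-up, not silence
        Write-Output "$server : unreachable ($($_.Exception.Message))"
    }
}
```

Anything reported unpatched or unreachable went on the list for manual repair or rebuild.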
Overall, patchapalooza 2k17 went pretty well. Sometimes I think vulnerability databases have too many false alarms, and it’s hard to pick out what is a really serious issue. Forcing out that many reboots carries significant risk to systems, especially if you don’t have the people to handle the increased calls.