Documentation is a bit of a chore. But like the dishes, things feel a bit better after it's done. Also like the dishes, the job is never really done.
My team is asked quite frequently to document all kinds of procedures. As a server admin, I find there are a few key questions the documentation needs to answer.
Maintenance Window – When is it acceptable for the server to be down?
Shutdown/Startup procedure – If startup isn't automatic, what steps need to be taken to bring the application online? How do we test that the application is working?
Dependencies (part of startup) – If this server is down, which applications are affected? Also, what else needs to be operational before this server can work?
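The dependency question cuts both ways: it determines the startup order, and it tells you what breaks when something goes down. A minimal sketch, assuming the dependencies have been written down as a simple "X needs Y" map, can answer both questions with Python's standard `graphlib` module (Python 3.9+). The server names below are hypothetical placeholders.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists what must already be
# running before that component can start. Names are illustrative only.
DEPENDENCIES = {
    "web01": {"app01"},
    "app01": {"sql01", "fileshare01"},
    "sql01": {"dc01"},
    "fileshare01": {"dc01"},
    "dc01": set(),
}


def startup_order(deps):
    """Return a start order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


def affected_by(server, deps):
    """Return every component that directly or indirectly depends on `server`."""
    affected = set()
    frontier = [server]
    while frontier:
        current = frontier.pop()
        for name, needs in deps.items():
            if current in needs and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print("Startup order:", startup_order(DEPENDENCIES))
    print("Affected if sql01 is down:", sorted(affected_by("sql01", DEPENDENCIES)))
```

Reversing the startup order gives a reasonable shutdown order, and `TopologicalSorter` raises an error if the documented dependencies contain a cycle, which is itself worth catching in review.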
Instead of supporting a specific server, we work on teams and view support from an application perspective. This is good, but it adds the need to document which servers belong to which applications. Alerts, such as those from SCOM, don't usually include the application name unless you have a good server naming convention.
There is usually a main or go-to lead for an application, and the rest of the team is backup. Digging deeper into the key documentation, we should find more information specific to the application.
Credentials – Admin and service accounts should be documented. There are plenty of options for encryption software or password wallets. Include links to the software's administration pages or programs. Also document which authentication method general users use.
How to contact support – phone number, website, credentials etc.
Restore steps – where the backups are, how to restore onsite, how to restore offsite, and how to rebuild from scratch.
Visuals – Diagrams that show which servers users connect to and every piece in the dependency chain that could break.
How to contact users – consider setting up a mailing list so the contact list can be kept up to date automatically.
Server general info – SN#, OS version, app versions, drive space, normal CPU/mem/disk/network usage. You should be able to automate most of this.
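As a starting point for that automation, here is a minimal sketch that collects a few of these facts using only the Python standard library and prints them as JSON for pasting into the documentation. It is an assumption-laden illustration, not a complete inventory tool: the standard library has no portable way to read the serial number or memory, CPU, and network utilization, so those would need vendor tooling, WMI, or a third-party package such as psutil.

```python
import json
import os
import platform
import shutil
import socket


def collect_server_info(path=None):
    """Gather basic, automatable server facts using only the standard library."""
    # Default to the root of the current drive/filesystem ("/" or "C:\").
    path = path or os.path.abspath(os.sep)
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "cpu_count": os.cpu_count(),  # may be None if undeterminable
        "disk_path": path,
        "disk_total_gib": round(usage.total / gib, 1),
        "disk_free_gib": round(usage.free / gib, 1),
    }


if __name__ == "__main__":
    print(json.dumps(collect_server_info(), indent=2))
```

Running it on a schedule and diffing the output against the last run is one way to spot drift, such as an OS update or a filling drive, without anyone retyping the numbers.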
There are probably some things I missed and some things that may not be needed. Now that we've got all the data, we need to work on normalizing it. By normalizing documentation I mean two things: #1, removing redundant documentation, and #2, creating a template so all documentation is consistent.
#1 is like database normalization. Create your metaphorical lookup tables so server admins don't have to document anything twice. Don't create a giant spreadsheet, because then you will be repeating yourself over and over again. Also, try to pick a medium that makes it easy to combine things that are already documented. This helps remove redundant and hidden documentation. Spreadsheets are also a poor choice because they don't handle screenshots and other visuals well. Choose something like SharePoint or OneNote. Also, make sure the medium you pick is available offsite. Automating data into and out of this documentation may also be useful.
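To make the lookup-table metaphor concrete, here is a small sketch using Python's built-in `sqlite3` module. It is only an illustration of the normalization idea, not a suggestion to replace SharePoint or OneNote: each application and each server is described once, and a separate mapping table records which servers serve which applications. All names and values are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE application (
    app_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    lead_admin TEXT NOT NULL
);
CREATE TABLE server (
    server_id          INTEGER PRIMARY KEY,
    hostname           TEXT NOT NULL UNIQUE,
    maintenance_window TEXT NOT NULL
);
-- One row per server/application pair, so neither side is repeated.
CREATE TABLE server_application (
    server_id INTEGER NOT NULL REFERENCES server(server_id),
    app_id    INTEGER NOT NULL REFERENCES application(app_id),
    PRIMARY KEY (server_id, app_id)
);
"""


def build_inventory():
    """Create an in-memory inventory with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO application (name, lead_admin) VALUES (?, ?)",
        [("Payroll", "alice"), ("Intranet", "bob")],
    )
    conn.executemany(
        "INSERT INTO server (hostname, maintenance_window) VALUES (?, ?)",
        [("sql01", "Sunday or holidays"),
         ("web01", "non-business hours"),
         ("web02", "non-business hours")],
    )
    # Link servers to applications by name; each link is stored exactly once.
    conn.executemany(
        """INSERT INTO server_application (server_id, app_id)
           SELECT s.server_id, a.app_id FROM server s, application a
           WHERE s.hostname = ? AND a.name = ?""",
        [("sql01", "Payroll"), ("sql01", "Intranet"),
         ("web01", "Intranet"), ("web02", "Payroll")],
    )
    return conn


def apps_for_server(conn, hostname):
    """Answer 'an alert fired on this server: which apps and leads are involved?'"""
    rows = conn.execute(
        """SELECT a.name, a.lead_admin FROM application a
           JOIN server_application sa ON sa.app_id = a.app_id
           JOIN server s ON s.server_id = sa.server_id
           WHERE s.hostname = ? ORDER BY a.name""",
        (hostname,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(apps_for_server(build_inventory(), "sql01"))
```

Because the application lead lives in one row, changing who owns Payroll is a single edit rather than a search through every server's page, which is exactly the repetition a giant spreadsheet invites.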
#2 Create a template and cleanse the already existing data. Take, for instance, the maintenance window. We need a general format; my idea is to pick 5 to 6 different maintenance windows and try to fit all of the servers into them. Try to answer the question, "What is the best time to perform maintenance?" instead of the question, "When is it possible to perform maintenance?" Honestly, for most applications you can perform maintenance anytime if you just clear it with the users first. Examples of maintenance windows would be "non-business hours", "Sunday or holidays", "2-4am", "business hours", "during mainframe maintenance once a month" and "other". Try to avoid "other" so servers conform to the template, but also don't be naive. There are always valid outliers to any rule.
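One way to enforce a template like this is to make the allowed windows a fixed list and to map existing free-text entries onto it during cleanup. The sketch below is a hypothetical illustration: the enum values are the example windows listed above, and the rule that "other" must carry an explanatory note is an assumption about how you might keep outliers honest.

```python
from dataclasses import dataclass
from enum import Enum


class MaintenanceWindow(Enum):
    """The example windows from the template; extend to taste."""
    NON_BUSINESS_HOURS = "non-business hours"
    SUNDAY_OR_HOLIDAYS = "Sunday or holidays"
    EARLY_MORNING = "2-4am"
    BUSINESS_HOURS = "business hours"
    MAINFRAME_MAINTENANCE = "during mainframe maintenance once a month"
    OTHER = "other"


@dataclass
class ServerMaintenance:
    hostname: str
    window: MaintenanceWindow
    notes: str = ""

    def __post_init__(self):
        # "Other" is allowed for genuine outliers, but it must be explained.
        if self.window is MaintenanceWindow.OTHER and not self.notes.strip():
            raise ValueError(f"{self.hostname}: 'other' window requires notes")


def parse_window(text):
    """Map a free-text entry from existing documentation onto the template."""
    normalized = text.strip().lower()
    for window in MaintenanceWindow:
        if window.value.lower() == normalized:
            return window
    # Anything that doesn't match exactly is flagged for manual review.
    return MaintenanceWindow.OTHER


if __name__ == "__main__":
    for raw in ["Non-Business Hours", "2-4am", "whenever Bob says"]:
        print(f"{raw!r} -> {parse_window(raw).value}")
```

Counting how many servers land in OTHER after a cleanup pass gives a rough measure of how well the template fits before you roll it out.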
As with any large project, focus your energy on the important parts: the parts that are severely lacking. First pick a medium, and then practice your template on a couple of applications so you can fine-tune it before you tackle the whole list of servers.