Having the RPO/RTO discussion

13 Oct

I’ve only done this a couple of times and I want to get better at it. I have an opportunity to have this discussion in the near future, and this is my plan of attack. The first misstep is to become combative and defensive, so the use of “attack” is sarcasm 🙂

Here is a typical back-and-forth that can loop and eat up a lot of time.

Users Q: How much is this going to cost?
IT Q: What is your RPO/RTO?
Users Q: What is RPO/RTO?
IT A: RPO/RTO is how much data you can lose and how fast you want to be back up.
Users Q: How much is this going to cost?
IT Q: What is your RPO/RTO?
Users Q: What is RPO/RTO?

It’s great to have a snowballing email chain and watch everyone chime in on the discussion. This kind of thing happens, and here is how I plan to avoid it in the future.

To exit this loop, you should provide a menu of basic options, something like good/better/best. However, that doesn’t seem to work out too well because it’s difficult to put a price tag on a shared environment. If you tell them how expensive your storage is, they may send you a 1TB drive they picked up at Best Buy.

Newer high-end SANs have the ability to “tier” storage so the important applications have the right of way. I don’t condone this as a good practice because I have not seen it work… yet. What is happening is that more code is being injected into the SAN to overcompensate for architectural problems. More code means more code updates, and those can cause major problems you could have avoided in the first place.

Back to the discussion. The meeting starts and everything is fine; people are happy to chat, but eventually the work has to start.

IT Q: How much data loss is acceptable?
Users A: None.

So the frustrations start. A flurry of emotions is sparked when this response is given, because we know all the time, effort and money that will get tossed at this for potentially no valid reason.

It’s easy to avoid this discussion and just do nightly backups. If you do nightly backups with the simple recovery model in SQL, you can in principle guarantee a 24HR RPO. But backups mostly happen at night, and when one fails, 99.99% of the time IT isn’t going to log in and run a backup right then. They will take care of it in the morning… or next week. So as long as the failure also stops users from entering data into the system, you can have a 24HR RPO. If users can keep entering data, we’re looking at more like a 36HR RPO.
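To make that 24HR versus 36HR arithmetic concrete, here is a minimal sketch in Python. The function name `worst_case_rpo` and the 12 hour rerun delay are my own assumptions, picked only so the numbers line up with the figures above; your real delay is however long it takes someone to notice and rerun the job.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval, rerun_delay=timedelta(0), users_can_write=False):
    """Rough worst-case data loss window after a missed backup.

    backup_interval: time between scheduled backups (24h for nightly).
    rerun_delay:     assumed time until someone reruns the failed backup.
    users_can_write: True if users can still enter data after the failure.
    """
    if users_can_write:
        # New data keeps piling up until the backup is finally rerun.
        return backup_interval + rerun_delay
    # No new data is entered, so exposure stays at one backup interval.
    return backup_interval


nightly = timedelta(hours=24)
next_morning = timedelta(hours=12)  # assumption: fixed the next morning

print(worst_case_rpo(nightly))
print(worst_case_rpo(nightly, next_morning, users_can_write=True))
```

If “next morning” turns into “next week”, plug in `timedelta(days=7)` and see how far the real number drifts from the 24HR everyone assumed.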

What I suggest is not leading with the RPO/RTO definitions. This conversation should start with a little about the application.

1. How many users
2. How many concurrent users
3. What kind of data
4. How is the data accessed
5. What kind of data is outside of the database (for app-consistent restores)
6. What are the maintenance windows
7. What will the growth rate be for data & users
8. What is the rollout schedule (can we start small and grow?)

Getting to know the application shows that we, IT, care about the application. This is a good thing; we want to care and really do. With this information we can start to compare and contrast it with some more important applications and some less important ones. Maybe at this point it is safe to let the users know what the other applications’ restore objectives are. The users will recognize where their application sits in the food chain and will probably be totally fine with it. They don’t use just one app all day long, so they will be happy to know you have taken the time to properly organize your priorities.

The end game for an internal RPO/RTO discussion should be communication. Communicate the challenges IT faces, in writing. It doesn’t have to be some kind of contract. Just like the deadlines on any other project that ran a year past due, restore objectives are really just goals. You cannot accurately predict future disasters. You should, however, try to predict the future, because you may be a lot closer than you think.


Posted by on October 13, 2011 in Network Admin, SQL Admin

