
Monthly Archives: February 2013

SQL Cluster dependency oops

With a 2008R2 -> 2012 upgrade and a massive storage migration under my belt, I was feeling pretty good about my competency. All the data was moved off the old drives but they were still online. Getting these drives offline took a few steps.

I knew the SQL Server service was still dependent on these drives, and going under the cluster covers into disk management to turn them off would be a bad idea. I carefully removed them from each instance service in the cluster manager. This removes the dependencies and plops the cluster disks into unused storage. Then I went into cluster storage and deleted these old drives.

When drives are added, you have to add a dependency on them in the SQL Server service. Otherwise, when failing over, drives will get yanked out from underneath SQL before it's done gracefully shutting down. So in my infinite wisdom, I reworked all of the dependencies.

It went something like this:

1. bring the IP online
2. bring the name online
3. bring a mount point drive online
4. bring all the other drives online
5. bring SQL online
6. bring the agent and AS online

So two weeks go by and I have to do a failover because the old SAN had to be removed from MPIO, which unfortunately requires a restart. So at 5am I logged in to click a couple of buttons and hopefully go back to bed. No such luck. The failover looked graceful, taking all the services offline in the correct order and bringing everything back online, right up until it hit the SQL Server. It tried for about 1 second and failed.

If you haven’t noticed, cluster events are useless in scenarios like this. No detail; I assume they are mainly for reporting epic problems. I got into the event log and there was nothing except the useless “SQL Failed” message in the system log. I went right to the SQL logs but noticed the last time they were updated was when the agent went offline. No logs? That is bizarre.

I tried several other ways to start the SQL Server service. It wouldn’t start on ANY node, which didn’t make sense because the only change we made was to unmap the old SAN drives from a single node. After about an hour I decided to make my way into work and alert the authorities. I was assuming I would need some help from Microsoft at this point, but I don’t have direct access to open tickets.

Fortunately, I double-checked the logs. In SQL Server Configuration Manager I noticed there is a Dump location. I found some .xel files in that location and opened the newest one up in SSMS or the VS shell.
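If you'd rather query those files than click around, T-SQL can read .xel files directly with sys.fn_xe_file_target_read_file. A minimal sketch, assuming a made-up dump path (point it at whatever Dump location your Configuration Manager shows):

-- Read the extended event (.xel) files from the dump location; the path is a placeholder.
SELECT
    object_name,
    CAST(event_data AS XML) AS event_xml,
    file_name,
    file_offset
FROM sys.fn_xe_file_target_read_file(N'C:\SQLDumps\*.xel', NULL, NULL, NULL);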

[screenshot: virtual_server_name_error]

“Failed to check and change virtual server name. Please contact customer support.”

This may not sound too clear to you right off the bat, but it was clear to me. I remembered there being a dependency from the SQL Server service to the virtual name before. But this time I had attempted to chain these dependencies. I figured that if SQL was dependent on the drive, and the drive was dependent on the name, everything would come up in the right order. It appeared to, but apparently SQL really needs that direct dependency on the name. I added it and SQL came online shortly before 8am. Moral of the story: test any cluster dependency changes (duh).

 

Posted on February 28, 2013 in SQL Admin

 

Moving SQL Databases

This post is an accumulation of a few major and lots of minor things I forgot, or almost forgot, when moving databases. It’s easy, but since there is a risk of data loss it’s not something that should be taken lightly.

Let’s start with moving a database to fix file fragmentation. If you do this, you might be largely underwhelmed by the benefits. If a database goes through a series of growths and has competing files growing on the same drive, external fragmentation can occur. This is usually a mythical problem, used as subterfuge to mask the real code or table design problem. Nowadays, with SSDs, massive amounts of RAM, and SANs with huge caches and tiering mechanisms that move your data around (fragmenting it anyway), it’s just not the problem it used to be. But if it helps you sleep at night, by all means back up and restore your database to a drive with some contiguous free disk space. There is quite a bit of risk, for potentially zero benefit, in moving just to fix external file fragmentation. This problem will probably be better fixed the next time you really have to move the database.

Moving databases for a storage upgrade is the main reason for this post. I recently finished moving a couple of fat clustered instances, in my case from an oldish SAN to a shiny new SAN. There are SAN tools that can copy LUNs online, and SQL would have no idea the process took place. If you decide to dump this task on your SAN admin, I would take a backup and shut the instance down for safety reasons; these tools can also copy the data faster when the server is offline. If you do decide to try an online move, make sure to run a CHECKDB after it’s done, especially if your SAN admin says any of the words blip, hiccup, glitch, or my personal favorite, flap.
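If you do let the SAN tools copy the LUNs online, a full consistency check afterwards is cheap insurance. A minimal sketch, with a placeholder database name:

-- Full consistency check after the storage copy; the database name is a placeholder.
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;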

We’ll assume the SAN admin said one of those words and you’ve decided to take matters into your own hands. For a storage upgrade you will want a good before/after baseline, so look at things like file stalls and disk latency before touching anything (a quick query for that is sketched below). Then test out a few methods of moving the data to get a feel for how long the process is going to take. Once that is done, check with the users to see what kind of outage they are willing to take.
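For the file stall and latency baseline, something along these lines works; it reads the cumulative per-file I/O stats, so capture it before and after the move and compare. This is a sketch, not a full monitoring script:

-- Per-file I/O stalls and average latency (cumulative since instance startup).
SELECT
    DB_NAME(vfs.database_id)                              AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    vfs.num_of_writes,
    vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)   AS avg_read_latency_ms,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)  AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id     = vfs.file_id
ORDER BY avg_read_latency_ms DESC;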

You might get something like this:

dba: How long can this thing be offline?
users: no.

In that case you’ve got one remaining option: set up database mirroring on a new instance and fail over to the new drives. In the event that you get something like this:

dba: How long can this thing be offline?
users: WE LOVE YOU, DO WHATEVER YOU NEED 🙂 XOXOX

Hah, well then we’ve got a few options. If you have mostly empty data files that are only 20% used, or giant empty log files, consider the backup and restore method. Backup file sizes resemble the used space in a database, and even less than that if you apply compression. If you choose backup and restore, consider switching the database to read-only before you start the backup. This will kick users out, but they will be able to get back in while the backup is happening and while they are waiting for you to do the restore. The database will restore in read-only mode, so don’t forget to flip it back to read-write when ready. You could also go with single_user if you really didn’t want anyone else touching the database.
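A rough sketch of that flow; the database name, logical file names, and paths are all placeholders, not anything from my environment:

-- Flip to read-only so nothing changes underneath the backup (kicks active users out once).
ALTER DATABASE YourDatabase SET READ_ONLY WITH ROLLBACK IMMEDIATE;

-- Compressed backup; size tracks used space, not file size.
BACKUP DATABASE YourDatabase
    TO DISK = N'X:\Backups\YourDatabase_move.bak'
    WITH COMPRESSION, INIT, STATS = 10;

-- Restore onto the new drives, moving each file. If restoring over the same database
-- on the same instance, drop any remaining connections first.
RESTORE DATABASE YourDatabase
    FROM DISK = N'X:\Backups\YourDatabase_move.bak'
    WITH MOVE N'YourDatabase_Data' TO N'N:\Data\YourDatabase.mdf',
         MOVE N'YourDatabase_Log'  TO N'O:\Logs\YourDatabase_log.ldf',
         REPLACE, STATS = 10;

-- The restored database comes back read-only; flip it back when you're ready.
ALTER DATABASE YourDatabase SET READ_WRITE;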

Another method that requires an outage is the offline/online method. Basically you submit an ALTER DATABASE ... MODIFY FILE command, which tells SQL to look in the new location the next time the database starts up. To do this you have to take the database offline and, while it is offline, move the files to the new location. This can be scripted, but not in T-SQL alone like a backup and restore, unless you have xp_cmdshell enabled (bad).
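The offline/online sequence looks roughly like this; logical file names and paths are placeholders, and the actual file copy happens outside of T-SQL (robocopy, Explorer, or xp_cmdshell if you must):

-- Point SQL at the new locations; takes effect the next time the database comes online.
ALTER DATABASE YourDatabase MODIFY FILE (NAME = YourDatabase_Data, FILENAME = N'N:\Data\YourDatabase.mdf');
ALTER DATABASE YourDatabase MODIFY FILE (NAME = YourDatabase_Log,  FILENAME = N'O:\Logs\YourDatabase_log.ldf');

-- Kick everyone out and take it offline.
ALTER DATABASE YourDatabase SET OFFLINE WITH ROLLBACK IMMEDIATE;

-- Copy or move the .mdf and .ldf files to the new paths here.

-- Bring it back up from the new location.
ALTER DATABASE YourDatabase SET ONLINE;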

Similar to offline/online is detach/attach. One benefit of detach/attach is that you can rebuild a log file that has not been treated with TLC; look into the attach syntax with a log rebuild (FOR ATTACH_REBUILD_LOG). This is relatively risky, so I would recommend taking a backup first.
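A detach/attach sketch with placeholder names and paths, including the log rebuild variant mentioned above (commented out because it should be a last resort):

-- Detach; this fails if anyone is connected, so drop connections first.
EXEC sp_detach_db @dbname = N'YourDatabase';

-- Move the files, then attach from the new location.
CREATE DATABASE YourDatabase
    ON (FILENAME = N'N:\Data\YourDatabase.mdf'),
       (FILENAME = N'O:\Logs\YourDatabase_log.ldf')
    FOR ATTACH;

-- If the log file is missing or damaged, this variant builds a new one.
-- Only reasonable if the database was shut down cleanly, and take a backup first.
-- CREATE DATABASE YourDatabase
--     ON (FILENAME = N'N:\Data\YourDatabase.mdf')
--     FOR ATTACH_REBUILD_LOG;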

To pick the fastest of these methods, we have to compare the time it takes to run a backup AND a restore vs. the time it takes to copy the potentially mostly empty shells of .mdf and .ldf files. In most cases I’ve come across, a compressed backup and restore is the best option. Things you can forget are:

1. a file
2. lots of files
3. a database
4. lots of databases
5. taking a full backup after it’s done so the log and differential backups will work

There are not a lot of complexities when moving a database for storage reasons. There are, however, a lot more things to forget when moving instances.

SQL Server databases are relatively portable. If you have avoided jumping into the use of every bell and whistle that SQL has to offer there is not a lot to miss. The same methods to move data to new storage can be used to move data to new servers or instances.

Things that may have to come with the database
– agent jobs
– linked servers
– server logins (SQL logins can be exported and re-mapped; see the sketch after these lists)
– backups (in case you have a retention policy)
– Services outside of the database engine like AS, IS, RS
– credentials
– maybe other databases that work together
– full text catalogs (or just rebuild them)
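On the login front, database users restored to a new server often end up orphaned from the SQL logins there. A sketch for finding and re-mapping them; the user and login names are placeholders:

-- Find database users whose SID no longer matches any server login (orphaned users).
SELECT dp.name AS user_name, dp.sid
FROM sys.database_principals AS dp
LEFT JOIN sys.server_principals AS sp ON sp.sid = dp.sid
WHERE dp.type = 'S'            -- SQL-authenticated users
  AND dp.principal_id > 4      -- skip dbo, guest, sys, INFORMATION_SCHEMA
  AND sp.sid IS NULL;

-- Re-map an orphaned user to the login of the same name on the new server.
ALTER USER [app_user] WITH LOGIN = [app_user];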

Other things to consider
– new service account permissions
– security certificates for things like TDE
– replication
– any application or developer that might have a connection string to this server (DNS pointers can help)

And that’s a wrap. Like I mentioned before, backup -> restore worked best in my situation, mostly because I had a sizable maintenance window so I could script it all out.

 

Posted on February 16, 2013 in SQL Admin