
Category Archives: Storage

MTC Chicago Visit

Microsoft has datacenters that are open to customers for training and consulting of all kinds. I’ve been to the newer one in Detroit and to the MTC Chicago twice now, with the most recent visit being this week. It was a good visit, sponsored and encouraged by our storage vendor.

The key topic was ways to manage copies of data. We have some specific needs surrounding reporting, agile development, and quality assurance that require several copies of large databases. Thin provisioning, compression, and deduplication help control costs, but several arrays do best when the copy is a snapshot done at the block level. This reduces the cost of creating that copy and keeping it around. These users, however, generally want to create the copy at the database level. That has a few complexities, even in a fairly standard virtual environment.

Some of these copies require the source system to be quiesced, so the snapshot technology needs to integrate with the database engine to pause writes while the snapshot completes. SQL handles crash consistency very well, but if some copies require quiescing, I may as well set up a system that does that for all of them. The virtual layer adds abstraction between what the guest operating system sees and what the storage system presents, and this also needs to be coordinated. All of these separate scripts that use different APIs need to be scheduled, and some need to be callable from another application or run on demand.
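To make that concrete, here is a rough orchestration sketch. The freeze_io(), create_snapshot(), present_snapshot(), and rescan_storage() calls are made-up placeholders for whatever your database (VDI/VSS), array, and hypervisor APIs actually expose; this is not a real vendor SDK.

```python
# Hypothetical orchestration sketch -- the objects passed in stand for the
# database, array, and hypervisor integrations; none of these calls are real
# library functions.

import logging

log = logging.getLogger("copy-refresh")

def refresh_copy(source_db, target_host, storage, hypervisor):
    """Quiesce the source, snapshot at the block level, present to the target."""
    snapshot_token = None
    try:
        source_db.freeze_io()                         # pause writes (e.g. via VSS)
        snapshot_token = storage.create_snapshot(source_db.volumes)
    finally:
        source_db.thaw_io()                           # never leave writes frozen
    # Coordinate the virtualization layer so the guest sees the new volume.
    storage.present_snapshot(snapshot_token, target_host)
    hypervisor.rescan_storage(target_host)
    log.info("copy of %s presented to %s", source_db.name, target_host)
```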

This problem has been mostly solved by some of the automation I have done surrounding backups and restores. We have a working solution, but it takes technical time and more storage than it needs. Whenever you have a solution that is good enough, it prevents you from creating a great solution. It is a good solution only up to a point, until the number of copies and the time it takes to make each copy become prohibitive.

It was a great trip to see another datacenter in action that has the up-and-coming technologies. It gave me motivation to work on the great solution instead of settling for one that barely works.

 

Posted on July 29, 2016 in SQL Admin, Storage

 

EMC World 2016 Recap


This was my second time at EMC World and I enjoyed 2016 just as much as 2014. I ended up signing up for a test and am happy to report that I passed! Most big conferences like this offer a free or reduced-price attempt at one of their exams, and I chose the XtremIO Specialist for Storage Administrators. I prefer taking the exam first thing Monday. Sure, there is a chance I could learn something during the week that might be on the exam, but I think it is more valuable to be well rested and not have my mind cluttered with all the new knowledge. Once that was done I had time to get into the keynote.

Opening Keynote

Seeing Joe Tucci on stage for possibly the last time was a bit like seeing John Chambers at Cisco Live the previous year. Although the circumstances were different, both crowds seemed to respond the same way to seeing their respective leaders pass the torch. Michael Dell took the stage and had a few interesting things to say:

-Dell Technologies will be the new company name
-Dell EMC will be the enterprise name
-EMC is the best incubator of new technology
-Dell has the best global supply chain
-Both companies combine for 21 products in the Gartner Magic Quadrant

There were also some product announcements at EMC World. Unity is a mid-tier array with an all-flash version starting at $20K. DSSD D5 had no pricing mentioned, because if you have to ask, it is too expensive. This product addresses some of the IO stack issues and works with new “block drivers” and “Direct Memory APIs” to reduce latency [1]. If 10 million IOPS isn’t enough, clustering ability is coming soon. ScaleIO, Virtustream Storage Cloud, enterprise copy data management (eCDM) and the Virtual Edition of Data Domain were also announced.

RecoverPoint

When setting up my schedule I made sure to get all the interesting-looking RecoverPoint sessions booked. Gen6 hardware is out, so it is a product that has been around for a while… or has it? EMC didn’t make it easy for us when choosing a product name for RecoverPoint for VM (RPVM). RPA, or RecoverPoint Appliance, is separate from RPVM. RPVM uses an IO splitter within ESXi in order to provide a potential replacement for VMware’s Site Recovery Manager. I took the hands-on lab for RPVM and found it to be pretty complex. It is nice to be able to pick and choose which VMs I protect, but sometimes I want to choose larger groups to reduce the maintenance. Maybe this is possible, but it wasn’t very clear to me. My suspicion is that array-based replication will still be more efficient than host-based replication options such as RPVM or vSphere Replication.

RPA has a very interesting development along the lines of DR. Since hearing about XtremIO, I have questioned how the writes would be able to replicate fast enough to achieve a decent RPO. RPA can now utilize XtremIO snapshots in a continuous manner, diff them, and send only the unique blocks over the WAN. That makes things very efficient compared to other methods. Also, the target array will have the volumes, which we can make accessible for testing using XtremIO virtual copies (more snapshots).
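The idea of shipping only the snapshot-to-snapshot delta is easy to picture in a few lines. This is purely a conceptual sketch, not the RecoverPoint or XtremIO API; the hashes stand in for the array's block metadata.

```python
# Conceptual sketch: diff two point-in-time snapshots and ship only the
# blocks that changed between them.

def changed_blocks(prev_snapshot: dict, curr_snapshot: dict) -> dict:
    """Each snapshot maps block_number -> content hash; return changed blocks."""
    return {
        blk: h
        for blk, h in curr_snapshot.items()
        if prev_snapshot.get(blk) != h
    }

# Example: 8 blocks, two of which changed between snapshots.
prev = {i: f"hash{i}" for i in range(8)}
curr = {**prev, 2: "hash2b", 5: "hash5b"}

delta = changed_blocks(prev, curr)
print(f"Replicating {len(delta)} of {len(curr)} blocks over the WAN: {sorted(delta)}")
```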

DataDomain, DDBoost and ProtectPoint

DataDomain’s virtual appliance announcement was interesting, but I’m not sure I have a specific use case yet. Mainly the need to back up a branch office might come into play, but I would want a separate array to host that vmdk. ProtectPoint has volume-level recovery features and SQL Server integration now. I can choose to back up a database whose log and data files are on the same volume and then use the SSMS plugin to do a volume-level restore. This grabs the bits from DataDomain and overlays them onto the XtremIO using your storage network. I’m not sure how efficient this restore is since I just did the hands-on lab, but it is very appealing for our very large databases that tend to choke the IP stack when backing up.

DDBoost v3 is coming out in June. This release includes things like copy-only backups, restore with verify only, AAG support, restore with recovery for log tails, and also restore compression. I know many DBAs have had a bad experience with DDBoost so far. I have avoided it, but v3 might be worth a try.

Integrated Copy Data Management and AppSync

If you have two volumes on XtremIO and load them up with identical data one right after the other, you will not see a perfect reduction of data. The inline deduplication rate (ballpark 2:1 – 4:1) will kick in and save you some space, but not a lot. If you can implement a solution where you can present the volume of data that is pre-loaded to another host, XVC (writable copies) will save a ton of space. In one session they surveyed several large companies, and they had roughly 8-12 copies of their main production database. Consider that being a 1TB database with 2:1 data reduction. That is 0.5TB of used physical capacity plus the change rate between refreshes. Now in a traditional array like VMAX (no compression yet), that is up to 13TB used.
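The arithmetic behind that comparison is worth writing down. The change rate below is my own assumption; the rest comes from the session example.

```python
# Back-of-the-envelope math for the copy data example (assumed numbers).
db_size_tb = 1.0
copies = 12
dedupe_ratio = 2.0          # ballpark inline reduction on XtremIO
change_rate = 0.05          # assume ~5% unique change per copy between refreshes

# Traditional full copies with no data reduction (the VMAX example):
traditional_tb = db_size_tb * (1 + copies)      # source + 12 copies = 13 TB

# Writable snapshot copies (XVC): one reduced baseline plus the unique changes.
xvc_tb = db_size_tb / dedupe_ratio + copies * db_size_tb * change_rate

print(f"Traditional copies: {traditional_tb:.1f} TB, XVC-based: {xvc_tb:.2f} TB")
```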

I think one of the goals of the AppSync software is to put the CDM tasks into the hands of the application owner. The storage administrator can setup the runbooks and then create and grant access to a button to do all the necessary steps for a refresh. It sounds like support for Windows clustered drives is in the works with other features being added soon as well.

Deep Dive Sessions

I attended a session titled DR with NSX and SRM. The speakers proposed a virtualized network solution that decouples DR from the physical network. No more L2 extension technology required. The cross-vCenter NSX design used Locale ID tags for each site to create local routes. The architecture even had some solutions for NATing public websites to the proper location. I hope the slides get posted because it was pretty deep for me to take in lecture form. The one thing I found fairly comical was the speaker mentioning OTV being expensive as a reason to look at NSX… maybe they have never seen a price for NSX.

The VMware validated reference design session was a very good one for me. It validated a lot of our decisions and also got me thinking about a couple of new tweaks. HW v11 can now scale to 128 cores and 4TB of RAM for a single VM. VMs are getting to 99.3% efficiency versus their hardware counterparts. Some Hadoop architectures even perform better in a virtual environment. My notes look more like a checklist from this session:

-vSphere 6 re-wrote the storage stack (I think for filter integration, not necessarily perf)
-check vCenter Server JVM sizing
-rightsize VMs
-size VMs to fit within pNUMA if possible (see the sketch after this list)
-don’t use vCPU hot-add (memory hot-add is fine)
-hyperthreading is good
-trust guest & app memory suggestions more than ESX counters
-use multiple SCSI adapters
-PVSCSI is more efficient for sharing
-use Receive Side Scaling in the guest
-use large memory pages
-look at performance KPIs to determine if settings are beneficial (not just CPU% etc.)
-ballooning is an early warning flag for paging
-go for a high number of sockets (wide) when in doubt over vsockets or vcores
-watch out for swapping during monthly Windows patches
-co-stop is a sign the VM is hurting from having too many CPUs
-clock frequency is a latency driver, especially for single-threaded ops
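For the pNUMA item above, a quick sanity check is all it takes. This assumes a host with 2 sockets x 12 cores and 256 GB per NUMA node; substitute your own topology.

```python
# Rough check that a VM fits inside one physical NUMA node (assumed topology).

def fits_in_pnuma(vcpus: int, vram_gb: int,
                  cores_per_node: int = 12, mem_per_node_gb: int = 256) -> bool:
    """True if the VM can be scheduled entirely within a single pNUMA node."""
    return vcpus <= cores_per_node and vram_gb <= mem_per_node_gb

print(fits_in_pnuma(8, 128))    # True  -> stays local to one node
print(fits_in_pnuma(16, 384))   # False -> the VM will span pNUMA nodes
```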

Another session I attended was Deployment Best Practices for Consolidating SQL Server & iCDM. There is a webinar the last Wednesday in June with scripting examples and demos.

There were some really good storage performance tips in this session:

-vSCSI adapter: disk queue depth and adapter queue depth
-VMkernel admittance (Disk.SchedNumReqOutstanding)
-physical HBA: per-path queue depth
-zoning and multipathing
-shared datastore = shared queue -> the Disk.SchedNumReqOutstanding override vs. LUN queue depth
-separate tempdb for snapshot size purposes; tempdb is a lot of noise and change that isn’t needed
-no need to split data & logs onto separate volumes anymore
-still create multiple data files

Conclusion

The EMC brands are evolving at a pace necessary to keep up with the next wave of enterprise requirements. I was happy to be a part of the conference and hope for a smooth acquisition.


 

Posted on May 20, 2016 in Storage

 

Disaster Recovery

I have recently been sucked into all that is Disaster Recovery, or Business Continuity Planning. Previously I had been a bit dodgy about the topic. I haven’t really enjoyed the subject because it always seems to distract from my focus on backups and local recovery. I liked to focus on the more likely failure scenarios and make sure those were covered before we got distracted. I’m not really sure if that was a good plan or not.

We would have to lose almost our entire datacenter to trigger our disaster recovery plan. A fire in the datacenter, a tornado, or maybe losing our key storage array might trigger DR. Dropping a table in a business application isn’t something that should trigger a DR plan. Developing a highly available, resilient system is a separate task from developing a DR plan for that system. It was very challenging to convince people to complete a discussion of the local recovery problems without falling into the endless pit of DR.

There seem to be two different business reasons for DR: 1. complete a test of the plan so we can pass an audit once a year, and 2. create a plan so we can actually recover if there were a disaster. The first one comes with a few key caveats: the test must be non-disruptive to the business, it cannot change the data we have copied offsite, and it cannot disrupt the replication of the data offsite.

In a cool or warm DR site, the hardware is powered on and ready but it is not actively running any applications. If I were to approach this problem from scratch, I would seriously consider a hot active site. I hear metro clusters are becoming more common. Sites that are close enough for synchronous storage replication enable a quick failover with no data loss. A hot site like this would have many benefits including:
1. Better utilization of hardware
2. Easier Disaster Recovery testing
3. Planned failovers for disaster avoidance or core infrastructure maintenance

However, there are downsides…
1. Increased complexity
2. Increased storage latency and cost
3. Increased risk of disaster affecting both sites because they are closer

Testing is vital. In our current configuration, in order to do a test we have to take snapshots at the cold site and bring those online in an isolated network. This test brings online the systems deemed critical to the business and nothing more. In an active/active datacenter configuration, the test could be much more thorough because you actually run production systems at the second site.

A basic understanding of DR covers the simple fact that we now need hardware in a second location, but there is much more to DR than a second set of servers. I hope to learn more about the process in the future.

 

Posted on February 7, 2015 in Hardware, Storage, Virtual

 

5 9s Lead to Nestfrastructure (and fewer 9s)

Off the top of my head,

Microsoft DNS issue a handful of hours before the Xbox One launch (http://redmondmag.com/articles/2013/11/21/windows-azure-outages.aspx)

Widespread Amazon outages (http://www.zdnet.com/amazon-web-services-suffers-outage-takes-down-vine-instagram-flipboard-with-it-7000019842/)

NASDAQ (http://www.bloomberg.com/news/2013-08-26/nasdaq-three-hour-halt-highlights-vulnerability-in-market.html)

The POTUS’s baby (http://www.healthcare.gov)

I learned about 5 9’s in a college business class. If a manufacturer wants to be respected as building quality products, they should be able to build 99.999% of them accurately. That concept has translated to IT as some kind of reasonable expectation of uptime. (http://en.wikipedia.org/wiki/High_availability)

I take great pride in my ability to keep servers running. Not only avoiding unplanned downtime, but developing a highly available system so it requires little to no planned downtime. These HA features add additional complexity and can sometimes backfire. Simplicity and more planned downtime are often the best choice. If 99.999% uptime is the goal, there is no room for flexibility, agility, budgets or sanity. To me, 5 9s is not a reasonable expectation of uptime even if you only count unplanned downtime. I will strive for this perfection; however, I will not stand idly by while this expectation is demanded.

Jaron Lanier, the author and inventor of the concept of virtual reality, warned that digital infrastructure was moving beyond human control. He said: “When you try to achieve great scale with automation and the automation exceeds the boundaries of human oversight, there is going to be failure … It is infuriating because it is driven by unreasonable greed.”
Source: http://www.theguardian.com/technology/2013/aug/23/nasdaq-crash-data

IMHO the problem stems from dishonest salespeople. False hopes are injected into organizations’ leaders. These salespeople are often internal to the organization. An example is an inexperienced engineer who hasn’t been around long enough to measure his or her own uptime for a year. They haven’t realized the benefit of keeping track of outages objectively and buy into new technologies that don’t always pan out. That hope bubbles up to upper management and then propagates down to the real engineers in the form of an SLA that no real engineer would actually be able to achieve.

About two weeks later, the priority shifts to the new code release and not uptime. Even though releasing untested code puts availability at risk, the code changes must be released. These ever-changing goals are prone to failure.

So where is 5 9s appropriate? With the influx of cloud services, the term infrastructure is being too broadly used. IIS is not infrastructure, it is part of your platform. Power and cooling are infrastructure and those should live by the 5 9s rule. A local network would be a stretch to apply 5 9s to. Storage arrays and storage networks are less of a stretch because the amount of change is limited.

Even when redundancies exist, platform failures are disruptive. A database mirroring failover (connections closed), a webserver failure (sessions lost), a compute node failure (OS reboots) and even live migrations of VMs require a “stun” which stops the CPU for a period of time (a second?). These details I listed in parentheses are often omitted from the sales pitch. The reaction varies with each application. As the load increases on a system, these adverse reactions can increase as well.

If you want to achieve 5 9s for your platform, you have to move the redundancy logic up the stack. Catch errors, wait and retry.
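A minimal sketch of what "catch errors, wait and retry" looks like at the application layer, so a brief failover lower in the stack never surfaces to the user. The attempt count and timings are arbitrary.

```python
import random
import time

def with_retries(operation, attempts: int = 5, base_delay: float = 0.5):
    """Run operation(), retrying with exponential backoff and a little jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise                         # give up after the last attempt
            # Backoff rides out mirror failovers, web server restarts,
            # and vMotion stuns without paging anyone.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.random() * 0.1)
```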

(Diagram: the stack)

Yes, use the tools you are familiar with lower in the stack. But don’t build yourself a nest at every layer of the stack; understand the big picture and apply pressure as needed. Just like you wouldn’t jump on every possible new shiny security feature, don’t jump on every redundancy feature; that is how you avoid nestfrastructure.

 

SQL Server Backup Infrastructure

What are your backups sitting on? How fast are restores? Compression? Dedupe? Magic? There are lots of questions to be answered before buying storage for backups.

What is your retention policy?

Space requirements vary greatly with compression and dedupe, but nothing has a greater effect on space than the retention policy. If your legal department gets involved you may have “indefinite retention” on some or all of your backups. That means you can’t delete any backups. Better get that storage vendor on speed dial.

A more realistic retention policy would be 30 days of nightly backups. Another approach would be to keep a week of nightly backups, a month of weekly backups and a year of monthly backups.
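A week/month/year scheme like that is easy to express as a rule. This is just a sketch of one possible policy; the cutoffs (Sunday weeklies, first-of-month monthlies) are my own assumptions.

```python
from datetime import date, timedelta

def keep_backup(backup_date: date, today: date) -> bool:
    """Week of nightlies, month of weeklies, year of monthlies (assumed cutoffs)."""
    age = (today - backup_date).days
    if age <= 7:
        return True                                  # nightly backups, last week
    if age <= 31 and backup_date.weekday() == 6:     # Sunday weeklies, last month
        return True
    if age <= 366 and backup_date.day == 1:          # first-of-month, last year
        return True
    return False

today = date(2013, 11, 7)
kept = [d for d in (today - timedelta(days=n) for n in range(400))
        if keep_backup(d, today)]
print(f"{len(kept)} backup sets retained out of 400 nights")
```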

What exactly are you sending over the wire?

Unless you are backing up on the same server or SAN, something is going over the wire. That wire is usually the bottleneck in a well-tuned environment. A well-tuned environment is actually shaped more like a can than a bottle, but you get my point.

A full backup means all data, a copy of the space used in your database, goes over the wire. A differential sends only the extents changed since the last full. Turning on compression reduces the size of these files by 50%-80% in my experience. SQL 2008 and up can natively apply this compression, or you can use a 3rd-party tool from Quest or RedGate to send less data over the wire.
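Here is a rough estimate of what those choices mean per night. The database size, change rate, and compression ratio are assumptions; plug in your own.

```python
# Rough estimate of nightly bytes over the wire (assumed numbers).
data_used_gb = 500.0
change_rate = 0.10       # ~10% of extents change per day (assumption)
compression = 0.35       # compressed file is ~35% of original (50%-80% savings)

full_gb = data_used_gb
diff_gb = data_used_gb * change_rate

print(f"Full: {full_gb:.0f} GB, compressed full: {full_gb * compression:.0f} GB")
print(f"Diff: {diff_gb:.0f} GB, compressed diff: {diff_gb * compression:.0f} GB")
```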

EMC’s Data Domain Boost is not yet publicly available as far as I know, but it’s worth mentioning. With a generic Data Domain target, a full uncompressed copy of your data would have to go over the wire. That would be bad (http://www.brentozar.com/archive/2009/11/why-dedupe-is-a-bad-idea-for-sql-server-backups/). But with the addition of DDBoost, an integrated tool that is supposed to send only unique data over the wire, we have a possibly workable solution. This is slightly better than differentials, which send changed data over the wire night after night until another full is taken.

Watch out for the simultaneous in/out.

One thing that cropped up and bit me in the arse was backups going in and out. This can happen in a couple different scenarios. For starters, heaven forbid you actually have to restore a database during your backup cycle. Can the drives and network support that operation? Or is your restore going to crawl?

Another time this can happen is if you are forced into backing up your backups. Say you have indefinite retention and backups have to be sent to tape. Depending on how fast things are, you might be reading and writing to disk at the same time. You might also be sending and receiving data over the wire at the same time.

Are you sending these backups offsite? If so that might be another opportunity to have multiple simultaneous ins and outs. If you tuned your system for only one operation at a time, you might want to rethink your RTO.

Scrubbing

Unless you are restoring your database and running checkdb, you have to assume your backups are not good. Scrubbing is the process of verifying that data written long ago is still good. Some appliances have this process built in, so they can at least verify the bits that were written are still the same. A small problem can be blown up by dedupe or compression. Small problems in a backup file can cause restores to fail, and then you will have to call in the experts in data recovery.

Reused tapes would frequently have errors. I don’t know anyone backing up to SSDs, but early models had some problems with failures. That said, good old-fashioned, enterprise-class HDDs make a good home for data. Adjust your scrubbing intensity with your experience. Make sure you are not causing more problems than you are solving. This process might be pushing a whole lot of IOPS to a shared storage system. Know who your neighbors are and don’t piss them off.

Throughput

I like to simplify a baseline throughput measurement into a simple clock time: a round of full backups took X amount of time. This translates well to the business, who will be paying additional money so that you can achieve their RTO. That said, when tuning the system we have to look at the throughput of each potential bottleneck.

The wire is generally the bottleneck. 1Gbps = 125MBps. Make sure you understand the difference in network terminology and storage terminology when it comes to bits and bytes of throughput. If you want to sound like an amateur, just say “MEG” or “MEGS” when the details really do matter. Your mileage may vary but I have not experienced 10x improvement when switching to 10Gbps from 1Gbps network ports and adapters. Tuning your MTU size across the network to support larger frames (aka jumbo packets) can help to utilize more bandwidth. Teaming multiple network cards can increase network throughput. Dedicating a backup network team of NICs can help with a busy SQL server that has users during the backup window.
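As a quick sanity check, the clock-time math is just bits versus bytes. The backup set size and efficiency factor below are assumptions.

```python
# Clock-time estimate for a round of full backups over the wire (assumed numbers).
backup_set_gb = 2000      # nightly full backup volume
link_gbps = 1             # link speed in gigaBITS per second
efficiency = 0.8          # you rarely see line rate in practice

throughput_mbps = link_gbps * 1000 / 8 * efficiency    # megaBYTES per second
hours = backup_set_gb * 1024 / throughput_mbps / 3600
print(f"~{throughput_mbps:.0f} MB/s -> about {hours:.1f} hours for the backup window")
```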

I have seen a RAID5 SATA configuration write sequentially at 500MBps. If you are not concerned about the simultaneous ins and outs, or the potentially random workload from scrubbing, the storage cost can be really low. If you want offsite replication built into the storage product, costs will increase very fast. If you can do this with robocopy and little to no WAN acceleration, a simple pool of NAS drives could be a viable option.

Dedupe and compression can actually cause CPU contention on the SQL server if it is busy during the backup window. This is important to be aware of for test VMs. Test VMs might have a full set of data, but only 1 CPU. This might be fine for users, but it could be slowing down a backup stream. In a virtualized environment you may not want to kick off all of your backups simultaneously. Instead, try to schedule backups in streams that you can adjust for the best performance. It is easier to set up and maintain streams than to stagger start times.
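Building those streams can be as simple as a greedy assignment that balances total size. The database names and sizes here are made up.

```python
# Assign databases to a fixed number of backup streams so each stream carries
# roughly the same amount of data (greedy, largest databases first).

def build_streams(db_sizes_gb: dict, stream_count: int = 3):
    streams = [{"dbs": [], "total": 0} for _ in range(stream_count)]
    for db, size in sorted(db_sizes_gb.items(), key=lambda kv: -kv[1]):
        target = min(streams, key=lambda s: s["total"])   # least-loaded stream
        target["dbs"].append(db)
        target["total"] += size
    return streams

sizes = {"ERP": 800, "DW": 600, "CRM": 300, "Logs": 150, "Test": 50}
for i, s in enumerate(build_streams(sizes), start=1):
    print(f"Stream {i}: {s['dbs']} ({s['total']} GB)")
```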

Conclusion

I highly recommend a solution that gets a copy of your data off server, off SAN and offsite as quickly as possible. I suggest keeping 1 backup set onsite and 1 backup set offsite. This allows for fast restores and disaster recovery.

You may not have a Superdome with 32 NICs, but this is still a good read:
http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/Technical%20Case%20Study-Backup%20VLDB%20Over%20Network_Final.docx

 

Posted on November 7, 2013 in SQL Admin, Storage

 

SQL Parallelism and Storage Tiering

Sometimes features, independently acceptable on their own, can combine to produce peculiar results.

SQL

SQL parallelism is simply the query optimizer breaking up tasks across different schedulers. A single query can go parallel in several different parts of the query plan. There is a significant cost associated with separating the threads and then re-assembling them, so not all queries will go parallel. To find out if queries are going parallel you can take a look at the plan cache.

Since GHz hasn’t increased in a long time but core counts are going through the roof, it makes sense to have a controller thread delegate to the minions. CXPACKET waits will increase when queries go parallel. Missing indexes and bad queries can cause queries to go parallel.

The CXPACKET wait is incredibly complex. There are ways to make it go away without really fixing a problem. For example, setting max degree of parallelism (MAXDOP) to 1 will certainly make CXPACKET go away. Increasing the cost threshold for parallelism higher than the cost of your queries will also make CXPACKET go away. The goal isn’t to make CXPACKET go away. The goal is to make queries faster, not to fix the waits.
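To see which cached plans are actually going parallel, the plan cache can be queried for operators marked Parallel="1". Here is a sketch of that check from Python; the connection string is an assumption you would point at your own instance and driver.

```python
import pyodbc

# Assumed connection details; adjust server, driver, and auth for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;Trusted_Connection=yes"
)

# Cached plans that contain at least one parallel operator.
query = """
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       cp.usecounts,
       cp.objtype,
       st.text AS query_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE qp.query_plan.exist('//RelOp[@Parallel="1"]') = 1
ORDER BY cp.usecounts DESC;
"""

for row in conn.cursor().execute(query):
    print(row.usecounts, row.objtype, (row.query_text or "")[:120])
```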

Storage

Recently, in my short career as a SAN admin, I have been exposed to automatic storage tiering. With storage tiering we take pools of storage with different performance characteristics and attempt to spread the workload across the different pools of drives. Ideally, the pool’s IOPS capability matches the demand for IOPS. Ideally, data that doesn’t get accessed very often gets put on slower, cheaper storage. Ideally, this reduces the need to identify archive workloads up front because the back-end storage solves most of that problem. Ideally, management will buy enough storage so that everything isn’t running on disk pools with archive characteristics. My point is that storage tiering doesn’t always work as well as advertised. Storage tiering is a cost-saving maneuver that can cause a lot of inconsistent performance. Inconsistent performance leads to a lot of headaches.

If we combine these two features, some threads of that parallel query could be hitting cheap storage while other minion threads are hitting SSDs. The result is that some threads are fast and other threads are slow. This will send CXPACKET waits into orbit. When CXPACKET waits are high, they generally mask any other type of system issue. There are many causes of CXPACKET waits, and inconsistent storage performance could be one of them.

Parallelism is a feature to alleviate CPU bottlenecks. With storage tiering, the bottleneck can quickly shift from storage to CPU and back. So the cost of going parallel can sometimes be for nothing if a single thread is waiting on storage.

Take Away


I apologize if you came here looking for some kind of recommendation. The fact is, storage tiering can be a nightmare. Performance troubleshooting is an ever-changing game that I am fighting to stay ahead in.

 

Posted on October 29, 2013 in SQL Admin, Storage

 

Thin Provisioning and SQL Server

Thin provisioning is a broad topic that covers a style of allocating resources. That style is to allocate a small amount of actual resources to start and only reserve more resources as actual growth occurs. It is quite a popular topic in virtualization and SAN storage. One thing that came to mind as I read the VMware storage guide about thin provisioning is SQL Server’s auto-grow feature.

Thin provisioning is a Ponzi scheme. If at any given moment all of the databases or VMs want all of the space they asked for, there won’t be enough actual resources to go around. But, as long as we have a large pool of “investors”, the growth will be predictable enough that we can accommodate all of our VMs and databases.

I’ve read best practices that strongly advise not to rely on the auto-grow feature. Pre-size the data and log files to make all the gremlins inside SQL happy. Even Microsoft recommends pre-sizing data and log files and only using auto-grow as an emergency fallback. http://support.microsoft.com/kb/315512

With all these warnings, one might jump on the thick provisioning bandwagon. I’m not sure how well these SQL warnings translate to VMs and SANs. If you don’t thick provision with accuracy, you are left with a giant mess of unused resources.

Thick provisioning. I’m not a fan. It requires math and a crystal ball. It’s hard to undo thick provisioning that was poorly executed. How do you fix a database that is too small? Make it bigger, which can happen in an instant. How do you fix a database that is too big? Attempt some fragmenting shrink command that takes forever, or attempt to export and import into a properly provisioned file.

It takes real expertise to translate how much space users really need. It’s not even really expertise, but more experience with the people who ask for space. Thick provisioning is a game that you will lose more than win. That is why I like to thin provision my databases. I suspect that as a storage and virtualization admin I will consider thin provisioning my storage at those layers as well.

But maybe thin provisioning at all layers is unnecessary. Consider the following thick-provisioned setup.

1 TB SAN LUN (Black)
1 TB VMWare datastore (Yellow)
1 TB Virtual Machine (Green)
39 GB C: Drive (Blue)
25 GB D: Drive (Blue)
750 GB E: Drive (Blue)
200 GB F: Drive (Blue)
Database Files (Red)

(Diagram: Provisioning_Storage, the layers listed above, color coded)

Thick provisioned, we use 1 TB of SAN storage on day one and keep using it until the datastore is deleted. If that estimate of 1TB was just a shot in the dark and your database ended up at only 100GB before you needed to upgrade the OS/SQL/storage etc… you basically just wasted thousands of SAN dollars.
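The waste is easy to quantify. This uses the drive sizes from the example above and assumes the database only ever grows to 100GB.

```python
# Thick vs. thin consumption for the example above (100 GB actually used is an assumption).
provisioned_gb = 39 + 25 + 750 + 200     # C:, D:, E:, F: drives, roughly 1 TB
actually_used_gb = 100

thick_consumed_gb = provisioned_gb       # the SAN sees the whole pre-sized VMDK as used
thin_consumed_gb = actually_used_gb      # thin layers only grow as data is written

print(f"Thick: {thick_consumed_gb} GB consumed, thin: {thin_consumed_gb} GB, "
      f"{thick_consumed_gb - thin_consumed_gb} GB never used")
```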

The options we have for thin provisioning are at the SAN level, the VM level, the drive level and the database level (aka auto-grow). If we thin provision at the SAN level only, we could potentially save zero space. Since the VM is built and the .vmdk files are pre-sized, the SAN may consider this space used. Just like a .mdf file, .vmdk files can be thick provisioned instantly (lazy), which doesn’t zero out the file, or slowly (eager), which does zero out the file. Unless you have some magical feature, the SAN will consider this space used up and there will be no chance for savings if we don’t actually use the space.

If we thin provision the VMDK files there is some chance for savings. Almost all of the blue and green will be gone and we can fit a whole bunch more VMs in the yellow datastore.

If we thick provision the VMDK files with a lazy zero and thin provision the SAN LUN, the SAN may be able to reclaim the empty space inside the vmdk files. I’m not quite sure how well this works but will have to test it out. I’m guessing a quick format in Windows and instant file initialization in SQL Server could have some bearing on how much space we save.

With a few VMs, thin provisioning isn’t needed because we can spend the time to more accurately predict growth. Also, the pool is too small to predict growth on the fly. With a few thousand VMs, thin provisioning makes a lot of sense. That will make the growth curve steady enough for you to predict when more storage is needed.

 

Posted on July 22, 2013 in Storage

 

A new path

I started at my current employer over six years ago as a developer. Four years ago I ventured down a new path into server administration and picked up the task of managing SQL Servers along with a bunch of other web and application servers.

I have enjoyed server administration very much, especially being a SQL DBA. Being responsible for the data ignited a passion for me to learn ALL teh things! I started with very little knowledge and had little guidance. I made the mistake of tip-toeing passively into being a DBA until an early case of corruption found its way to me. After that I started actively engaging all the instances. I crawled my WMI queries into every last corner of the network and started auditing and monitoring the backups. I started attending user groups and conferences. I started blogging and speaking in hopes of contributing to the community that helped me get where I am today.

I became very interested in disk and virtualization performance in an effort to get the most out of SQL Server. I have heard of some DBAs not getting along with their SAN admins but that isn’t the case where I work. I believe we have worked very well together to overcome some very challenging issues.

The growth in our server count and storage requirements had opened up a very obvious need for more help with virtualization and storage administration. I made it clear to my management that I was interested in these things and then I waited. I waited for quite some time. I waited until they apparently got desperate enough to finally give in and let me become a storage and virtualization administrator =]

I’m happy to say I will maintain a role in SQL administration. I won’t be quite as active because I am way over my head in these new tasks and must once again learn ALL teh things!

I’ve started with ESXi and vCenter Server Product Documentation: vSphere Storage Guide found over here http://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html

Also, later next month I will be attending VMworld in San Francisco. I hope I can absorb enough information before then so the content at the conference won’t fly right over my head. I am getting the feeling that this new path will increase my urge to write, so stay tuned.

 

Posted on July 13, 2013 in SQL Admin, Storage, Virtual