database throughput through the virtual layer

26 Sep

VMs or virtual machines are all the craze, and for good reason. When you RDP to a VM you are connecting to the guest. This guest OS looks just as it would if it were installed directly on the hardware without a virtual layer. Many guest VMs can reside on a single physical host. Some new large servers advertise hosting 100 VMs. So instead of one gigantic powerful instance you can now have 100 instances of all different sizes.

Throughput is the most important measurement for data warehousing or OLAP type environments. Large amounts of data is written to and read from these environments in a sequential fashion. When considering throughput make sure you know the conversions of bits and bytes such as Gbps and MBps. A stereotypical SAN guy only thinks in IOPS but make sure you tell him NO, we’re looking at bandwidth. Tell him (or her…) a 1 MB IO will make MBps = IOps. Hardware bottlenecks that can come into play are:

1. Virtual host layer: the hypervisor can give you a ton of metrics and configuration options, be thorough with your driver updates and research.
2. Processors: each core generally pushes 200MBps, VMWare encourages scaling out with their 8 core limitation.
3. PCIe: 1 lane can push 500MBps so make sure you are getting a few 8x or one 16x slot, then you can be sure its not your bus.
4. HBAs: these cards plug into the PCIe slot and are measured in Gbps, make sure you find the “negotiated speed”. Blade servers generally aren’t chosen for datawarehouses because of this bottleneck.
5. ISL: inner switch link, watch this for too much % utilization
6. SAN switch: the last line before your SAN
7. Storage processor: Comes into play when hitting SAN cache
8. San inter-module switches: may come into play if striping across modules in your SAN
9. Disk: iSCSI, FC etc. Each disk has its limits so make sure you have enough disks striped and mirrored to support the previous pieces

Now that we’ve walked the lines to your data let me touch on two software technologies. Windows MPIO and VMware round robining. The SAN Fabric has many paths it can take to get to the end. These two software technologies offer a way to utilize all the available paths. Windows multipathing is more advanced than vmware’s option and can work better with some SANs. You may want to consider putting your large objects in files that are striped across several LUNs. This will allow you to utilize several mirrored pairs of disk #9 for more throughput.

It is rare that anything is ever as good as advertised. This is true for throughput measurements. To figure out if you are getting what you are told use a handy tool called SQLIO from Microsoft. It is a simple .exe that has a handful of important parameters:

1. IO size in kB with -b: knowing what your SAN is good at can be helpful. For a SQL dw test use 512 for this parameter
2. param.txt with -F: path to a file that has a list of files to do your testing. Do a test with a lot of small files to test your SAN cache and do a test with a lot of large (2x your cache) files to test your disk.
3. io style: random or sequential, choose sequential for a SQL dw test
4. -k reads or writes: use W to test your SAN’s write caching ability, some SANs also have advanced read ahead features for large sequential reads
5. -s seconds to test: large test files take a while to build (hours). small tests over and over again are good but you might want to do a test for a few days to uncover any contention issues.

Most of my information is coming from the fast track program.

The idea behind fast track is not to over buy any one component, keep costs down and throughput through the roof for data warehouses. I haven’t talked much about the virtual layer but there are a few things to keep in mind:

1. correct # of paths and round robining: ask your SAN vendor about the # of paths
2. queue depth for vmware, also check queue depth of your HBA in the bios
3. IO drivers: make sure your SAN is VMware friendly
4. contention/ contex switching: some SANs are good at large sequential reads so don’t have 2 VMs interrupting each other
5. if you need more than 8 cores you are SOL for a while with VMWare, most fasttrack systems start at 16 cores.

If you checked out the fasttrack link you will notice some of the pre-tested architectures and their throughput measurements. Check the windows (guest) perfmon counter Disk Bytes/sec to line up with this value. I have witnessed that counter reach 3.7GBps so I know its possible. And yes that is a big B.

Leave a comment

Posted by on September 26, 2010 in Network Admin, SQL Admin, Virtual


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: