
Thin Provisioning and SQL Server

Thin provisioning is a broad topic that covers a style of allocating resources: allocate a small amount of actual resources to start, and reserve more only as actual growth occurs. It is quite a popular topic in virtualization and SAN storage. One thing that came to mind as I read the VMware storage guide about thin provisioning is SQL Server’s auto-grow feature.

Thin provisioning is a Ponzi scheme. If at any given moment all of the databases or VMs want all of the space they asked for, there won’t be enough actual resources to go around. But, as long as we have a large pool of “investors”, the growth will be predictable enough that we can accommodate all of our VMs and databases.

I’ve read best practices that strongly advise against relying on the auto-grow feature: pre-size the data and log files to keep all the gremlins inside SQL happy. Even Microsoft recommends pre-sizing data and log files and using auto-grow only as a safety net for emergencies.

With all these warnings, one might jump on the thick provisioning bandwagon. I’m not sure how many of these SQL Server warnings translate to VMs and SANs. And if you don’t thick provision accurately, you are left with a giant pile of unused resources.

Thick provisioning. I’m not a fan. It requires math and a crystal ball, and it’s hard to undo when poorly executed. How do you fix a database that is too small? Make it bigger, which can happen in an instant. How do you fix a database that is too big? Attempt a shrink command that fragments your indexes and takes forever, or export and import into a properly sized file.

It takes real expertise to translate how much space users really need. It’s not even expertise so much as experience with the people who ask for the space. Thick provisioning is a game you will lose more often than win. That is why I like to thin provision my databases, and I suspect that as a storage and virtualization admin I will consider thin provisioning at those layers as well.

But maybe thin provisioning at all layers is unnecessary. Consider the following thick provisioned setup.

1 TB SAN LUN (Black)
1 TB VMWare datastore (Yellow)
1 TB Virtual Machine (Green)
39 GB C: Drive (Blue)
25 GB D: Drive (Blue)
750 GB E: Drive (Blue)
200 GB F: Drive (Blue)
Database Files (Red)


Thick provisioned, we use 1 TB of SAN storage from day one until the datastore is deleted. If that 1 TB estimate was just a shot in the dark and your database ended up at only 100 GB before you needed to upgrade the OS/SQL/storage etc… you basically just wasted thousands of SAN dollars.
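To put a rough number on that waste, here is a back-of-the-napkin sketch. The cost-per-GB figure is purely an assumption for illustration, not a real quote:

```python
# Hypothetical numbers from the example above: a 1 TB LUN thick
# provisioned on day one, but the database only ever reaches 100 GB.

TB = 1024  # work in GB for simplicity

provisioned_gb = 1 * TB
actual_use_gb = 100          # database size before the stack is retired
cost_per_gb = 5.0            # assumed SAN cost per GB; illustrative only

wasted_gb = provisioned_gb - actual_use_gb
utilization = actual_use_gb / provisioned_gb

print(f"utilization: {utilization:.1%}")                  # ~9.8%
print(f"wasted spend: ${wasted_gb * cost_per_gb:,.2f}")   # $4,620.00
```

Even at a modest assumed price per GB, a shot-in-the-dark estimate leaves ~90% of the spend stranded.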

The options we have for thin provisioning are at the SAN level, the VM level, the drive level, and the database level (aka auto-grow). If we thin provision at the SAN level only, we could potentially save zero space. Since the VM is built and the .vmdk files are pre-sized, the SAN may consider this space used. Just like an .mdf file, .vmdk files can be thick provisioned instantly (lazy), which doesn’t zero out the file, or slowly (eager), which does zero out the file. Unless you have some magical feature, the SAN will consider this space used up, and there will be no chance for savings even if we never actually use it.

If we thin provision the VMDK files there is some chance for savings. Almost all of the blue and green will be gone and we can fit a whole bunch more VMs in the yellow datastore.

If we thick provision the VMDK files with a lazy zero and thin provision the SAN LUN, the SAN may be able to reclaim the empty space inside the .vmdk files. I’m not quite sure how well this works and will have to test it out. I’m guessing a quick format in Windows and instant file initialization in SQL Server could have some bearing on how much space we save.
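You can see the same “claimed but not allocated” behavior at the filesystem level with sparse files, which are a rough analogue of thin provisioning. A small sketch (assumes a POSIX filesystem that supports sparse files, such as ext4):

```python
import os
import tempfile

# Sparse files are a filesystem-level analogue of thin provisioning:
# the file claims a large logical size, but zero-filled regions that
# were never written are not actually allocated on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(100 * 1024 * 1024 - 1)   # "provision" 100 MB
    f.write(b"\0")                  # ...but only touch the last byte

st = os.stat(f.name)
logical = st.st_size                # what the consumer sees: 100 MB
allocated = st.st_blocks * 512     # what the storage actually spent

print(f"logical: {logical}, allocated on disk: {allocated}")
os.unlink(f.name)
```

The gap between `logical` and `allocated` is exactly the savings a thin-provisioned LUN is after; an eager-zeroed .vmdk closes that gap by writing every block up front.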

With a few VMs, thin provisioning isn’t needed because we can spend the time to predict growth more accurately. Besides, the pool is too small to predict growth on the fly. With a few thousand VMs, thin provisioning makes a lot of sense: a pool that size makes the growth curve steady enough to predict when more storage is needed.
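That “large pool smooths the curve” claim is just the law of large numbers. A toy simulation (all the growth numbers are made up) shows how the relative swing in total monthly growth shrinks as the pool grows:

```python
import random
import statistics

random.seed(42)  # fixed seed so the result is repeatable

def relative_spread(n_vms, samples=200):
    """Relative std-dev of total monthly growth for a pool of n_vms,
    each growing a random 0-10 GB per month."""
    totals = [sum(random.uniform(0, 10) for _ in range(n_vms))
              for _ in range(samples)]
    return statistics.stdev(totals) / statistics.mean(totals)

small, large = relative_spread(5), relative_spread(5000)
print(f"5 VMs: {small:.1%} swing; 5000 VMs: {large:.1%} swing")
```

With five VMs the month-to-month total is erratic; with five thousand, the aggregate is steady enough to forecast when the next storage purchase is due.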


Posted by on July 22, 2013 in Storage


A new path

I started at my current employer over six years ago as a developer. Four years ago I ventured onto a new path into server administration and picked up the task of managing SQL Servers along with a bunch of other web and application servers.

I have enjoyed server administration very much, especially being a SQL DBA. Being responsible for the data ignited a passion for me to learn ALL teh things! I started with very little knowledge and had little guidance. I made the mistake of tip-toeing passively into being a DBA until an early case of corruption found its way to me. After that I started actively engaging all the instances. I crawled my WMI queries into every last corner of the network and started auditing and monitoring the backups. I started attending user groups and conferences. I started blogging and speaking in hopes of contributing to the community that helped me get where I am today.

I became very interested in disk and virtualization performance in an effort to get the most out of SQL Server. I have heard of some DBAs not getting along with their SAN admins but that isn’t the case where I work. I believe we have worked very well together to overcome some very challenging issues.

The growth in our server count and storage requirements opened up a very obvious need for more help with virtualization and storage administration. I made it clear to my management that I was interested in these things, and then I waited. I waited for quite some time. I waited until they apparently got desperate enough to finally give in and let me become a storage and virtualization administrator =]

I’m happy to say I will maintain a role in SQL administration. I won’t be quite as active because I am in way over my head in these new tasks and must once again learn ALL teh things!

I’ve started with the ESXi and vCenter Server product documentation: the vSphere Storage Guide, found over here.

Also, later next month I will be attending VMworld in San Francisco. I hope I can absorb enough information before then that the content at the conference won’t fly right over my head. I am getting the feeling that this new path will increase my urge to write, so stay tuned.


Posted by on July 13, 2013 in SQL Admin, Storage, Virtual


Encryption and Decryption for the Web Application

Missy Elliot Encryption

Encryption is more about the implementation of the algorithm than the algorithm itself. My recent research has led me to the conclusion that AES-256 is a good choice for a symmetric key algorithm. I have veered away from 3DES because of some claims of its weakness.

Let’s say we need to store an SSN and retrieve it, and there is no way around it. I need it to run a credit report, and then I need it again to file some kind of government form on a regular basis. The business unit says I can’t just keep asking for this data, so I have to store it.

I’m going to make an assumption here that we want to keep this data safe, not just because of some compliance requirement, but because we care about our customers’ data. First, let’s follow the data along its theoretical path and note some potential exposures. Think of a GPS application where you enter your destination and it uses your current location to define a route. This is what I call the data route.

Defining the Data Route

Call center asks the customer for their SSN over the phone

    PRISM! Just metadata, you say? OK, phew.
    calls are recorded for quality control

Call center agent or customer enters the SSN into the browser based form

    key loggers
    screen captures
    browser saves form data for your convenience

Browser packages this up and sends it over the wire to the server

    network sniffers
    browser history (you wouldn’t put this in a URL param, but just in case…)
    performance monitors
    intrusion detection systems

Web Server application processes the data

    application dump
    data read from unprotected memory by another process
    code persists data to disk (copy/paste)
    hibernation file is snagged with data from memory
    server low on memory pages data to disk (copy/paste)
    debugging tools decompile code

Web Server sends data over the wire to the database

    network sniffers
    performance monitors
    SQL Traces
    parameters are stored with query plans

Database persists the data

    Select * from ssn
    copy & paste .mdf
    data is pumped out to reporting applications
    data is given to developers for debugging
    prod is copied to Quality Assurance servers
    SAN mirrors data offsite
    Rogue DBA or other admin

Database backs up the data over the wire to a file share

    all that same network goodness
    copy/paste the file share

OK, scared now… let’s talk about encryption

To simplify the matter, there are three pieces needed to encrypt/decrypt

1. The Data (a single cell)
2. The Key
3. The Algorithm

Let’s make a small distinction right off the bat. These three things live in memory at the time of encryption/decryption. That doesn’t mean they have to be persisted to disk in that same location. Here is a good discussion on the matter of key storage. You can shred up your key and store it in several locations, but know that you have added several single points of failure. Single points of failure can cause data loss, not just for the hacker but for your business.
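One way to “shred” a key across several locations can be sketched with simple XOR secret sharing. The function names here are mine, not from any library, and note the trade-off from above: every share is required, so each one is a single point of failure.

```python
import secrets

def split_key(key: bytes, n: int) -> list[bytes]:
    """Split a key into n shares; ALL n are required to rebuild it.
    Any subset short of n reveals nothing about the key."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    last = key
    for s in shares:                       # XOR the key with each random share
        last = bytes(a ^ b for a, b in zip(last, s))
    shares.append(last)
    return shares

def join_key(shares: list[bytes]) -> bytes:
    """XOR all shares back together to recover the original key."""
    out = shares[0]
    for s in shares[1:]:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out

key = secrets.token_bytes(32)              # a 256-bit symmetric key
shares = split_key(key, 3)                 # store each share in a separate place
assert join_key(shares) == key
```

Lose any one share and the key is gone for good, which is exactly the data-loss risk the paragraph above warns about; real deployments usually prefer threshold schemes (e.g. Shamir’s secret sharing) so only k of n shares are needed.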

The goal of our encryption is to ensure that those who have gained access to the data are still not able to read it. It’s an additional layer to mitigate as many of the vulnerabilities above as we can. From a security standpoint, encrypting at the earliest possible point in the data route would be considered a best practice. Unfortunately, those methods are not always feasible. Your users would hate you if you implemented a custom keyboard.

encryption keys

Three potential solutions

Option A: We write code to do it on the web servers.

Another pretty thorough sample.

This option has more pros than cons. We are pretty far up the data route, so if we choose SSL to protect the data over the internet, we can then encrypt the SSN on the web server and the rest of the route is protected. Encryption is CPU intensive, so a web server tier that scales out well is a good place for this work. One con is that web servers are usually exposed to a larger audience of malicious folks.
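A minimal sketch of what Option A could look like, using the third-party `cryptography` package and AES-256-GCM (the function names and the idea of storing the nonce alongside the ciphertext are my illustration, not a prescription; in practice the key would come from a key store, not be generated inline):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # in practice: fetch from a key store
aesgcm = AESGCM(key)

def encrypt_ssn(ssn: str) -> bytes:
    nonce = os.urandom(12)                  # unique per encryption; never reuse
    return nonce + aesgcm.encrypt(nonce, ssn.encode(), None)

def decrypt_ssn(blob: bytes) -> str:
    nonce, ct = blob[:12], blob[12:]        # split nonce back off the ciphertext
    return aesgcm.decrypt(nonce, ct, None).decode()

blob = encrypt_ssn("123-45-6789")           # store `blob` in the database column
assert decrypt_ssn(blob) == "123-45-6789"
```

Because the nonce is random, encrypting the same SSN twice yields different ciphertext, and GCM’s authentication tag means any tampering with the stored blob fails decryption loudly.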

Option B: We change SQL Statements and do it on the SQL Server.

This requires changing queries to include a passphrase as a parameter. I’m not a fan, since the encryption key is persisted on the same host as the data. It does add the benefit of not allowing db_datareader access to plaintext data.

Option C: We make minor changes and utilize TDE on the SQL Server.

This is an option that may allow an auditor to check some proverbial box. It can add anywhere from 5-25% overhead and encrypts the entire database as it rests on disk. db_datareader can still read plaintext SSNs, so it doesn’t offer much protection there. And if an attacker can copy the .mdf, they generally have access to copy the keys too. It might help with the backup security vulnerability, though.

Other Points
Consider that a hacker has compromised one of your customers’ SSNs from, I don’t know… maybe the government of South Carolina. With the name and SSN, can the hacker reverse engineer your algorithm, figure out the key, and compromise the rest of your customers’ data?

Key rotation is a good idea. If you realize your key has been compromised, you will want a well documented way to change that key. You may need to decrypt and re-encrypt all of your data with the new key.
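Key rotation can be sketched with the `cryptography` package’s Fernet/MultiFernet API, which does exactly the decrypt-and-re-encrypt step described above (the key variables here are illustrative):

```python
from cryptography.fernet import Fernet, MultiFernet

old_key, new_key = Fernet.generate_key(), Fernet.generate_key()
old_f, new_f = Fernet(old_key), Fernet(new_key)

token = old_f.encrypt(b"123-45-6789")   # data encrypted under the old key

# MultiFernet decrypts with any listed key and re-encrypts
# under the FIRST one, so put the new key first.
rotator = MultiFernet([new_f, old_f])
rotated = rotator.rotate(token)

assert new_f.decrypt(rotated) == b"123-45-6789"
```

You would run `rotate` over every stored ciphertext, then retire the old key once nothing encrypted under it remains.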

Algorithms are usually public knowledge and built into high-level languages such as Java and .NET. A salt is a good way to keep identical plaintexts from producing identical ciphertext, which makes the data harder to attack even when the algorithm is known.


Application layer encryption adds a vast amount of protection to the data on the database server. DBAs no longer have plaintext access to all of your customers’ data. This provides separation of duties and separation of the three key pieces of the puzzle.


Posted by on July 1, 2013 in Security