
T-SQL Tuesday #065 – Slowly Changing Dimensions

I’ve been focusing a lot of my study time on data warehousing lately. I’ve been supporting the systems and storage behind data warehouses for a while, but lately I have been digging into the developer topics.

What I learned over the weekend is how to build a working, slowly changing dimension in SSDT. Thanks for the challenge #tsql2sday and @SQLMD!

 

The Problem

Dimensions are the tables we design to make data look good in a pivot chart. They are the tables that describe our facts. Customer is a good example of something that could be a dimension table. For my challenge I decided to use virtual machine as my dimension.

The problem is, what if a VM’s attributes change? Four cores was yesterday; today PRDSQLX has 24 cores. What if someone deletes a VM? How many cores did it have?

I can get the current status of my VMs from the source system, but the problem is the history. I can pull a snapshot of the VMs in my environment every day from the source system. I could just make copies of that data and slap a “PollDate” column on the table. Voilà, I have everything I need, and about 1000x more than I need.

That is the problem: how do I collect and save a history of my VMs’ attributes?

Solution

Each column in my VM table can be one of three basic types (see http://en.wikipedia.org/wiki/Dimension_table):

Type 1. Simply overwrite the value… it changes a lot and I don’t care about history (e.g. what host the VM is running on)
Type 2. Add a new row to maintain history… if one column in my VM row changes, I get a whole new record in my dimension
Type 3. Add a new column to keep a limited amount of history… add columns like previous_num_cpus and previous_previous_num_cpus and move data into them as it changes

So we have to take the data we get on a nightly snapshot of the source, and compare it to what we have in the destination, then do a conditional split. I’m sticking to handling these differences:

New VM – insert with NULL validto (easy)
Deleted VM – change validto column (create staging table and do an except query)
Change in Type 1 Col – update existing VM row with NULL validto column (easy)
Change in Type 2 Col – insert new row with NULL validto column, change previous record’s validto date (a little tricky)
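
The four cases above can be sketched outside of SSIS to make the split easier to reason about. This is only a minimal illustration in Python, not what the wizard generates: the column names `host` and `num_cpus`, the choice of which one is Type 1 versus Type 2, and the function `apply_snapshot` are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical column classification for the example.
TYPE1_COLS = {"host"}      # overwrite in place, no history kept
TYPE2_COLS = {"num_cpus"}  # a change creates a new dimension row


def apply_snapshot(dim, snapshot, today):
    """Apply one nightly snapshot to the dimension rows in `dim`.

    dim: list of dict rows (vmid, attributes, validfrom, validto).
    snapshot: dict mapping vmid -> attribute dict from the source system.
    """
    # Only rows with a NULL (None) validto are the current version of a VM.
    current = {row["vmid"]: row for row in dim if row["validto"] is None}

    for vmid, attrs in snapshot.items():
        row = current.get(vmid)
        if row is None:
            # New VM: insert with NULL validto.
            dim.append({"vmid": vmid, **attrs, "validfrom": today, "validto": None})
        elif any(row[col] != attrs[col] for col in TYPE2_COLS):
            # Type 2 change: close the old row, then insert a new current row.
            row["validto"] = today
            dim.append({"vmid": vmid, **attrs, "validfrom": today, "validto": None})
        else:
            # Type 1 change (or no change): overwrite on the current row.
            for col in TYPE1_COLS:
                row[col] = attrs[col]

    for vmid, row in current.items():
        if vmid not in snapshot:
            # Deleted VM: close the current row but keep its history.
            row["validto"] = today
    return dim
```

Feeding it a snapshot where PRDSQLX goes from 4 to 24 cores leaves two rows for that VM: the old one with a validto date and a new one with validto still empty.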

That logical split can be made easier by using the Slowly Changing Dimension transformation in SSDT. It pops up a wizard to help you along the way and completely sets you up for several failures, which I am going to let you learn on your own :]

Step 1. Set up an initial loading package.

This will make it handy to restart your development.

Query the source in a data flow OLE DB Source
Tack on a few extra columns (validfrom, validto, isdeleted, sourcesystemid) in the SQL command
Create the destination table using the New button (this is pretty handy to avoid manually lining up all the data types)
Use the New button again to create a dimVM_staging table for later
Add a task at the beginning of the control flow to truncate the destination (dimVM) table
Run the package once, and be careful not to run it again by accident, since it truncates the table

Step 2. Create this monstrosity

Control Flow
[Image: dimVM_scd_control_flow]

Data Flow
[Image: dimVM_scd_data_flow1]

It is actually not too terribly bad. When you add the Slowly Changing Dimension, a wizard pops up, and when all the stars align, all the data flow transformations and destinations below it are created.

If we focus on the top of the data flow first, it is easy to see that I am pulling from two source systems and doing a union all. The interesting problem I had to solve was the deleted VM problem. The wizard didn’t do that for me. I knew that if I had the staging table, I could compare it to dimVM to see if anything was missing. If you want to find out what is missing, use an EXCEPT query. Once we know what is missing (the deleted VMs), we can update the validto field, effectively closing up shop on that row but keeping the history of rows relating to that VM. I decided to add the isdeleted column to make it easier to find deleted VMs. This code is in the Execute SQL task on the control flow.

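-- Close out VMs that are current in dimVM but missing from tonight's staging load
update dimVM
set dimVM.validto = getdate(), dimVM.isdeleted = 1
from dimVM
inner join (
    -- current dimension rows...
    select vmid, vcenter from dimVM
    where validto is null
    except
    -- ...minus everything that arrived in the snapshot
    select vmid, vcenter from dimVM_staging
) del
on dimVM.vmid = del.vmid and dimVM.vcenter = del.vcenter
-- only touch the open row so older history rows keep their original validto
where dimVM.validto is null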

One last little tidbit. If you make any modifications to the transformations that the SCD wizard created, you should document them with an annotation. If for some reason you have to get back into the wizard, it will recreate those transformations from scratch… ironically not maintaining any history.

Step 3. Profit

I hope you enjoyed hearing about my new experiences in the Slowly Changing Dimension transformation in SSDT.

 

Posted by on April 14, 2015 in SQL Dev

 

Final Preparation for 70-463 Implementing a Data Warehouse with Microsoft SQL Server 2012

This is a continuation of this post

Two fellow bloggers have been posting more quality information on this test.
http://colleenmorrow.com/

http://www.desertislesql.com/wordpress1/?p=243

When reading the book I skipped over all of the practice sections. I did read the exam tip sections inside of the practice but never actually practiced. I don’t have a lot of hands-on experience with SSIS and even less with MDS/DQS. I spent about 9 weeks making it through the book while skipping the practice and most of the reviews. I probably would have needed an additional 18 weeks to properly make it through all of the practice or lab-type sections of the book. Learn one, do one, teach one is my favorite path to mastery, but with the 2nd shot deadline, I didn’t have a lot of time to prepare.

To supplement, I looked for videos on YouTube and watched videos on the Microsoft Virtual Academy. Neither source was very demo heavy. What I did find is CBT Nuggets, which gives a 7-day trial. The 70-461 videos that I was able to watch were very high quality, fast paced and demo heavy. This is exactly what I needed at this time. I’d recommend a membership if you have a bundle of money burning a hole in your pocket.

Since my trial was up I decided to type up my CBT nugget notes.

CBT connections managers
control flow -> doesn’t involve data
bottom level are private connection managers, a.k.a package level
right solution explorer is project level connection managers which are global
you can enable/disable sequence containers
precedence constraints, go to properties to define AND or OR logic
copy-> paste package connection managers
delay validation -> doesn’t check structure
email doesn’t have a port option but could purchase add-ins or write your own
fix for NULLs is COALESCE

Data Flow
rows, buffers, pipeline,transformations
raw file -> ssis only -> good for sharing data between packages
raw file -> good for resuming packages
recordset->variable used to loop through
for performance, aggregate at the source since that is blocking
import export col -> for blob data
term matching is like CTRL+F
blocking tasks take lots of memory -> sort, aggregate
partial-blocking -> merge chunks

Data Quality services
cleansing matching
server is 3 databases
dqs client is used for creating KBs
creating a knowledge base
-open xls sheet -> job title list for KB example
-KB needs a domain, circle with * button is domain
State length of 2 is an example domain rule
composite domain (EX: address which includes city state zip)
reference data source RDS (ex: Melissa Data for addresses)
KB’s get published
activity is automatically logged

Implementing DQS
data profiling task in SSDT
-profile types
–null ratio request
–pattern generator RegEx for formatting
–column statistics
-then specify column
Quick profile: runs against all columns
Open data profile viewer
Cleansing
suggested confidence level
corrected confidence level
DQS cleansing task
Job title source job_title _output
jobtitles table
matching
newKB->domain->source column (survivor record)
the table with the + button to add a rule and use the Rule Editor

Implementing MDS
proactive management
people place concepts or things
non-transaction data is good for MDS
includes auditing and versioning
MDS Components(Database, config mgr, MD mgr, web service, mds model deploy, excel Add-In)
MDS Objects(Models: the container db, Entities: like tables, Attributes: like columns, Hierarchies, Members: Actual data)
Install requires powershell 2.0 and IIS 7.5, silverlight and a database
has integration with DQS
to deploy packages that contain data must use CLI (deploynew -package “” -model)

Data flow
merge join requires sort -> advanced editor, pick isSorted and the column
MetaData problems: double click on flow and change types
Lookup transformation
-cache connmgrs for re-use
–redirect rows
–multi output popup
slowly changing dimension task (wizard)
fixed attribute fail on change
changing attribute type 1 overwrite type 2 new records (history)
inferred member flag goes in dimension
blocking oledb command
redirect error rows to flat file

executing packages
“$Project::ProjectParam(String)”; test@email.com
LOGGING_LEVEL 3 = Verbose
dtexec.exe is fire and forget style
xp_cmdshell
built-in SPs in ssisdb
catalog.set_object_parameter_value
restartable packages
-checkpoint file
-tracking last successful step in control flow
project properties
-select file name
-set usage never
–if exist
-save checkpoints = true
-set property fail package on failure = true
to test, can set task property to force a failure

 

Posted by on April 9, 2015 in SQL Admin, SQL Dev

 

More Preparation for 70-463 Implementing a Data Warehouse with Microsoft SQL Server 2012

This is a continuation of my previous post. These are just some very quick notes that I am posting for my own benefit and so that readers may get an idea of the preparation necessary for this test. They are my notes from this book: https://www.microsoftpressstore.com/store/training-kit-exam-70-463-implementing-a-data-warehouse-9780735666092

PART II: Developing SSIS Packages

simple data movement – can use import export wizard
complex data movement – SSDT
SSDT is visual studio shell used to develop IS,AS,RS projects
Control Flow connection managers can be package or project scoped
Connection manager types:
ADO – backwards compatibility
ADO.net – compatible with sql server
AS – analysis services
File – SSIS data type
Flat file – delimited file
ftp – security option is only basic auth
http – web services or file, no windows auth
OLE DB – sql server, will be removed in favor of ODBC
ODBC – open database connection
SMTP – basic email auth only

package scoped connection managers will override the higher level project scoped connmgrs

control flow tasks and containers
containers help control execution of tasks
transformations include
cleansing – remove invalid data or unwanted data
normalization – XML value to varchar
conversion – byte[] to varbinary(max)
translation – “F” to “Female”
validation
data calculation and data aggregation
data pivoting and data unpivoting

ssis tasks categories, data prep, workflow, data movement, SQL admin, SQL maintenance

containers, for loop, foreach loop, sequence

Precedence Constraints (the arrows that come off of tasks)

success, failure, completion
dotted lines mean OR and solid means AND logic used when multiple tasks are involved in flow

Designing and Implementing Data Flow

Data Flow is a level deeper than the control flow
Control flow triggers data flow
data flow task builds execution plan from data flow definition
data flow engine executes the plan
*Validate external metadata – checks for existence of tables and objects and should be turned off if they are dynamically created
bulk OLEDB = fast load
ODBC = batch
fast parse is available at the column level on some data types ( date, time, int )
Working with data flow transformations
-Blocking (ex: sort, aggregate) transformations that read all data in before passing any rows down the pipeline
-Non-Blocking -> lookup, multicast, conditional split or other row-by-row transformations
-partial-blocking -> merge, merge join, union all, data flows in chunks
cache transformations – good for multiple transformations on same data
import/export col – good for blobs
character map – upper case, lower, linguistic bit operations
advanced data prep: dqs cleansing, oledb command, slowly changing dimension, fuzzy grouping, fuzzy lookup, script component(custom.net)
#NEW# Resolve references editor helps resolve mapping problems
Lesson 3: strategy and tools
lookup transformation caching
how to handle rows w/ no matches
sort is expensive, optionally perform sorts at source and use advanced editor to mark data as sorted
avoid update and delete on fact tables
do large table joins on database layer
do updates on loading or temp tables in set based sql operations
Chapter 6: Enhancing Control Flow
ssis variables and parameters
avoid retrieving external source variables more than once
parameters are exposed to the caller but variables are not
parameters are read-only and can only be set by the caller
variables are helpful for reusability
variables are user defined or system
variables can store rows foreach enum containers
-avoid storing large rowsets in memory/variables
variable data types
-object: last resort
Int16: -32,768 thru 32,767
UInt16: 0 thru 65,535
UInt32: 0 thru 4,294,967,295
DateTime
String
Char: 65,535 unicode
Decimal: 28 or 29 significant digits
Single
Double
Variable Scope
-Package Scope
—-Container Scoped
——–task scoped
property parameterization
explicit assignment
expressions
lesson 2: connmgr, tasks, and precedence constraint expressions
expression: combination of constants, variables, parameters, column refs, functions, and expression operators
-special ssis syntax close to C++
math functions: ABS, EXP, CEILING, etc…
String Functions: FINDSTRING, HEX, LEN, LEFT, REPLACE
precedence constraints can use AND/OR logic expressions
Lesson 3: Master Package
just a normal package that uses the execute package task
use variables to expose results to parent
use project deployment model to make parameters available to child packages
use project scoped parameters
CHAP7: Enhancing Data Flow
Lesson 1: Slowly Changing Dimensions
-late arriving dims or early arriving facts
–1. insert row into dim, mark inferred… requires bit col
–2. use newly created surrogate key
–3. when loading dim overwrite inferred members
TYPE 1 SCD: overwrite
TYPE 2 SCD: keep all history
can use conditional split to see what columns changed
ex: source.fullname != dest.fullname
using t-sql hashbytes can compare for changes
–then two cols for hash val Type1 & type2
use set based updates instead of wizard
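
The hash comparison idea from this lesson can be illustrated with a short sketch. It uses Python's `hashlib` in place of T-SQL HASHBYTES, and the column names and the Type 1 / Type 2 grouping are assumptions invented for the example: one hash covers the Type 1 columns, another covers the Type 2 columns, and comparing the two hashes classifies the change.

```python
import hashlib

# Hypothetical column groupings for the example.
TYPE1_COLS = ("host",)
TYPE2_COLS = ("num_cpus", "memory_gb")


def row_hash(row, cols):
    """Hash the chosen columns of a row into one comparable value."""
    # The separator keeps ("ab", "c") from hashing the same as ("a", "bc").
    # Note: in this simple sketch None and "" hash the same.
    joined = "|".join("" if row[col] is None else str(row[col]) for col in cols)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def classify_change(source_row, dest_row):
    """Return 'type2', 'type1' or 'unchanged' by comparing the two hashes."""
    if row_hash(source_row, TYPE2_COLS) != row_hash(dest_row, TYPE2_COLS):
        return "type2"
    if row_hash(source_row, TYPE1_COLS) != row_hash(dest_row, TYPE1_COLS):
        return "type1"
    return "unchanged"
```

Storing the two hash values on the dimension row means each load compares two values per row instead of every attribute column.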
Lesson 2: preparing a package for incremental load
dynamic sql
change data capture
Dynamic SQL in OLEDB source
1. select dataaccess mode of sql command and use ? to pass parameter
2. pass variable to sql command and use expressions to modify the sql string
cdc functionality – cdc source and cdc splitter
-ALL, ALL w/old, net, netw/update mask, net w/merge
lesson3: error flows
route bad rows – fail, ignore (copies null), redirect rows
chapter 8: creating robust and restartable packages
can set transactions at package control flow or task level
transactions use msdtc
transaction options are: required, supported, not supported
transactions work on control flow not data flow
can nest a not supported execsql that won’t rollback inside a transaction (ex: still want to audit on fail)
lesson2: checkpoints
save checkpoints need turned on, on package
creates a file and restarts if exists
starts from beginning if not exists
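
The restart behavior in these checkpoint notes can be pictured with a small analogy. This is not how SSIS stores its checkpoint file; the JSON file format and the function `run_with_checkpoint` are assumptions made up to show the "file exists, resume; no file, start over" logic.

```python
import json
import os


def run_with_checkpoint(steps, checkpoint_path):
    """Run (name, callable) steps in order, resuming after the last success."""
    done = []
    if os.path.exists(checkpoint_path):
        # A checkpoint file exists: a previous run failed part way through.
        with open(checkpoint_path) as handle:
            done = json.load(handle)
    for name, func in steps:
        if name in done:
            continue  # already completed in an earlier run
        func()  # an exception here leaves the checkpoint file behind
        done.append(name)
        with open(checkpoint_path, "w") as handle:
            json.dump(done, handle)
    if os.path.exists(checkpoint_path):
        # Clean finish: remove the file so the next run starts from the top.
        os.remove(checkpoint_path)
```

If the second step fails, the file records that the first step finished, so the next call skips it and retries only the failed step.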
lesson3: event handlers
onerror
onwarning
onvariablechanged
ontaskfailed
can turn event handlers off for task
chapter 9: implementing dynamic packages
project level and package level connection mgrs and parameters
must be deployed to ssis catalog
parameter design values are stored in the project file
cannot change parameter value while package is running
property expressions are evaluated on access
lesson2: package configs
enable package deployment model
can get parent package configs
chapter10: auditing and logging
logging: package configuration
auditing: dataflow transformation component
lesson1: logging packages
providers are: txt file, sql profiler, sql server, event log, xml
boundary progress exception
use parent setting is default
ssis control flows can be configured for logging
lesson2: auditing and lineage
elementary auditing – captures changes
complete – adds usage or read activity
audit transformation editor
lesson3: preparing package templates
keep packages in source control

Part IV: managing and maintaining ssis packages
ssis service is required in production
ssisdb new
package install utility is legacy
can use ssdt or ssms to deploy packages
project model or package model
dtexecui is legacy
can use TSQL, powershell, manual dtexec cli to execute packages
agent to schedule packages
introduced master package concept
securing packages: uses sql security concepts of principals and securables
ssis_admin role
ssis_user by default allowed to deploy, and deployer is allowed to read, modify, execute
Chapter 13: troubleshooting and perf tuning
breakpoints work only in control flow
breakpoints can fire on a hit count
data viewers on path will show grid view of data
use error outputs to catch bad rows
test with a subset of data
basic logging is default
switch to verbose when there are problems
data taps are like dataviewers for production
must be predefined using catalog.add_data_tap for specific data flow
lesson2: perf tuning
buffers are a group of data in data flow
determined automatically
Transformation Types
-non-blocking: row based synchronous
-partial blocking: asynchronous transformation
-blocking: asynchronous
backpressure controls flow for best memory control
max buffer rows – 10,000 default
max buffer size – 10MB by default
fast load on destination
full-cache lookups
avoid oledb transformations
BLOBs get swapped to disk
data flow engine threads
max concurrent executables -1 = # of logical processors +2
perfmon counter: buffers spooled
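
The two buffer defaults in these notes interact: wide rows hit the size limit before the row limit. The sketch below is a simplified model that only applies those two caps; the real data flow engine makes further adjustments (such as a minimum buffer size), and the function name and sample row widths are assumptions for the example.

```python
DEFAULT_BUFFER_MAX_ROWS = 10_000
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024  # 10 MB expressed in bytes


def rows_per_buffer(estimated_row_bytes,
                    max_rows=DEFAULT_BUFFER_MAX_ROWS,
                    buffer_size=DEFAULT_BUFFER_SIZE):
    """Estimate rows per buffer as the smaller of the row cap and the size cap."""
    if estimated_row_bytes <= 0:
        raise ValueError("estimated_row_bytes must be positive")
    return min(max_rows, buffer_size // estimated_row_bytes)


narrow = rows_per_buffer(100)   # the row cap wins for a 100-byte row
wide = rows_per_buffer(5000)    # the size cap wins for a 5,000-byte row
```

Under this model, trimming unused columns at the source is what lets more rows fit in each buffer.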

PART V: Building Data Quality Solutions

chapter14: installing and maintaining DQS
Soft dimensions: timeliness, ease of use, intension, trust, presentation quality
hard dimensions: accuracy, consistency
Schema dimensions: completeness, correctness, documentation, compliance w/theoretical models, minimalization
activities: understand sources and destinations
lifecycle
security and backups managed through ssms
Chapter15: implementing MDS
metadata, transactional, hierarchical, semi-structured, unstructured, master
MDM goals: unifying or harmonizing, maximize ROI through reuse, support compliance, improving quality
MDM: coordinated set of tools policies to maintain accurate master data
map master data dimensions to DW
Installing MDS: DB, Service(Needs IIS), Manager, Excel Add-IN
Creating MDS model
1.Model
2.Entities(like tables)
3.Attributes(like columns)
4.Hierarchies
5.collections
Derived hierarchies: Recursive with TOP = NULL (ex: Org Chart)
Explicit Hierarchies – Organization can go any way
Collection: flat list of members
MDS service performs business logic
Chapter16: managing master data
MDS Packages
-Model deployment package to move data to another server
-wizard only includes meta data
-permissions are not included
-MDSModelDeploy command prompt if you want to move data
exporting – tsql on subscription views, webservice
Security, system admin (one user, tsql to change), model admin (complete model access)
entity permissions apply to all attributes
mds add-in for excel (connect to http://server:8080)
when model and member permissions are overlapping read-only > updated and deny > *
excel add-in can use DQS KB matching
Chapter17: creating a data quality project to clean data
knowledge discovery
domain management
reference data services
matching policy
domain: semantic representation of column
properties: data type, leading values, normalize, format, spellchecking
Term basic relation: Inc. -> Incorporated

I skipped 18,19,20: Advanced ssis and data quality topics because only 5 parts are listed on the exam prep and I ran low on time.

 

Posted by on April 8, 2015 in SQL Admin, SQL Dev

 

Preparation for 70-463 Implementing a Data Warehouse with Microsoft SQL Server 2012

I’m writing this post to force myself to spend some quality time with the materials for this exam. I have been at it for almost two months now and am nearing my exam date. I accelerated my plan so I could get into the 2nd shot window offered by Microsoft and also so I could finish my MCSA within 1 year. It has been a battle at times and is not an easy certification to get. Microsoft has really increased the difficulty since the MCITP for SQL 2008 which only required 2 exams.

My employer is assisting with the costs in a few ways. They will reimburse me for the cost of a passed exam. They are giving me a $500 bonus when I pass all three exams and prove my MCSA. And they have loaned me the Training Kit book, along with the other test books that I have already returned.

My plan has been going fairly well. I’ve been able to put in at least 10-15 minutes about 6 days a week. Some of those sessions have lasted an hour or more, but that is pretty rare. Data warehousing is interesting to me because we have a lot of things starting up at work that may take off and require these skills. Before I started studying I had deployed only a few packages for my own small data collection and reporting tasks as an administrator. I also do not get too involved with database design since we rely on a lot of 3rd party applications. That world is changing for me, and that is why I have been able to be a fairly good student for this last test.

So let’s get to my plan.

The percentages are the first thing to note on this page: https://www.microsoft.com/learning/en-us/exam-70-463.aspx

11% – Design and implement

23% – Extract and Transform

27% – Load

24% – Configure and deploy SSIS

15% – DQS

_______

100%

I like to sit down with the book and read as much as I can while taking notes. I write down a lot. When I look at it later I think, “duh I knew that why did I write it down?” But it actually helps me stay focused. Even if I just write down the title of the section, it keeps me on track. At this point, I am ready to go back and review a lot of those notes and type them up so here they are.

The book is split out into those same 5 “Parts” as listed on the exam website.

Part 1: Design and Implement
Use snowflake in a POC since it will be easier to design from the complex OLTP environment.
Star schema for everything else.
Star is just a simplified, denormalized, merged, cleansed, historical schema with fewer joins
Star schema works well for SSAS cubes, SSAS won’t be on the test (phew).
A fact is: “Cust A purchased product B on date C in quantity D for amount e”
Dimension table: Customer, Product, Date
One star per business area
The granularity level is the number of dimensions or depth you can slice by (think sales by quarter or sales by day)
Auditing: Who, What, When
Lineage: Where is the data coming from?
Dimensions: The goal is to make it look good in a pivot chart
-discretizing: putting values into bins and not keeping too much granularity because it doesn’t graph well
-Member Properties: columns not used for pivoting
Slowly changing: type 1- no history, overwrite; type 2 – keep history with current flag or validto-validfrom cols; type3 – limited history with additional cols like prevAddr
Keep business keys intact, create additional DW specific keys (surrogate keys) for linking fact to dimensions, probably IDENTITY
Use a SEQUENCE if you need to know the number before inserting, request multiple at once, or need a multi-table key
FACT TABLES: made up of FKs, Measures, Lineage cols, Business keys
consider the additivity of measures. EG: can’t sum an AvgDiscCol
Fact tables should be on the Many side of the 1->many relationship
Dimensions contain the lineage data
Age is a common computed column
design dimensions first, then fact tables
use partitioning on your fact table
Fact tables contain measures
Every table should have a clustered index
Do not index FKs of fact table because HASH joins don’t need it?
If you are doing merge joins and nested loop joins indexes on FKs help
indexed views are useful in some cases
Row/page compression automatically applies unicode compression
batch mode is faster and will show in the query plan
column store indexes: one per table, not filtered, not on indexed views
Partitioning function maps rows to a partition
partitioning scheme maps partition to filegroups
aligned index: table with same schema which allows for partition switching
optimizer can eliminate partitions
inferred member: row added in dimension during fact table load
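
The partition function and partition scheme notes above can be pictured as two lookups: a value maps to a partition number, and the partition number maps to a filegroup. A minimal sketch, assuming sorted boundary values and made-up filegroup names; `range_right` mirrors the RANGE RIGHT option, where a boundary value belongs to the partition on its right, and the default mirrors RANGE LEFT.

```python
from bisect import bisect_left, bisect_right


def partition_number(value, boundaries, range_right=False):
    """Return the 1-based partition for `value` given sorted boundary values."""
    if range_right:
        # RANGE RIGHT: a boundary value falls in the partition to its right.
        return bisect_right(boundaries, value) + 1
    # RANGE LEFT: a boundary value falls in the partition to its left.
    return bisect_left(boundaries, value) + 1


def filegroup_for(value, boundaries, filegroups, range_right=False):
    """Play the partition scheme: map the partition number to a filegroup."""
    return filegroups[partition_number(value, boundaries, range_right) - 1]


# Hypothetical date keys and filegroup names; two boundaries give three partitions.
BOUNDARIES = [20140101, 20150101]
FILEGROUPS = ["FG2013", "FG2014", "FG2015"]
```

With these sample values, the date key 20140101 lands in FG2014 under RANGE RIGHT but in FG2013 under RANGE LEFT, which is why RANGE RIGHT tends to read more naturally for date boundaries.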

PART II: Developing SSIS Packages
To be continued…

 

Posted by on April 2, 2015 in Uncategorized

 

RocketTab Must Die

RocketTab is spyware that is passing itself off as adware. It proxies your http and https connections to the internet and injects boatloads of garbage ads into legitimate websites. This is hijacking with a lousy excuse of making your search “better” by modifying your top search results. It is buggy, which causes errors in browsing, and is dangerously similar to the Superfish software that Lenovo was placing on its PCs. This method of MITM attacking to push ads must die a painful death.

I’m not sure I like the ad supported direction that media is going. I’m also not sure I like paying for things either… and yes I understand the contradiction. What I am sure about is we need to scale the ads and general invasion of privacy back a notch or three. This software is getting installed without users understanding what is happening. It is spawned from greed and lousy, immoral business practices.

I pay for Netflix, I rent movies, I go to the theater, I watch ads, and I am OK with sites that I intend to visit, like YouTube and Hulu, collecting my viewing history on those sites. But I have recently had a first-hand experience with this garbage called RocketTab.

That Dirty, Disgusted Feeling

I went for a trip to visit my mom and hopped on her computer because I forgot to set my out-of-office responses. I opened an incognito window and logged into my personal email and then was about to log into my work email when I noticed something strange.

[Screenshot: certificate details]

That is definitely not the issuer of my work’s public webmail certificate. Fiddler is actually perfectly legitimate web debugging software. So am I correct in thinking that these lazy sloth developers of crapware reused the Fiddler certificate?

Normally, if the HTTPS part is green I don’t bother checking the certificate. For some reason we were just talking at dinner about Lenovo and their missteps, so I got curious and checked. I consider myself security conscious, and I had already sent my personal email information to a man-in-the-middle attacker. I had almost sent over my work credentials too.

I started looking at netstat. I saw that when I opened the browser, the status bar showed it connecting to a proxy. I took a look at Resource Monitor and saw a boatload of public internet addresses that this “Client.exe” was connected to. Netstat showed Client.exe had a listener on port 49181. Chrome is supposed to be connecting to the public internet, not Client.exe.
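
The same issuer check can be scripted instead of clicking through the browser. This is a minimal sketch using Python's standard `ssl` module; the helper names and the idea of comparing against an expected organization are assumptions for illustration, and the expected issuer is something you would need to know in advance for your own site.

```python
import socket
import ssl


def issuer_fields(cert):
    """Flatten the 'issuer' entry of a getpeercert() dict into a plain dict."""
    return {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}


def fetch_issuer(host, port=443, timeout=5.0):
    """Connect to `host` over TLS and return the issuer of its certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return issuer_fields(tls.getpeercert())


def looks_intercepted(issuer, expected_org):
    """True when the issuing organization is not the one you expected."""
    return issuer.get("organizationName") != expected_org
```

Running `fetch_issuer` against your webmail host from a suspect machine and from a known clean one, then comparing the results, would surface a swapped issuer like the one in the screenshot.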

[Screenshot: IP addresses]

The first thing I did was go into “Manage Computer Certificates” and delete the two Fiddler certificates from the root store. This was successful in changing the green chrome lock to a proper red error.

The next thing I did was remove the proxy from lan settings.

[Screenshot: proxy settings]

After that I removed “RocketTab” from programs via the control panel. As soon as this was done, all the “Client.exe” connections went to TIME_WAIT status because they were reset. RocketTab was the culprit.

The last thing I did was change all my passwords.

This man-in-the-middle attack on client machines needs to stop. This is a sneaky activity that normal users don’t understand. They generally don’t want the junk applications that these types of ad services support anyway. Users have been socially engineered to install this stuff, and it is not clear how to get rid of it or that it is even running in the background. It is a poor business model that needs to be destroyed.

 

Posted by on March 7, 2015 in Security

 

Disaster Recovery

I have recently been sucked into all that is Disaster Recovery or Business Continuity Planning. Previously I dodged the topic a bit. I haven’t really enjoyed the subject because it always seems to distract from my focus on backups and local recovery. I liked to focus on the more likely failure scenarios and make sure those are covered before we get distracted. I’m not really sure if that was a good plan or not.

We would have to lose almost our entire datacenter to trigger our disaster recovery plan. A fire in the datacenter, a tornado, or maybe losing our key storage array might trigger DR. Dropping a table in a business application isn’t something you want to trigger a DR plan. Developing a highly available, resilient system is a separate task from developing a DR plan for that system. It was very challenging to convince people to complete a discussion of the local recovery problems without falling into the endless pit of DR.

There seem to be two different business reasons for DR: 1. Complete a test of the plan so we can pass an audit once a year, and 2. Create a plan so we can actually recover if there were a disaster. The first one comes with a few key caveats: the test must be non-disruptive to business, it cannot change the data we have copied offsite, and it cannot disrupt the replication of the data offsite.

In a cool or warm DR site, the hardware is powered on and ready but it is not actively running any applications. If I were to approach this problem from scratch, I would seriously consider a hot active site. I hear metro clusters are becoming more common. Sites that are close enough for synchronous storage replication enable a quick failover with no data loss. A hot site like this would have many benefits including:
1. Better utilization of hardware
2. Easier Disaster Recovery testing
3. Planned failovers for disaster avoidance or core infrastructure maintenance

However, there are downsides…
1. Increased complexity
2. Increased storage latency and cost
3. Increased risk of disaster affecting both sites because they are closer

Testing is vital. In our current configuration, in order to do a test we have to take snapshots at the cold site and bring those online in an isolated network. This test brings online the systems deemed critical to business and nothing more. In an active/active datacenter configuration, the test could be much more thorough, since you actually run production systems at the second site.

A most basic understanding of DR covers the simple fact that we now need hardware in a second location. There is much more to DR than a second set of servers. I hope to learn more about the process in the future.

 

Posted by on February 7, 2015 in Hardware, Storage, Virtual

 

Reasons you can’t connect to SQL Server

“I can’t connect, can you look at the logs?”

Nope, not today, this is not how we do this. How is the server to LOG if it never receives the request? Do you think the server somehow magically anticipated that you wanted to run a query for the first time? What server are you even talking about???

Connection errors are generally logged on the client side. First read the message carefully and thoroughly for best results. The majority of common errors can be broken down into three categories:

Client issue
Network issue
Server issue

The nature of the word “Connection” means there is some fuzzy areas where two of the three CNS creatures rub uglies. There is a network adapter on the client and a network adapter on the server, and well.. there is a network.

Let’s look at one of my more popular reasons you can’t connect to SQL Server: Login Failed.

So which is that, C… N… or S? I can pretty much rule out the network since the client received a message from the server. Maybe it is not even an issue at all, it is a feature I implemented to prevent you from logging into a production server. I really want to put it in the server category, but as I look back on actual cases, it is mostly the fact that access was never requested until it didn’t work. So that is a layer 8 issue with the planning protocol.

Long story short, I really wanted to categorize this list and also provide solutions but it really depends on the situation of the error. Hopefully, this list doesn’t grow much more since I have gotten better at anticipating people who may eventually want to connect to my databases. Without any further complaints, here are the reasons off the top of my head that you can’t connect to SQL Server:

1. You don’t have login access
2. Your login doesn’t have connect privileges
3. The Windows firewall is blocking you
4. The network firewall is blocking you
5. The login doesn’t have public access to the database
6. The server is out of memory
7. The server is actually down
8. The database is in single_user mode
9. The service account is locked out
10. SQL Authentication isn’t enabled on the server
11. You are trying SQL Auth when you should be using Windows Integrated
12. You are typing the password wrong
13. The login is locked out
14. The login is disabled
15. Server cannot generate the SSPI context
16. The service was started with option -m (single user)
17. The VMware host doesn’t have the correct VLAN defined
18. The SQL Server’s IP configuration is wrong
19. The network switch doesn’t allow the VLAN on that port
20. The distributed switch doesn’t have LACP enabled on that port group
21. The SQL Service is being updated
22. The Windows server is being updated
23. You are not specifying the non-standard port
24. You have the wrong instance name
25. You have the wrong server name
26. You have the wrong port
27. You communicated the wrong port to the network firewall admin
28. You are using the :port syntax instead of the ,port syntax
29. SQL is not set to listen on TCP/IP
30. You ran the C: drive out of space causing a cascading failure
31. You are not connected to the VPN
32. You are connected to the guest wifi
33. You bumped the wifi switch on your laptop
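
One quick way to split the list above into network-side and server-side reasons is to check whether the TCP port answers at all before worrying about logins. Here is a minimal sketch in Python. It assumes the default port of 1433 unless you pass another one, and `can_reach` is a name I made up for illustration, not part of any SQL Server tooling.

```python
import socket


def can_reach(host, port=1433, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name resolution failures
        return False
```

If this returns False, start with the firewall, VLAN, VPN and wrong-port items. If it returns True, the listener is reachable and the login and permission items are the more likely suspects.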

 

Posted by on January 21, 2015 in SQL Admin, SQL Dev

 

5 9s Lead to Nestfrastructure (and fewer 9s)

Off the top of my head,

Microsoft DNS issue a handful of hours before the Xbox One launch (http://redmondmag.com/articles/2013/11/21/windows-azure-outages.aspx)

Widespread Amazon outages (http://www.zdnet.com/amazon-web-services-suffers-outage-takes-down-vine-instagram-flipboard-with-it-7000019842/)

NASDAQ (http://www.bloomberg.com/news/2013-08-26/nasdaq-three-hour-halt-highlights-vulnerability-in-market.html)

The POTUS’s baby (http://www.healthcare.gov)

I learned about 5 9’s in a college business class. If a manufacturer wants to be respected as building quality products, they should be able to build 99.999% of them accurately. That concept has translated to IT as some kind of reasonable expectation of uptime. (http://en.wikipedia.org/wiki/High_availability)
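
To put a number on that expectation, here is a small Python sketch of the arithmetic. It assumes a 365-day year and counts every minute of the year as time the system is supposed to be up; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(nines):
    """Minutes of downtime per year allowed by an availability of N nines."""
    unavailability = 10 ** -nines  # 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes per year")
```

Five nines works out to roughly five and a quarter minutes of downtime for the whole year, planned or not.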

I take great pride in my ability to keep servers running. Not only avoiding unplanned downtime, but developing a highly available system so it requires little to no planned downtime. These HA features add additional complexity and can sometimes backfire. Simplicity and more planned downtime are often the best choice. If 99.999% uptime is the goal, there is no room for flexibility, agility, budgets or sanity. To me, 5 9s is not a reasonable expectation of uptime even if you only count unplanned downtime. I will strive for this perfection; however, I will not stand idly by while this expectation is demanded.

Jaron Lanier, the author and inventor of the concept of virtual reality, warned that digital infrastructure was moving beyond human control. He said: “When you try to achieve great scale with automation and the automation exceeds the boundaries of human oversight, there is going to be failure … It is infuriating because it is driven by unreasonable greed.”
Source: http://www.theguardian.com/technology/2013/aug/23/nasdaq-crash-data

IMHO the problem stems from dishonest salespeople. False hopes are injected into organizations’ leaders. These salespeople are often times internal to the organization. An example is an inexperienced engineer that hasn’t been around for long enough to measure his or her own uptime for a year. They haven’t realized the benefit of keeping track of outages objectively and buy into new technologies that don’t always pan out. That hope bubbles up to upper management and then propagates down to the real engineers in the form of an SLA that no real engineer would actually be able to achieve.

About two weeks later, the priority shifts to the new code release and not uptime. Even though releasing untested code puts availability at risk, the code changes must be released. These ever-changing goals are prone to failure.

So where is 5 9s appropriate? With the influx of cloud services, the term infrastructure is being too broadly used. IIS is not infrastructure, it is part of your platform. Power and cooling are infrastructure and those should live by the 5 9s rule. A local network would be a stretch to apply 5 9s to. Storage arrays and storage networks are less of a stretch because the amount of change is limited.

Even when redundancies exist, platform failures are disruptive. A database mirroring failover (connections closed), a webserver failure (sessions lost), a compute node failure (OS reboots) and even a live migration of a VM, which requires a “stun” that stops the CPU for a period of time (a second?), all have side effects. The details I listed in parentheses are often omitted from the sales pitch. The reaction varies with each application. As the load increases on a system, these adverse reactions can increase as well.

If you want to achieve 5 9s for your platform, you have to move the redundancy logic up the stack. Catch errors, wait and retry.
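
As a rough illustration of “catch errors, wait and retry”, here is a minimal Python sketch. The `retry` helper, its parameters and the choice of `ConnectionError` as the transient error are all my own assumptions; swap in whatever exception your driver actually raises during a failover.

```python
import time


def retry(operation, attempts=3, delay_seconds=1.0, retry_on=(ConnectionError,)):
    """Call operation(); on a transient error, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            time.sleep(delay_seconds * attempt)  # wait a little longer each time
```

The application wraps the call that might hit a failover, for example `retry(run_query)`, where `run_query` is whatever function opens the connection and does the work. A few seconds of patience at this layer can ride out a mirroring failover or a stun that the layers below cannot hide.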

[Image: stack]

Yes, use the tools you are familiar with lower in the stack. But don’t build yourself a nest at every layer in the stack; understand the big picture and apply pressure as needed. To avoid nestfrastructure, don’t jump on every redundancy feature, just like you wouldn’t jump on every shiny new security feature.

 

vMotion, an online operation?

There are two types of vMotions, storage and regular. Storage vMotion moves VM files or a single .vmdk file to another datastore. The regular vMotion moves the VM’s memory from one host to another and then stuns the VM in order to pause processing so the new host can open the file and take ownership of the VM. Today I’ll be referring mostly to the regular vMotion.

These are both fantastic technologies that allow for rolling upgrades of all kinds and also the ability to load balance workloads based on usage. The Distributed Resource Scheduler (DRS) runs every 5 minutes by default to do this load balancing. Datastore clusters can be automated to balance VMs across datastores for space and usage reasons. Like I said, these technologies are fantastic but need to be used responsibly.

“VMware vSphere® live migration allows you to move an entire running virtual machine from one physical server to another, without downtime” – http://www.vmware.com/products/vsphere/features/vmotion

That last little bit is up for debate. It depends on what your definition of downtime is. This interesting historical read shows that vMotion was the next logical step after a pause, move and start operation was worked out. Even though VMware is now transferring the state over the network and things are much more live, we still have to pause. The virtual machine memory is copied to a new host, which takes time, then the deltas are copied over repeatedly until a very small amount of changed memory is left and the VM is stunned. This means no CPU cycles are processed while the last tiny little bit of memory is copied over, the file is closed by that host and the file is opened on the new host which allows for the CPU to come back alive. Depending on what else is going on, this can take seconds, yes that is plural. Seconds of an unresponsive virtual machine.
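
The copy-then-copy-the-deltas loop can be sketched as a toy model in Python. This is not VMware’s actual algorithm; the function name, the parameters and the numbers are all hypothetical, and it ignores the time spent closing and reopening the file. It only shows why the stun is short when the link is fast and long when memory is changing faster than it can be copied.

```python
def precopy_rounds(memory_mb, dirty_mb_per_s, link_mb_per_s,
                   stun_threshold_mb, max_rounds=30):
    """Toy model of iterative pre-copy: returns (rounds, final_stun_seconds)."""
    remaining = float(memory_mb)
    rounds = 0
    while remaining > stun_threshold_mb and rounds < max_rounds:
        copy_seconds = remaining / link_mb_per_s
        # memory dirtied while this round was copying becomes the next delta
        remaining = min(float(memory_mb), dirty_mb_per_s * copy_seconds)
        rounds += 1
    # the VM is stunned while the last delta is copied
    return rounds, remaining / link_mb_per_s


# 16 GB VM, 100 MB/s of changing memory, roughly a 10Gb link, 16 MB cutoff
print(precopy_rounds(16384, 100, 1000, 16))
```

When the dirty rate is well under the link speed, each delta is a fraction of the previous one and the final stun is tiny. When the two are close, or the dirty rate is higher, the deltas never shrink and the model gives up after `max_rounds` with a stun measured in seconds.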

What does that mean? Usually in my environment, a dropped ping, or maybe not even a dropped ping but a couple slow pings in the 300ms range. This is all normally fine because TCP is designed to re-transmit packets that don’t make it through. Connections generally stay connected in my environment. However, I have had a couple strange occurrences in certain applications that have led to problems and downtime. Downtime during vMotion is rare and inconsistent. Some applications don’t appreciate delays during some operations and throw a temper tantrum when they don’t get their CPU cycles. I am on the side of vMotion and strongly believe these applications need to increase their tolerance levels but I am in a position where I can’t always do that.

The other cause of vMotion problems is usually related to overcommitted or poorly configured resources. vMotion is a stellar example of super efficient network usage. I’m not sure what magic sauce they have poured into it but the process can fully utilize a 10Gb connection to copy that memory. Because of that, vMotion should definitely be on its own VLAN and physical set of NICs. If it is not, the network bandwidth could be too narrow to complete the vMotion process smoothly and that last little bit of memory could take longer than normal to copy over, causing the stun to take longer. Very active memory can also cause the last delta to take longer.

Hardware vendors advertise their “east-west” traffic to promote efficiencies they have discovered inside blade chassis. There isn’t much reason for a vMotion from one blade to another blade in a chassis to leave the chassis switch. This can help reduce problems with vMotions and reduce the traffic on core switches.

In the vSphere client, vMotions are recorded under the tasks and events. When troubleshooting a network “blip” the completed time of this task is the important part. Never have I seen an issue during the first 99% of a vMotion. If I want to troubleshoot broader issues, I use some T-SQL and touch the database inappropriately. PowerShell and PowerCLI should be used in lieu of database calls for several reasons but a query is definitely the most responsive of the bunch. This query will list VMs by their vMotion frequency since August.


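-- Count DRS-initiated vMotions per VM since mid-August 2014
SELECT
    [VM_NAME] AS 'VM',
    COUNT(*) AS 'Number of vmotions'
FROM [dbo].[VPXV_EVENTS]
WHERE
    EVENT_TYPE = 'vm.event.DrsVmMigratedEvent'
    AND CREATE_TIME > '20140814' -- unambiguous yyyymmdd date literal
GROUP BY [VM_NAME]
ORDER BY COUNT(*) DESC -- most frequently moved VMs at the top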

This query can reveal some interesting problems. DRS kicks in every 5 minutes and decides if VMs need to be relocated or not. I have clusters that have DRS on but don’t ever need to vMotion any VMs because of load and I have clusters that are incredibly tight on resources and vMotion VMs all the time. One thing I have noticed is that VMs that end up on the top of this query can sometimes be in a state of disarray. A hung thread or process that is using CPU can cause DRS to search every 5 minutes for a new host for the VM. Given the stun, this isn’t usually a good thing.

IMHO, a responsible VM admin is willing to contact VM owners when they are hitting the top of the vMotions list. “Don’t be a silent DBA.” That is some advice I received earlier on in my career. Maintenance and other DBA-type actions can be “online” but in actuality cause slowdowns in the system that other support teams may never find the cause of. The same advice can be applied to VMware admins as well.

 

Posted by on September 16, 2014 in Virtual

 

SQL Saturday Columbus Recap #SQLSAT299

I decided to take a brief trip down memory lane for this recap.

http://www.sqlsaturday.com/84/schedule.aspx Attendee, Volunteer
http://www.sqlsaturday.com/160/schedule.aspx Attendee, Volunteer
http://www.sqlsaturday.com/204/schedule.aspx Attendee, Volunteer, Speaker
http://www.sqlsaturday.com/256/schedule.aspx Attendee, Volunteer
http://www.sqlsaturday.com/292/schedule.aspx Attendee, Volunteer Coordinator, Speaker
http://www.sqlsaturday.com/299/schedule.aspx Attendee, Speaker

Some of those session titles are amusing after 3 years, especially anything that has “new” in the title. That first SQL Saturday in 2011 was pretty special. I realized that volunteering helped my more introverted personality get a chance to network with others.

At the Kalamazoo84 SQLSat I was having a conversation about the pains of double hop authentication and another speaker asked me what my session was about, but I was only a volunteer. I didn’t think I was ready to speak (I wasn’t). That person thought for some strange reason I knew my stuff and suggested I whip up a session and try it out. It was some advice that I remember but didn’t act on for quite a while. This was also another interesting question because it is a total bait question. It is something that the speakers are thinking about and is a great icebreaker.

The Detroit SQL Saturday in 2013 was the first time I was a speaker at a SQL Saturday. I had found my niche that I was passionate enough about to actually enjoy getting up in front of people and presenting. The basic SQL topics are great but I didn’t feel I had enough groundbreaking experience and depth on any of those topics to present until I found a way to make security interesting. It was my in because nobody else seemed to be talking about it. I saw other presenters doing a bit of cross training into virtualization and storage so I figured a bit of offensive security and networking concepts would be totally acceptable. A couple user groups of practice and I was ready for a larger audience. I packed a smaller room full of very interested and thankful people. I’m glad the first time went well because it was very nerve-racking. I may not have continued to challenge myself in this way had it gone poorly.

Kalamazoo, Detroit and now Columbus. These SQL Saturday conferences have all been very rewarding. I always learn something, meet at least a few new awesome people and give as much back to the community as I can. Getting a reasonably sized, semi-interested audience is priceless to me when I am trying to practice my presentation and public speaking skills. There is only so much I can teach my wife about computers until she murders me in my sleep!

My session in Columbus went well sans one whoopsie. I have learned I need to get an accurate start and stop time from multiple sources. I started my session at 3:30 thinking the 3:34 was a typo in the handout. Unfortunately it was a typo but in the other direction and was supposed to start at 3:45 according to the website. I started at 3:30 and someone kindly got up and shut the door. A little less than 10 minutes in I noticed a small crowd peeking in the small glass part of the door and someone finally opened it. This nearly doubled the people in attendance so I started over but didn’t show the video ( https://www.youtube.com/watch?v=c36UNSoJenI ) again. Anyways, the slides and demo scripts are posted on the schedule link above.

I decided to attend sessions at this SQL Saturday. Below are the sessions I attended. I particularly liked Kevin Boles’ SQL Injection session because of the hands-on approach. He developed a great demo that showed several different methods of attack and defense. It is also very complementary to my session because I avoid that particular topic for the most part.

[Image: 299_attendance]

Also, I would like to thank Mark https://twitter.com/m60freeman for organizing a great speaker dinner and event. I’m happy they were able to give me the opportunity to present.

I sometimes imagine where would I be today had I not started attending user groups and events like SQL Saturday. I would most likely be a mess. I have supported an environment that has grown from ~15 SQL servers 5 years ago to almost 200. Without the skills and drive to make SQL Server the best possible platform at my organization I’m not sure I would have as much responsibility. Business users would have run away instead of diving into SQL Server. I imagine myself still being a “DBA” but constantly putting out fires instead of scripting our build and auditing processes. I imagine myself never having the time to research storage and virtualization and becoming confident enough to take on these new administration challenges. I definitely would not have begun the journey of improving my public speaking skills that have improved my overall quality of life. The place without PASS in my life is a scary place.

 

Posted by on June 24, 2014 in PASS

 
