
Category Archives: SQL Dev

The many different approaches to performance tuning SQL Server

Ever since we started the GLASS user group this spring, I’ve had the idea that we would have a lightning talk style meeting. This is where we have several shorter presentations instead of one long one. My goal was to give newer speakers a chance to dip their toes in the water and build toward a full session that they could present later.

Everyone has a different approach to tuning SQL Server. Different is good, at least on this topic. There can be a lot of friction when trying to troubleshoot where the slowness is happening, especially when an organization has a lot of silos. If the tier 1 support has to talk to tier 2 support who has to talk to a developer who has to talk to a server admin who has to talk to a dba who has to talk to a storage admin who… you get the point. I want to get as many perspectives of real world solutions to performance problems together in the same room. Some may think of it as a WWE-style smackdown, but I think the collaboration would be insanely beneficial.

I couldn’t have been more right :]

Kyle talked about implicit conversions specific to SSRS, Mike talked about partitioning, Dave talked about the optimizer, Tom talked about the speed of a single DECLARE versus multiple DECLARE statements, and I wrapped it up with performance triage using metrics, queries and real world fixes.

The performance tuning process is just that, a process, not a single answer to a problem. There are several ways to approach slowness of an application, and how you proceed depends on the situation. Dive right into active queries? Look at the VM CPU graph? Fire back with a bunch of questions? I’ve personally taken all of these angles and found some successes, and a bunch of failures along the way.

 
Leave a comment

Posted by on September 20, 2015 in PASS, SQL Admin, SQL Dev

 

T-SQL Tuesday #065 – Slowly Changing Dimensions

I’ve been focusing a lot of my study time on data warehousing lately. I’ve been supporting the system and storage of data warehouses for a while but lately have been digging into the developer topics.

What I learned over the weekend is how to build a working, slowly changing dimension in SSDT. Thanks for the challenge #tsql2sday and @SQLMD!

 

The Problem

Dimensions are the tables we design to make data look good in a pivot chart. They are the tables that describe our facts. Customer is a good example of something that could be a dimension table. For my challenge I decided to use virtual machine as my dimension.

The problem is, what if a VM’s attributes change? 4 cores, that was yesterday… today PRDSQLX has 24 cores. And what if someone deletes a VM, how many cores did it have?

I can get the current status of my VMs by using the source system, but the problem is the history. I can pull a snapshot of what VMs I have in my environment every day from the source system. I could just make copies of that data and slap a “PollDate” column on the table. Voila, I have everything I need, and about 1000x more than I need.

There is the problem, how do I collect and save a history of my VM’s attributes?

Solution

Each column in my VM table can be one of three basic types: http://en.wikipedia.org/wiki/Dimension_table

Type 1. Simply overwrite this value… it changes a lot and I don’t care about history (e.g., which host the VM is running on)
Type 2. Add a new row to maintain history… if one column in my VM row changes, I get a whole new record in my dimension
Type 3. Add a new column to keep a limited amount of history… add columns like previous_num_cpus and previous_previous_num_cpus and move data to them as it changes

So we have to take the data we get on a nightly snapshot of the source, and compare it to what we have in the destination, then do a conditional split. I’m sticking to handling these differences:

New VM – insert with NULL validto (easy)
Deleted VM – change validto column (create staging table and do an except query)
Change in Type 1 Col – update the existing VM row (the one with the NULL validto) in place (easy)
Change in Type 2 Col – insert new row with NULL validto column, change previous record’s validto date (a little tricky; see the sketch after this list)
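
For the tricky Type 2 case, here is a minimal set-based sketch of the logic, assuming a staging copy of the nightly snapshot (built in Step 1 below). The num_cpus column is just a stand-in for whatever Type 2 attribute you care about.

-- close the current row for any VM whose Type 2 attribute changed
UPDATE dimVM
SET validto = GETDATE()
FROM dimVM
INNER JOIN dimVM_staging s
    ON dimVM.vmid = s.vmid AND dimVM.vcenter = s.vcenter
WHERE dimVM.validto IS NULL
  AND dimVM.num_cpus <> s.num_cpus;

-- then insert a fresh current row (NULL validto) for changed and brand new VMs
INSERT INTO dimVM (vmid, vcenter, num_cpus, validfrom, validto, isdeleted)
SELECT s.vmid, s.vcenter, s.num_cpus, GETDATE(), NULL, 0
FROM dimVM_staging s
WHERE NOT EXISTS (SELECT 1 FROM dimVM d
                  WHERE d.vmid = s.vmid AND d.vcenter = s.vcenter
                    AND d.validto IS NULL);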

That logical split can be made easier by using the Slowly Changing Dimension task in SSDT. It pops up a wizard to help you along the way and completely set you up for several failures which I am going to let you learn on your own :]

Step 1. Setup an initial loading package.

This will make it handy to restart your development.

Query the source in a data flow OLE DB Source
Tack on a few extra columns (validfrom, validto, isdeleted, sourcesystemid) in the SQL command (see the example after this list)
Create the destination table using the New button (this is pretty handy to avoid manually lining up all the datatypes)
Use the New button again to create a dimVM_staging table for later
Add a task at the beginning of the control flow to truncate the destination (dimVM) table
Run the package, and be careful not to run it accidentally later since it starts with a truncate
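
The source query ends up shaped roughly like this; the inventory table and attribute columns are assumptions, only the four housekeeping columns come from the list above.

-- example source query with the SCD housekeeping columns tacked on
SELECT  vmid,
        vcenter,
        num_cpus,
        GETDATE() AS validfrom,
        CAST(NULL AS datetime) AS validto,
        CAST(0 AS bit) AS isdeleted,
        1 AS sourcesystemid
FROM    dbo.vm_inventory;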

Step 2. Create this monstrosity

Control Flow (screenshot: dimVM SCD control flow)

Data Flow (screenshot: dimVM SCD data flow)

It is actually not too terribly bad. When you add the Slowly Changing Dimension a wizard pops up and when all the stars align, all the data flow transformations and destination below are created.

If we focus on the top of the data flow first, it is easy to see I am pulling from two source systems and doing a union all. The interesting problem I had to solve was the deleted VM problem; the wizard didn’t do that for me. I knew if I had the staging table, I could compare it to dimVM to see if anything was missing. If you want to find out what is missing, use an EXCEPT query. Once you find out what is missing (deleted VMs), we can update the validto field, effectively closing up shop on that row while keeping the history of rows relating to that VM. I decided to add the isdeleted column to make it easier to find deleted VMs. This code is in the SQL Script task on the control flow.

UPDATE dimVM
SET dimVM.validto = GETDATE(), dimVM.isdeleted = 1
FROM dimVM
INNER JOIN (
    SELECT vmid, vcenter FROM dimVM
    WHERE validto IS NULL
    EXCEPT
    SELECT vmid, vcenter FROM dimVM_staging
) del
    ON dimVM.vmid = del.vmid AND dimVM.vcenter = del.vcenter
WHERE dimVM.validto IS NULL; -- only close the current row, not the historical ones

One last little tidbit. If you make any modifications to the transformations that the SCD wizard created, you should document them with an annotation. If for some reason you have to get back into the wizard, it will recreate those transformations from scratch… ironically not maintaining any history.

Step 3. Profit

I hope you enjoyed hearing about my new experiences in the Slowly Changing Dimension transformation in SSDT.

 
1 Comment

Posted by on April 14, 2015 in SQL Dev

 

Final Preparation for 70-463 Implementing a Data Warehouse with Microsoft SQL Server 2012

This is a continuation of this post

Two fellow bloggers have been posting more quality information on this test.
http://colleenmorrow.com/

http://www.desertislesql.com/wordpress1/?p=243

When reading the book I skipped over all of the practice sections. I did read the exam tip sections inside of the practice but never actually practiced. I don’t have a lot of hands-on experience with SSIS and even less with MDS/DQS. I spent about 9 weeks making it through the book while skipping the practice and most of the reviews. I probably would have needed an additional 18 weeks to properly make it through all of the practice or lab type sections of the book. Learn one, do one, teach one is my favorite method to mastery, but with the 2nd Shot deadline I didn’t have a lot of time to prepare.

To supplement, I attempted to find videos on YouTube and watched videos on the Microsoft Virtual Academy. Neither source was very demo heavy. What I did find is CBT Nuggets, which gives a 7-day trial. The 70-461 videos that I was able to watch were very high quality, fast paced and demo heavy. This is exactly what I needed at this time. I’d recommend a membership if you have a bundle of money burning in your pocket.

Since my trial was up, I decided to type up my CBT Nuggets notes.

CBT Nuggets: connection managers
control flow -> doesn’t involve data
bottom level are private connection managers, a.k.a package level
right solution explorer is project level connection managers which are global
you can enable/disable sequence containers
precedence constraints, go to properties to define AND or OR logic
copy-> paste package connection managers
delay validation -> doesn’t check structure
email doesn’t have a port option but could purchase add-ins or write your own
fix for NULLs is COALESCE

Data Flow
rows, buffers, pipeline,transformations
raw file -> ssis only -> good for sharing data between packages
raw file -> good for resuming packages
recordset->variable used to loop through
for performance, aggregate at the source since that is blocking
import export col -> for blob data
term matching is like CTRL+F
blocking tasks take lots of memory -> sort, aggregate
partial-blocking -> merge chunks

Data Quality services
cleansing matching
server is 3 databases
dqs client is used for creating KBs
creating a knowledge base
-open xls sheet -> job title list for KB example
-KB needs a domain, circle with * button is domain
State length of 2 is an example domain rule
composite domain (EX: address which includes city state zip)
reference data source RDS (ex: Melissa Data for addresses)
KB’s get published
activity is automatically logged

Implementing DQS
data profiling task in SSDT
-profile types
–null ratio request
–pattern generator RegEx for formatting
–column statistics
-then specify column
Quick profile: runs against all columns
Open data profile viewer
Cleansing
suggested confidence level
corrected confidence level
DQS cleansing task
Job title source job_title _output
jobtitles table
matching
newKB->domain->source column (survivor record)
the table with the + button to add a rule and use the Rule Editor

Implementing MDS
proactive management
people place concepts or things
non-transaction data is good for MDS
includes auditing and versioning
MDS Components (Database, config mgr, MD mgr, web service, mds model deploy, Excel Add-In)
MDS Objects(Models: the container db, Entities: like tables, Attributes: like columns, Hierarchies, Members: Actual data)
Install requires powershell 2.0 and IIS 7.5, silverlight and a database
has integration with DQS
to deploy packages that contain data must use CLI (deploynew -package “” -model)

Data flow
merge join requires sort -> advanced editor, pick isSorted and the column
MetaData problems: double click on flow and change types
Lookup transformation
-cache connmgrs for re-use
–redirect rows
–multi output popup
slowly changing dimension task (wizard)
fixed attribute fail on change
changing attribute type 1 overwrite type 2 new records (history)
inferred member flag goes in dimension
blocking oledb command
redirect error rows to flat file

executing packages
“$Project::ProjectParam(String)”; test@email.com
LOGGING_LEVEL 3 = Verbose
dtexec.exe is fire and forget style
xp_cmdshell
built-in SPs in ssisdb
catalog.set_obj_param value
restartable packages
-checkpoint file
-tracking last successful step in control flow
project properties
-select file name
-set usage never
–if exist
-save checkpoints = true
-set property fail package on failure = true
to test, can set task property to force a failure
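
To put the “built-in SPs in ssisdb” and LOGGING_LEVEL notes in context, here is a hedged sketch of starting a deployed package through the catalog; the folder, project and package names are placeholders, not from the videos.

DECLARE @execution_id BIGINT;

EXEC SSISDB.catalog.create_execution
    @folder_name = N'MyFolder',
    @project_name = N'MyProject',
    @package_name = N'LoadDW.dtsx',
    @use32bitruntime = 0,
    @reference_id = NULL,
    @execution_id = @execution_id OUTPUT;

-- LOGGING_LEVEL 3 = Verbose; object_type 50 = system parameter
EXEC SSISDB.catalog.set_execution_parameter_value
    @execution_id = @execution_id,
    @object_type = 50,
    @parameter_name = N'LOGGING_LEVEL',
    @parameter_value = 3;

EXEC SSISDB.catalog.start_execution @execution_id;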

 
Leave a comment

Posted by on April 9, 2015 in SQL Admin, SQL Dev

 

More Preparation for 70-463 Implementing a Data Warehouse with Microsoft SQL Server 2012

This is a continuation of my previous post. This is just some very quick notes that I am posting for my benefit and so that readers may get an idea of the preparation necessary for this test. They are my notes from this book: https://www.microsoftpressstore.com/store/training-kit-exam-70-463-implementing-a-data-warehouse-9780735666092

PART II: Developing SSIS Packages

simple data movement – can use import export wizard
complex data movement – SSDT
SSDT is a Visual Studio shell used to develop IS, AS and RS projects
Control Flow connection managers can be package or project scoped
Connection manager types:
ADO – backwards compatibility
ADO.net – compatible with sql server
AS – analysis services
File – SSIS data type
Flat file – delimited file
ftp – security option is only basic auth
http – web services or file, no windows auth
OLE DB – sql server, will be removed in favor of ODBC
ODBC – open database connection
SMTP – basic email auth only

package scoped connection managers will override the higher level project scoped connmgrs

control flow tasks and containers
containers help control execution of tasks
transformations include
cleansing – remove invalid data or unwanted data
normalization – XML value to varchar
conversion – byte[] to varbinary(max)
translation – “F” to “Female”
validation
data calculation and data aggregation
data pivoting and data unpivoting

ssis tasks categories, data prep, workflow, data movement, SQL admin, SQL maintenance

containers, for loop, foreach loop, sequence

Precedence Constraints (the arrows that come off of tasks)

success, failure, completion
dotted lines mean OR and solid means AND logic used when multiple tasks are involved in flow

Designing and Implementing Data Flow

Data Flow is a level deeper than the control flow
Control flow triggers data flow
data flow task builds execution plan from data flow definition
data flow engine executes the plan
*Validate external metadata – checks for existence of tables and objects and should be turned off if they are dynamically created
bulk OLEDB = fast load
ODBC = batch
fast parse is available at the column level on some data types ( date, time, int )
Working with data flow transformations
-Blocking (ex: sort, aggregate) transformations that read all data in before passing any rows down the pipeline
-Non-Blocking -> lookup, multicast, conditional split or other row-by-row transformations
-partial-blocking -> merge, merge join, union all, data flows in chunks
cache transformations – good for multiple transformations on same data
import/export col – good for blobs
character map – upper case, lower, linguistic bit operations
advanced data prep: dqs cleansing, oledb command, slowly changing dimension, fuzzy grouping, fuzzy lookup, script component(custom.net)
#NEW# Resolve references editor helps resolve mapping problems
Lesson 3: strategy and tools
lookup transformation caching
how to handle rows w/ no matches
sort is expensive, optionally perform sorts at source and use advanced editor to mark data as sorted
avoid update and delete on fact tables
do large table joins on database layer
do updates on loading or temp tables in set based sql operations
Chapter 6: Enhancing Control Flow
ssis variables and parameters
avoid retrieving external source variables more than once
parameters are exposed to the caller but variables are not
parameters are read-only and can only be set by the caller
variables are helpful for reusability
variables are user defined or system
variables can store rows foreach enum containers
-avoid storing large rowsets in memory/variables
variable data types
-object: last resort
Int16: -32,768 thru 32,767
UInt16: 0 thru 65,535
UInt32: 0 thru 4,294,967,295
DateTime
String
Char: a single Unicode character (0 thru 65,535)
Decimal: 28 or 29 significant digits
Single
Double
Variable Scope
-Package Scope
—-Container Scoped
——–task scoped
property parameterization
explicit assignment
expressions
lesson 2: connmgr, tasks, and precedence constraint expressions
expression: combination of constants, variables, parameters, column refs, functions, and expression operators
-special ssis syntax close to C++
math functions: ABS, EXP, CEILING, etc…
String Functions: FINDSTRING, HEX, LEN, LEFT, REPLACE
precedence constraints can use AND/OR logic expressions
Lesson 3: Master Package
just a normal package that uses the execute package task
use variables to expose results to parent
use project deployment model to make parameters available to child packages
use project scoped parameters
CHAP7: Enhancing Data Flow
Lesson 1: Slowly Changing Dimensions
-late arriving dims or early arriving facts
–1. insert row into dim, mark inferred… requires bit col
–2. use newly created surrogate key
–3. when loading dim overwrite inferred members
TYPE 1 SCD: overwrite
TYPE 2 SCD: keep all history
can use conditional split to see what columns changed
ex: source.fullname dest.fullname
using t-sql hashbytes can compare for changes (see the sketch after these notes)
–then two cols for hash val, Type1 & Type2
use set based updates instead of the wizard
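
A rough sketch of the hashbytes comparison idea; the staging and dimension tables, columns, and the stored Type2Hash column are made-up examples, not from the book.

-- find current dimension rows whose Type 2 attributes no longer match the staging data
SELECT s.CustomerKey
FROM staging.Customer AS s
INNER JOIN dbo.DimCustomer AS d
    ON d.CustomerAltKey = s.CustomerKey AND d.IsCurrent = 1
WHERE HASHBYTES('SHA2_256', CONCAT(s.FullName, '|', s.City)) <> d.Type2Hash;
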
Lesson 2: preparing a package for incremental load
dynamic sql
change data capture
Dynamic SQL in OLEDB source
1. select dataaccess mode of sql command and use ? to pass parameter
2. pass variable to sql command and use expressions to modify the sql string
cdc functionality – cdc source and cdc splitter
-ALL, ALL w/old, net, net w/update mask, net w/merge
lesson3: error flows
route bad rows – fail, ignore (copies null), redirect rows
chapter 8: creating robust and restartable packages
can set transactions at package control flow or task level
transactions use msdtc
transaction options are: required, supported, not supported
transactions work on control flow not data flow
can nest a not supported execsql that won’t rollback inside a transaction (ex: still want to audit on fail)
lesson2: checkpoints
save checkpoints needs to be turned on, on the package
creates a file and restarts if it exists
starts from the beginning if it does not exist
lesson3: event handlers
onerror
onwarning
onvariablechanged
ontaskfailed
can turn event handlers off for task
chapter 9: implementing dynamic packages
project level and package level connection mgrs and parameters
must be deployed to ssis catalog
parameter design values are stored in the project file
cannot change parameter value while package is running
property expressions are evaluated on access
lesson2: package configs
enable package deployment model
can get parent package configs
chapter10: auditing and logging
logging: package configuration
auditing: data flow transformation component
lesson1: logging packages
providers are: txt file, sql profiler, sql server, event log, xml
boundary progress exception
use parent setting is default
ssis control flows can be configured for logging
lesson2: auditing and lineage
elementary auditing – captures changes
complete – adds usage or read activity
audit transformation editor
lesson3: preparing package templates
keep packages in source control

Part IV: managing and maintaining ssis packages
ssis service is required in production
ssisdb new
package install utility is legacy
can use ssdt or ssms to deploy packages
project model or package model
dtexecui is legacy
can use TSQL, powershell, manual dtexec cli to execute packages
agent to schedule packages
introduced master package concept
securing packages: uses sql security concepts of principals and securables
ssis_admin role
ssis_user by default allowed to deploy, and deployer is allowed to read, modify, execute
Chapter 13: troubleshooting and perf tuning
breakpoints work only in control flow
breakpoints can fire on a hit count
data viewers on path will show grid view of data
use error outputs to catch bad rows
test with a subset of data
basic logging is default
switch to verbose when there are problems
data taps are like dataviewers for production
must be predefined using catalog.add_data_tap for specific data flow
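
A sketch of predefining a data tap before the execution starts; the package path and data flow path strings below are made-up examples in the documented format, and the tap output lands in the server’s DataDumps folder.

DECLARE @execution_id BIGINT;
EXEC SSISDB.catalog.create_execution
    @folder_name = N'MyFolder', @project_name = N'MyProject',
    @package_name = N'LoadDW.dtsx', @execution_id = @execution_id OUTPUT;

-- tap a specific data flow path; must happen before start_execution
EXEC SSISDB.catalog.add_data_tap
    @execution_id = @execution_id,
    @task_package_path = N'\Package\Data Flow Task',
    @dataflow_path_id_string = N'Paths[OLE DB Source.OLE DB Source Output]',
    @data_filename = N'tap_output.csv',
    @max_rows = 1000;

EXEC SSISDB.catalog.start_execution @execution_id;
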
lesson2: perf tuning
buffers are a group of data in data flow
determined automatically
Transformation Types
-non-blocking: row based synchronous
-partial blocking: asynchronous transformation
-blocking: asynchronous
backpressure controls flow for best memory control
max buffer rows – 10,000 default
max buffer size – 10MB by default
fast load on destination
full-cache lookups
avoid oledb transformations
BLOBs get swapped to disk
data flow engine threads
max concurrent executables -1 = # of logical processors +2
perfmon counter: buffers spooled

PART V: Building Data Quality Solutions

chapter14: installing and maintaining DQS
Soft dimensions: timeliness, ease of use, intention, trust, presentation quality
hard dimensions: accuracy, consistency
Schema dimensions: completeness, correctness, documentation, compliance w/theoretical models, minimalization
activities: understand sources and destinations
lifecycle
security and backups managed through ssms
Chapter15: implementing MDS
metadata, transactional, hierarchical, semi-structured, unstructured, master
MDM goals: unifying or harmonizing, maximize ROI through reuse, support compliance, improving quality
MDM: coordinated set of tools policies to maintain accurate master data
map master data dimensions to DW
Installing MDS: DB, Service(Needs IIS), Manager, Excel Add-IN
Creating MDS model
1.Model
2.Entities(like tables)
3.Attributes(like columns)
4.Hierarchies
5.collections
Derived hierarchies: Recursive with TOP = NULL (ex: Org Chart)
Explicit Hierarchies – Organization can go any way
Collection: flat list of members
MDS service performs business logic
Chapter16: managing master data
MDS Packages
-Model deployment package to move data to another server
-wizard only includes meta data
-permissions are not included
-MDSModelDeploy command prompt if you want to move data
exporting – tsql on subscription views, webservice
Security, system admin (one user, tsql to change), model admin (complete model access)
entity permissions apply to all attributes
mds add-in for excel (connect to http://server:8080)
when model and member permissions overlap, read-only wins over update and deny wins over everything
excel add-in can use DQS KB matching
Chapter17: creating a data quality project to clean data
knowledge discovery
domain management
reference data services
matching policy
domain: semantic representation of column
properties: data type, leading values, normalize, format, spellchecking
Term-based relation: Inc. -> Incorporated

I skipped 18,19,20: Advanced ssis and data quality topics because only 5 parts are listed on the exam prep and I ran low on time.

 
1 Comment

Posted by on April 8, 2015 in SQL Admin, SQL Dev

 

Reasons you can’t connect to SQL Server

“I can’t connect, can you look at the logs?”

Nope, not today, this is not how we do this. How is the server to LOG if it never receives the request? Do you think the server somehow magically anticipated that you wanted to run a query for the first time? What server are you even talking about???

Connection errors are generally logged on the client side. First read the message carefully and thoroughly for best results. The majority of common errors can be broken down into three categories:

Client issue
Network issue
Server issue

The nature of the word “Connection” means there are some fuzzy areas where two of the three CNS creatures rub uglies. There is a network adapter on the client and a network adapter on the server, and well… there is a network.

Let’s look at one of my more popular reasons you can’t connect to SQL Server: Login Failed.

So which is that, C… N… or S? I can pretty much rule out the network since the client received a message from the server. Maybe it is not even an issue at all, it is a feature I implemented to prevent you from logging into a production server. I really want to put it in the server category, but as I look back on actual cases, it is mostly the fact that access was never requested until it didn’t work. So that is a layer 8 issue with the planning protocol.

Long story short, I really wanted to categorize this list and also provide solutions but it really depends on the situation of the error. Hopefully, this list doesn’t grow much more since I have gotten better at anticipating people who may eventually want to connect to my databases. Without any further complaints, here are the reasons off the top of my head that you can’t connect to SQL Server:

1. You don’t have login access
2. Your login doesn’t have connect privileges
3. The Windows firewall is blocking you
4. The network firewall is blocking you
5. The login doesn’t have public access to the database
6. The server is out of memory
7. The server is actually down
8. The database is in single_user mode
9. The service account is locked out
10. SQL Authentication isn’t enabled on the server
11. You are trying SQL Auth when you should be using Windows Integrated
12. You are typing the password wrong
13. The login is locked out
14. The login is disabled
15. Server cannot generate the SSPI context
16. The service was started with option -m (single user)
17. The vmware host doesn’t have the correct vlan defined
18. The SQL Server’s ip configuration is wrong
19. The network switch doesn’t allow the vlan on that port
20. The distributed switch doesn’t have LACP enabled on that port group
21. The SQL Service is being updated
22. The Windows server is being updated
23. You are not specifying the non-standard port
24. You have the wrong instance name
25. You have the wrong server name
26. You have the wrong port
27. You communicated the wrong port to the network firewall admin
28. You are using the :port syntax instead of the ,port syntax
29. SQL is not set to listen on TCP/IP
30. You ran the C: drive out of space causing a cascading failure
31. You are not connected to the VPN
32. You are connected to the guest wifi
33. You bumped the wifi switch on your laptop

 
Leave a comment

Posted by on January 21, 2015 in SQL Admin, SQL Dev

 

don’t grant db_owner

Don’t grant sysadmin, I hope that has been made clear in previous posts. If you grant sysadmin you are making a HUGE mistake. Sysadmins can take over the server. Local admins can take over SQL. It does take some effort but Local Administrators and SQL Sysadmins effectively have the same privileges. When it comes to audit time, I believe you might be able to state these two roles as separate but behind the scenes they are the same.

But what about this db_owner? It looks fairly harmless. Let’s take a closer look.

How bad is it?

As you can see, we have this new guy that wants to connect to SQL Server. We create him a login on an existing server with topsecret information (ooops #1). We create him a SQL Login jimmy_the_dev (ooops #2). But those two things alone are not all that bad. He can’t do much with just the public role.

Ahhh, I see, you wanted some actual privileges

We do have to grant jimmy some permissions. To do this we should create mappings from the login to a database. If you use the GUI it creates a database user automatically.

If you just select public, SQL creates the database user, but jimmy doesn’t have any useful permissions. He can’t even see the nice new ‘Table_1’ we created for him. He is past the first error but not by much.

Where are my tables?

There are some convenient default database roles. db_datareader is a nice one that will grant read access to everything, including future objects. db_datawriter is the same but for IUDs. Some users will need to be manually granted execute permissions in the database so they can run stored procedures.

My recommendation is to create a few custom database roles and map to those (a sketch of that approach follows). Unfortunately, time doesn’t always allow for that. New additions to the schema will need to be added to these roles. Maybe you have time for this manual work now, but when the schema changes you might not have that luxury.
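
Here is a rough sketch with made-up role and database names; grant only what the application actually needs.

USE [TopSecretDB];
GO
-- a custom role instead of db_owner
CREATE ROLE app_readwrite;
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::dbo TO app_readwrite;
GRANT EXECUTE ON SCHEMA::dbo TO app_readwrite;

-- map the login into the database and add it to the role
CREATE USER jimmy_the_dev FOR LOGIN jimmy_the_dev;
EXEC sp_addrolemember 'app_readwrite', 'jimmy_the_dev'; -- or ALTER ROLE ... ADD MEMBER on 2012+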

The easy button for granting all access to a single database is the role “db_owner”. The easy button is sometimes the wrong button to press. Here’s why:

1. a db_owner can create tables (1 trillion row heaps)
2. a db_owner can drop the database (dammit jimmy!)
3. a db_owner can backup the database and break the dba’s differential backups.
4. a db_owner can create users mapped to existing logins, and make them db_owners too!
5. a db_owner can perform maintenance like rebuilding 1 trillion row heaps
6. a db_owner can modify files and grow them to ridiculous sizes
7. a db_owner can add log files on the same drive as data files
8. a db_owner can turn off auto-growth

And maybe the best one of all, a db_owner can drop users… including themselves!

The suicide squeeze.

That said, these permissions are not too devastating. db_owner is not a “take over the world” role. I use it from time to time, but the more experienced I get, the less I want to use it. The main reason is that if you grant this, no one runs into security problems and no one learns how database security works. When it comes time to go live you might have to go through all of the security exercises you skipped by granting db_owner. That is more painful than doing it right in the first place.

 
Leave a comment

Posted by on October 24, 2012 in Security, SQL Admin, SQL Dev

 


What a statistics problem can look like

You are a busy production OLTP SQL server and your owner has decided that it’s only necessary to back you up and re-org your heavily fragmented indexes. That’s it, he is a minimalist and his ideological ways work well for months…

ZZOMG!!

Then along comes Martha and her HR report. The plan cache is bloated with these pesky OLTP sql plans and Martha’s report needs more buffer pool and a big fat sql plan. (Martha’s not fat, just her plan.) So her plan and buffer pool requirements evict your very important OLTP plans. Re-evaluating plans once in a while is fine; part of normal memory management is to clean up old unused stuff. By “clean up stuff” I mean use the special sauce algorithms inside the sql server optimizer to decide what needs to go. This is much more complicated than a LIFO or FIFO system.

The data in these tables have changed so much you are not sure what to do. Optimization is taking forever so you throw a new sql plan in place and go about your business.

This optimization process didn’t have enough current data (a.k.a. statistics) so you got a crappy plan for a query that is run over, and over, and over again. The query that used to bring back data with 24 logical reads now needs 24,000 logical reads. But guess what, your optimizer is so confident in its nerdy mathematical algorithms that it doesn’t bother to re-compile. So what do you do? Start your CPU insanity workout until your owner fixes your statistics, which forces a re-compile of the crappy plan.

 
1 Comment

Posted by on August 22, 2012 in SQL Admin, SQL Dev

 

SQL Server Statistics

The statistics I am writing about are the ones that belong to indexes and columns. They flew under my feature radar for quite some time. In my previous post I wrote about improving a 10 minute query to just over 1 second simply by updating statistics. Just today I had someone let me know they improved a 1 hour+ query down to a 1 minute query simply by updating statistics. This procedure is transparent to the application; it won’t cause any functional problems like a schema change would.

“Wait stats” are popular so when you hit your search engine of choice you will get a mix of “wait stats” which are NOT column or index stats. So it’s a real PITA to find solid information on column and index statistics. Notice I keep repeating column AND index stats. This is because sp_helpstats only shows the auto generated column stats and I think ALL statistics are important.

These little buggers are wildly important to the optimizer. The optimizer is the special sauce inside SQL Server that polishes your turd queries so the data can slide smoothly back into your crapplication. So what does the optimizer think about this?

Blasphemies!

That’s right, it doesn’t like it. Some of my servers take on a grumpy personality and it’s usually the ones with bad stats. Bad stats will cause the optimizer to pick poor plans to retrieve your data. For example, choosing the wrong join operator to combine two tables because it thinks one has 1 row when it really has 1 zillion rows. Or simply choosing the wrong order to join tables.

The optimizer is all knowing and assumes you don’t know how to pick the proper join order. It will re-write your query based on what it thinks is most efficient. And if it doesn’t have good numbers to make those decisions, you are in trouble. It will pick the wrong indexes, the wrong order and all sorts of badness if it doesn’t have good stats.

How do I fix? (In order of expertise)

If you don’t do anything else with stats, leave these settings alone.

execute sp_updatestats is the easy way out. Run that in the context of your database and SQL will grind away until it has updated everything it needs to.

Run the update statistics command on your tables and views with FULLSCAN. FULLSCAN is the sample rate of your data that sql will base its stats off of. You can decrease the sample rate for zillion row tables to decrease the amount of time the command takes to run.
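
A minimal example of both options; the table names are placeholders.

-- the easy way out: update whatever SQL Server thinks needs it
EXEC sp_updatestats;

-- heavier hammer: rebuild statistics for one table by reading every row
UPDATE STATISTICS dbo.BigImportantTable WITH FULLSCAN;

-- for zillion-row tables, trade a little accuracy for runtime
UPDATE STATISTICS dbo.ZillionRowTable WITH SAMPLE 25 PERCENT;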

You can use the Database Engine Tuning Advisor to recommend advanced statistics such as multi-column stats and filtered stats that will improve the queries you fed into DTA.

You can identify overlapping stats using the DMVs so the optimizer doesn’t pick stale auto-created column stats over the good index stats.

You can run this query to identify stats based on when they were last updated. This took me quite a bit of digging around to find out how to identify last update. You can get most of the information you need without having to run DBCC SHOW_STATISTICS. sys.stats and the handy system function STATS_DATE hold enough info to get you by. Without knowing that function I started down the wrong path of looking at sys.objects and trying to figure out more information than I really needed at this point in time. sys.objects has modified date which isn’t actually the last time stats were updated but the last time the schema for that object was updated. This query will help you identify, without a query plan, if you have stale stats.

USE [msdb] -- because everyone has one of those
GO
SELECT 
  object_name(object_id), 
  name, 
  STATS_DATE(object_id, stats_id) AS LastStatsUpdate
FROM sys.stats 
ORDER BY LastStatsUpdate

Why does it break?

Out of sight, out of mind. You automatically get a statistic when you create an index. Also, the optimizer will create single column stats on the fly (by default) as queries are run. Who knew you had to take care of these things?

Lots of data is changing and that is fairly normal for SQL Server. Auto-updating statistics using its built-in algorithm doesn’t always catch stale stats. Using a higher sample rate can get your precious optimizer better numbers and then it will give you better query plans.

Resources

The mother of all posts I have found so far. Must read multiple times.
http://www.simple-talk.com/sql/performance/sql-server-statistics-problems-and-solutions/

As far as stats queries go, this is certified #awesomesause:
http://www.kendalvandyke.com/2010/09/tuning-tip-identifying-overlapping.html

I haven’t run this yet but this definitely has great potential and helped me identify some ways of looking at statistics:
http://www.daveturpin.com/2011/03/maintaining-accurate-sql-server-statistics-in-small-maintenance-windows/

Some more info:
http://sqlserverpedia.com/wiki/Index_Statistics

and three….. deeper and deeper….
http://msdn.microsoft.com/en-us/library/cc966419.aspx

With the correct parameters and schedule, you won’t ever have stats problems again:
http://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html

trace flag for better statistics
http://itknowledgeexchange.techtarget.com/sql-server/new-sql-2008-r2-sp1-trace-flag-adjusts-autostats-threshold/

 
1 Comment

Posted by on July 24, 2012 in SQL Admin, SQL Dev

 

It sounds easy, graphing http errors over time

I have access to several fancy dancy monitoring tools. They alert and report quite well but I had a unique case where I needed to adjust a website monitor. I was getting alerts almost daily that the website was down but I would immediately check it out and it was operating just fine. The server logs showed no trace of the monitor request or any errors to point me in any particular direction.

I approached this issue with the assumption that it was either the inconsistent interwebz or a monitor false alarm. We have monitors that run via agents locally on the web server and they were successful. That proves the server was functional; it just wasn’t receiving the traffic.

What I needed was a tool that could check the site more frequently and allow me to adjust the timeout value. Sure, there are probably a lot of tools already, but why not give myself maximum flexibility and write one myself?

I realize it’s a little sloppy and needs some thread management to handle the timing, but it did the trick. It’s a GUI with a few buttons; here is the code behind the form.

Imports System.Net
Imports System.Data.SqlClient
Public Class Form1


    Private Sub btnStart_Click(sender As System.Object, e As System.EventArgs) Handles btnStart.Click
        Timer1.Interval = Integer.Parse(txtInterval.Text)
        Timer1.Enabled = True
        Timer1.Start()
        System.IO.File.WriteAllText("ispchecklog.txt", "")
        btnStart.Enabled = False
    End Sub

    Private Sub btnStop_Click(sender As System.Object, e As System.EventArgs) Handles btnStop.Click
        Timer1.Stop()
        Timer1.Enabled = False
        btnStart.Enabled = True
    End Sub


    Private Sub Timer1_Tick(sender As System.Object, e As System.EventArgs) Handles Timer1.Tick
        ' the original post lost the third column name in this INSERT; [machine] is a guess that matches the MachineName value below
        CreateCommand("INSERT INTO [ISPCheck].[dbo].[log] ([stamp],[rc],[machine]) VALUES ('" & DateTime.Now.ToString("yyyyMMdd hh:mm:ss") & "','" & WRequest(txtURL.Text) & "','" & System.Environment.MachineName & "' );")
    End Sub


    Function WRequest(ByVal URL As String) As Integer
        ' returns the HTTP status code, or 0 on timeout/error
        Dim responseData As Integer = 0
        Try
            Dim hwrequest As Net.HttpWebRequest = Net.WebRequest.Create(URL)
            hwrequest.Accept = "*/*"
            hwrequest.AllowAutoRedirect = True
            hwrequest.UserAgent = "http_requester/0.1"
            hwrequest.Timeout = Integer.Parse(txtTimeout.Text)
            hwrequest.Method = "GET"

            Dim proxy As IWebProxy = WebRequest.GetSystemWebProxy()
            proxy.Credentials = CredentialCache.DefaultCredentials
            hwrequest.Proxy = proxy


            Dim hwresponse As Net.HttpWebResponse = hwrequest.GetResponse()
            If hwresponse.StatusCode = Net.HttpStatusCode.OK Then
                responseData = 200
            End If
            hwresponse.Close()
        Catch e As Exception
            System.IO.File.AppendAllText("log.txt", DateTime.Now.ToString("yyyyMMdd hhmmss") & e.ToString & System.Environment.NewLine)
            responseData = 0
            MessageBox.Show(e.ToString)
        End Try

        Return responseData
    End Function

    Public Sub CreateCommand(ByVal queryString As String)
        Try
            Using connection As New SqlConnection(txtConnstr.Text)
                Dim command As New SqlCommand(queryString, connection)
                command.Connection.Open()
                command.ExecuteNonQuery()
            End Using
        Catch ex As Exception
            System.IO.File.AppendAllText("log.txt", DateTime.Now.ToString("yyyyMMdd hhmmss") & ex.ToString & System.Environment.NewLine)
            MessageBox.Show(ex.ToString)
        End Try
    End Sub

End Class
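
For reference, a minimal sketch of the table the code inserts into; the column names (particularly the machine-name column) are assumptions reconstructed from the INSERT statement, not the original schema.

CREATE TABLE ISPCheck.dbo.[log]
(
    stamp   DATETIME     NOT NULL, -- poll time
    rc      INT          NOT NULL, -- HTTP return code, 0 = timeout
    machine VARCHAR(128) NOT NULL  -- which client ran the check
);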

What I got back was a lot of alerts in a table with their time and HTTP return code, where 0 was a timeout. I had this test running on several computers so the time values were not consistent. I thought they would line up to show some “outages” but I couldn’t tell from just looking at the table. I wanted to see if they were clustered together so I needed to see them on a graph. No matter how I sliced it I couldn’t get it to look good. Then I realized what I needed was a time dimension, and then to count the errors within certain time slices. This query will group the events into 5 minute intervals. Also important, it will give me the intervals that don’t have any alerts so I can easily produce a graph over time.

SELECT
    DATEADD(minute, (DATEDIFF(minute, 0, stamp) / 5) * 5, 0) AS interval,
    MIN(rc) AS min_rc,
    COUNT(CASE WHEN rc = 0 THEN 1 END) AS [number of occurrences]
FROM ispcheck.dbo.[log]
GROUP BY DATEADD(minute, (DATEDIFF(minute, 0, stamp) / 5) * 5, 0)

This link got me close to the query I needed. I found quite a few that didn’t really do the trick at all before I located this forum post.

http://www.techtalkz.com/microsoft-sql-server/170861-grouping-sql-output-5-minute-intervals.html

 
Leave a comment

Posted by on March 18, 2012 in Network Admin, SQL Dev

 


back to t-sql basics (2 of 2)

So you want to make changes to the database? Well that has all sorts of implications. My experience with database change is only with small applications that I have written myself. The other experience I have is troubleshooting a few ISV applications that require me to delete or update erroneous records.

This lack of experience has driven me. It should be a requirement of anyone administering SQL to know SQL. The DMVs are a great way for a sysadmin to get started querying. However, you don’t write to these objects, so the first time you need to perform an update or delete there’s a bit of voodoo involved.

What I have started doing is collecting useful information using some of Allen White’s DIY baseline and monitoring powershell scripts. (http://sqlblog.com/blogs/allen_white/archive/2012/03/03/load-perfmon-log-data-into-sql-server-with-powershell.aspx) This has started giving me at least a cursory knowledge of IUDs by loading this data into other tables that I need to maintain. Also, cleaning up data that is old is a must. But again this small-time development is mostly a refresher for me.

I’ve gotten great experience in the enterprise sysadmin role but I seriously lack production development DBA skills. I hope I have a chance to learn about schema changes on busy databases before I have to make any. Also, I would like to be able to offer more assistance for database design problems. For now, I don’t stay awake at night worrying about a missing where clause on a delete or an improperly nested update query, but maybe someday I will.

There are two remaining chapters on concurrency and programming that I might be covering with some notes later. This book has my highest recommendation. Without further ado, I have taken these notes on my reading of: Microsoft SQL Server 2008 T-SQL Fundamentals by Itzik Ben-Gan.

(1, p238)
Insert into values has changed in 2008. You can now include multiple rows in one statement, e.g. (1,1,1),(2,2,2),(3,3,3); this is called a table value constructor.

Insert into select allows you to insert records from one result set into a table.

Insert exec works like insert select but allows you to insert the results of a stored procedure.
(1, p241)
Select into is non-standard sql. It allows for table creation on the fly.

Bulk insert is a very fast method of loading from a flat file.
(1, p242)

Bulk insert dbo.TargetTable -- the target table name is required; this one is a placeholder
from 'c:\inputfile.csv' with
(fieldterminator = ',',
rowterminator = '\n')

Identity allows for auto incrementing a field in a table. Do not specify this column on the insert and based on the seed and increment the value will increase. IDENTITY(1,1) means start at 1 and increment by 1. This is not the same as a sequence.

SCOPE_IDENTITY() will return the last identity used within the current scope. Identities will increment on failed inserts. Deleted records will create gaps in identities. Ident_current() will ignore session and give you the last identity.
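
A quick sketch of IDENTITY and SCOPE_IDENTITY() together; the table is invented for illustration.

CREATE TABLE dbo.Orders
(
    OrderID INT IDENTITY(1,1) PRIMARY KEY, -- starts at 1, increments by 1
    CustomerName VARCHAR(50) NOT NULL
);

INSERT INTO dbo.Orders (CustomerName) VALUES ('Contoso');

SELECT SCOPE_IDENTITY() AS LastIdentityInScope,
       IDENT_CURRENT('dbo.Orders') AS LastIdentityForTable;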

(1, p247)
SQL only has two commands for deleting data, delete and the non-standard truncate. Deleting whole objects such as tables can be done with the DROP command.
Deletes based on a join are non-standard, use deletes with subqueries instead.
UPDATING
2008 allows for += where previous versions don’t allow those compound operators
Sql does all at once operations. Consider col1 = 20 and col2 = 10

Update testtable
Set col1 = col2, col2 = col1;

This query would swap col1 and col2 where in most languages the values would end up being the same.
Just like deletes, updates can be based on a join but it is non-standard. Use subqueries

Update testtable
set col1 += .05
where exists
(select * from testtable2 as tt2
where tt2.col1 = testtable.col1
and id = 1)

(1, p254)
Assignment UPDATE is new, to allow creating your own sequences
MERGE allows for IUDs with logic. It’s also new in 2008 and is mostly ANSI standard (see the sketch below).
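
A hedged sketch of what a MERGE can look like; the tables and join column are examples, not from the book.

MERGE dbo.testtable AS tgt
USING dbo.testtable2 AS src
    ON tgt.id = src.id
WHEN MATCHED THEN
    UPDATE SET tgt.col1 = src.col1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, col1) VALUES (src.id, src.col1)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;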

(1, p259)
SQL allows the use of CTEs to modify data… whoa…
-can be useful to select data before modifying it
-can use a table expression to select TOP and orderby… then run your update

(1, p266)
OUTPUT

Delete from testtable
OUTPUT
deleted.col1,
deleted.col2
where col1 < '20010101'

This will return col1 and col2 for all the records that were deleted.

Source
1. Ben-Gan, I. (2009). Microsoft SQL Server 2008 T-SQL Fundamentals. Redmond, Washington: Microsoft Press.

 
Leave a comment

Posted by on March 11, 2012 in SQL Dev