The statistics I am writing about are the ones that belong to indexes and columns. They fell under my feature radar for quite some time. My previous post I wrote about improving a 10 minute query to just over 1 second simply by updating statistics. Just today I had someone let me know they improved a 1 hour+ query down to a 1 minute query simply by updating statistics. This procedure is transparent to the application, it won’t cause any functional problems like a schema change.
“Wait stats” are popular so when you hit your search engine of choice you will get a mix of “wait stats” which are NOT column or index stats. So it’s a real PITA to find solid information on column and index statistics. Notice I keep repeating column AND index stats. This is because sp_helpstats only shows the auto generated column stats and I think ALL statistics are important.
These little buggers are wildly important to the optimizer. The optimizer is the special sauce inside SQL Server that polishes your turd queries so the data can slide smoothly back into your crapplication. So what does the optimizer thinks about this?
That’s right, it doesn’t like it. Some of my servers take on a grumpy personality and it’s usually the ones with bad stats. Bad stats will cause the optimizer to pick poor plans to retrieve your data. For example, choosing the wrong join operator to combine two tables it thinks has 1 row but really one has 1 zillion rows. Or simply choosing the wrong order to join tables.
The optimizer is all knowing and assumes you don’t know how to pick the proper join order. It will re-write your query based on what it thinks is most efficient. And if it doesn’t have good numbers to make those decisions, you are in trouble. It will pick the wrong indexes, the wrong order and all sorts of badness if it doesn’t have good stats.
How do I fix? (In order of expertise)
If you don’t do anything else with stats, leave these settings alone.
execute sp_updatestats is the easy way out. Run that in the context of your database and SQL will grind away until it has updated everything it needs to.
Run the update statistics command on your tables and views with FULLSCAN. FULLSCAN is the sample rate of your data that sql will base its stats off of. You can decrease the sample rate for zillion row tables to decrease the amount of time the command takes to run.
You can use the database tuning adviser to recommend advanced statistics such as multi-column stats and filtered stats that will improve the queries you fed into DTA.
You can identify overlapping stats using the DMVs so the optimizer doesn’t pick stale auto-created column stats over the good index stats.
You can run this query to identify stats based on when they were last updated. This took me quite a bit of digging around to find out how to identify last update. You can get most of the information you need without having to run DBCC SHOW_STATISTICS. sys.stats and the handy system function STATS_DATE hold enough info to get you by. Without knowing that function I started down the wrong path of looking at sys.objects and trying to figure out more information than I really needed at this point in time. sys.objects has modified date which isn’t actually the last time stats were updated but the last time the schema for that object was updated. This query will help you identify, without a query plan, if you have stale stats.
USE [msdb] -- because everyone has one of those GO SELECT object_name(object_id), name, STATS_DATE(object_id, stats_id) AS LastStatsUpdate FROM sys.stats ORDER BY LastStatsUpdate
Why does it break?
Out of site, out of mind. You automatically get a statistic when you create an index. Also, the optimizer will create single column stats on the fly (by default) as queries are run. Who knew you had to take care of these things?
Lots of data is changing and that is fairly normal to SQL Server. Auto-updating statistics using it’s built in algorithm doesn’t always catch stale stats. Using a higher sample rate can get your precious optimizer better numbers and then it will give you better query plans.
The mother of all posts I have found so far. Must read multiple times.
As far as stats queries go, this is certified #awesomesause:
I haven’t run this yet but this definitely has great potential and helped me identify some ways of looking at statistics:
Some more info:
and three….. deeper and deeper….
With the correct parameters and schedule, you won’t ever have stats problems again:
trace flag for better statistics