I’ve been working with SQL Server for almost 16 years now, and one of its most fascinating aspects is statistics.
Statistics describe the distribution of data in a column (or columns – more on that later). Knowing the spread of the data in a column helps SQL Server find the right plan.
Each database has an AUTO UPDATE STATISTICS option. Generally, it’s recommended that you leave this set to ON unless you are running an extremely high-throughput transactional system or a large data warehouse, in which case you might set it to OFF.
What AUTO UPDATE STATISTICS actually means is that SQL Server will update those statistics it considers to be out of date.
Well, what exactly defines out-of-date statistics? Let’s have a look. Microsoft describe this in the following TechNet article…
and also here in this KB article…
It’s specifically for SQL Server 2008, but we have to start somewhere! It states the following:
“A statistics object is considered out of date in the following cases:
- If the statistics is defined on a regular table, it is out of date if:
  - The table size has gone from 0 to >0 rows (test 1).
  - The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
  - The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).
- For filtered statistics, the colmodctr is first adjusted by the selectivity of the filter before these conditions are tested. For example, for filtered statistics with predicate selecting 50% of the rows, the colmodctr is multiplied by 0.5.
- One limitation of the automatic update logic is that it tracks changes to columns in the statistics, but not changes to columns in the predicate. If there are many changes to the columns used in predicates of filtered statistics, consider using manual updates to keep up with the changes.
- If the statistics object is defined on a temporary table, it is out of date as discussed above, except that there is an additional threshold for recomputation at 6 rows, with a test otherwise identical to test 2 in the previous list.”
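The rules above can be sketched as a small check. This is a simplified model for illustration only (the function name and parameters are my own invention, not anything SQL Server exposes): it applies the three tests for permanent tables, the extra 6-row threshold for temporary tables, and the selectivity adjustment for filtered statistics.

```python
def stats_out_of_date(rows_at_gather, colmodctr,
                      is_temp_table=False, filter_selectivity=1.0):
    """Approximate model of SQL Server 2008's out-of-date check.

    rows_at_gather     -- row count when the statistics were last gathered
    colmodctr          -- modifications to the leading column since then
    is_temp_table      -- temp tables get an extra threshold at 6 rows
    filter_selectivity -- for filtered stats, colmodctr is scaled by the
                          filter's selectivity before the tests run
    """
    # Filtered statistics: adjust the modification counter first.
    colmodctr *= filter_selectivity

    # Test 1: the table has gone from 0 rows to more than 0 rows.
    if rows_at_gather == 0:
        return colmodctr > 0

    # Temp tables only: additional low threshold at 6 rows.
    if is_temp_table and rows_at_gather < 6:
        return colmodctr > 6

    # Test 2: table had 500 rows or fewer when stats were gathered.
    if rows_at_gather <= 500:
        return colmodctr > 500

    # Test 3: table had more than 500 rows when stats were gathered.
    return colmodctr > 500 + 0.20 * rows_at_gather
```

Note this follows the “changed by more than…” wording of the quoted text; whether the real boundary is strict or inclusive is an implementation detail the article doesn’t pin down.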
So to summarise this…
If the table has 0 rows: inserting any number of rows makes the statistics out of date.
500 rows or fewer: the colmodctr of the leading column increasing by more than 500 makes the statistics out of date.
More than 500 rows: the colmodctr of the leading column increasing by more than 500 + 20% of the number of rows makes the statistics out of date.
Microsoft have summarised this in a table….
| Table Type      | Empty Condition | Threshold When Empty | Threshold When Not Empty                    |
|-----------------|-----------------|----------------------|---------------------------------------------|
| Permanent       | < 500 rows      | # of Changes >= 500  | # of Changes >= 500 + (20% of Cardinality)  |
| Temporary       | < 6 rows        | # of Changes >= 6    | # of Changes >= 500 + (20% of Cardinality)  |
| Table Variables | Change in cardinality does not affect AutoStats generation. | | |
Filtered statistics have their own logic, which we’ll look at later. To be honest, I’m not too fussed about statistics on tables with fewer than 500 rows. It’s the statistics on tables with more than 500 rows that I’m super interested in.
So what this means is: if you have a table with 1,000,000 rows, and then 500 + 200,000 changes are made to it, the very next time it is queried the statistics will update. In other words, 200,500 changes will trigger the out-of-date scenario. Really?
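Just to sanity-check that arithmetic (a trivial sketch of test 3):

```python
rows = 1_000_000
threshold = 500 + 0.20 * rows   # test 3: 500 + 20% of cardinality
print(int(threshold))           # 200500
```

So on a million-row table, just over a fifth of the table has to change before the automatic update kicks in.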
So effectively, the best way to look at this is that a large table needs to change by roughly 20% before the statistics will update. I’ve always thought this figure was a bit on the high side. Things have changed now, though: updating statistics was a much more arduous task when we had fewer cores and less memory, and it is far less intrusive now. For example, you couldn’t run sp_updatestats on a live production SQL Server 2000 environment, but you can run it on some production SQL Server environments now. I know, because I’ve done it without bringing the server down. sp_updatestats itself has also changed somewhat.
Anyway, if only there were something we could do about this 20%. There must be a trace flag or something…