Monday, August 24, 2015

Summary for Performance Tuning Using SQL Server DMVs Book / Chapter 5: Indexing Strategy and Maintenance

Chapter 5: Indexing Strategy and Maintenance


    • sys.indexes indexing catalog view provides metadata at the index level, such as index identifiers (e.g. index name) and configuration settings (e.g. fill factor).
    • You can either create clustered index on a key that naturally reflects the way the data will be queried, or you can cluster on a narrow, ever-increasing integer key (to minimize subsequent fragmentation) and use non-clustered indexes for query efficiency. Most of the advice out there points to the latter approach … P180.
    • The ratio of unique values within a key column is referred to as index selectivity. The more unique the values, the higher the selectivity, which means that a unique index has the highest possible selectivity. The query engine loves highly selective key columns, especially if those columns are referenced in the WHERE clause of your frequently run queries. The higher the selectivity, the faster the query engine can reduce the size of the result set. If you've chosen a low selectivity column for the index key (i.e. where each key value matches many rows), then the optimizer may decide to simply perform a table scan to return a piece of data. Table scans have a bad reputation, but this is because they often mean reading a huge number of rows; in small tables, scanning all the rows is sometimes quicker than reading the data from the leaf levels of an index. In creating an index, the leading (first) column should be selective. However, this does not mean each index should start with the PK column; it must be a column that is likely to get searched on… P181
    • It is a balancing act; having a huge number of single column indexes (narrow indexes) is a bad idea, as there will be too many indexes to maintain. Also wide indexes is bad, as you will fit few on a data page, index will take up a lot of space, and scanning it will be inefficient… P182.
    • sys.dm_db_index_usage_stats DMV is used to obtain statistics on how indexes have been used to resolve queries. Data in this DMV is cumulative, and is refreshed when the server is restarted, when the index is dropped and recreated, or rebuilt or reorganized. Listing 5.3 is an example of how to use it, and the following are some important columns there:
      • user_seeks – the number of times the index has been used in a seek operation (to find a specific row)
      • user_scans – number of times the index has been used by scanning the leaf pages of the index for data.
      • user_lookups – for clustered indexes only, this is the number of times the index has been used in a "bookmark lookup" to fetch the full row; this is because non-clustered indexes use the clustered indexes key as the pointer to the base row
      • user_updates – number of times the index has been modified due to a change in the table's data.
      • User reads – total number of times that the index is used by the optimizer to resolve a query … P182-184
    • Identify indexes that have never been accessed: Listing 5.4 finds those indexes that have had no reads or writes, in other words those indexes that do not show up in sys.dm_db_index_usage_stats. Those indexes can potentially be dropped, after some further investigation ... P185
    • Listing 5.5 identifies indexes that are being maintained but not used. You can add the following to the select statement to generate the drop index command that drops the index:
    • 'DROP INDEX [' + i.[name] + '] ON [' + su.[name] + '].[' + o.[name]+ '] WITH ( ONLINE = OFF )' AS [drop_command]
    • it is always advisable to first check how recently the usage stats were cleared, by running the query in listing 5.6, and also after adequate testing in a non-production environment… P188.
    • Listing 5.7: Finding rarely-used inefficient indexes.
    • The sys.dm_db_index_operational_stats is a DMF provides index usage statistics at a more detailed level than those provided by the sys.dm_db_index_usage_stats DMV, it offers more detailed information about how the index is used at a physical level, via columns such as leaf_insert_count, leaf_update_count and leaf_delete_count, as well as the nonleaf_* equivalents, for modifications above the leaf level. Also it can provide evidence of potential lock or latch contention on the objects, or of excessive I/O being issued by the object. The data returned by this DMF exists only as long as the metadata cache object that represents the heap or index is available. Application for this is listing 5.8 … P190-193.
    • Listing 5.9 uses sys.dm_db_index_operational_stats to return records that relate to locking and blocking at the row level for the indexes of the active database.
    • Listing 5.10 highlights which of our indexes are encountering latch contention using the page_io_latch_wait_count and page_io_wait_in_ms columns. I/O latching occurs on disk-to-memory transfers, and high I/O latch counts could be a reflection of a disk subsystem issue, particularly when you see average latch wait times of over 15 milliseconds.
    • The query in Listing 5.11 uses sys.dm_db_index_operational_stats DMV to provide information on the count of attempts made by SQL Server to escalate row and page locks to table locks for a specific object, and the percentage success in performing the escalation.
    • Listing 5.12 identifies indexes associated with lock contention, and it is a good next step after querying sys.dm_os_wait_stats DMV (discussed in Chapter 7), and the outcome points to locking problems.
    • Finding Missing Indexes: information about missing indexes is exposed via the following four MDOs: sys.dm_db_missing_index_details, sys.dm_db_missing_index_columns, sys.dm_db_missing_index_group_stats, and sys.dm_db_missing_index_groups. The data stored by each of these DMOs is reset on a server restart. You can find a brief explanation of those DMOs and a review for the most significant columns in P199-204. You should never just blindly add every index that these DMOs suggest, instead, you need to examine the results of the query carefully and manually filter out results that are not part of your regular workload. Listing 5.13 puts those DMOs to good use and find the most beneficial missing indexes. The following is brief explanation of the returned columns:
      • Index_advantage: overall benefit of a suggested index, or the importance of creating the suggested index. The higher this value is, the more important the index is.
      • Statement: the table or view name.
      • equality_columns: the columns that would have been useful, based on an equality predicate.
      • inequality_columns: the columns that would have been useful, based on an inequality predicate (i.e. any comparison other than "column = value")
      • included_columns: columns that, if included, would have been useful to cover the query.
      • unique_compiles: the number of plans that have been compiled that might have used the index.
      • user_seeks: the number of seek operations in user queries that might have used the index.
      • avg_total_user_cost: average cost saving for the queries that could have been helped by the index in the group
    • avg_user_impact: the estimated percentage by which the average query cost would drop, for the queries that could use this index.
    • When creating the suggested indexes (after careful consideration), list the equality columns first (leftmost in the column list), then list the inequality columns after the equality columns (to the right of equality columns listed). Be warned that the order in which these DMOs list missing columns does not accurately suggest the correct column order for an index key. To determine an effective order for the equality columns, order them based on their selectivity; that is, list the most selective columns first… P205-207.
    • Finding Indexes Fragmentation: The sys.dm_db_index_physical_stats DMF provides invaluable information regarding the state of your index partitions, including type of index, index depth, and degree of fragmentation. This DMF accepts a number of parameters, one of them “mode”, which determines the thoroughness with which the statistics are collected. Possible values are: 
      • LIMITED: is the default and the least costly, will not scan the leaf level of the indexes and the data pages of heaps are not scanned
      • SAMPLED: mode returns statistics based only upon a 1% sample of all pages of the indexes and heaps in the scope of the function call; if any page or heap within the scope of the function call has less than 10,000 pages, then DETAILED is automatically used.
      • DETAILED provides the most complete result set from this function of the three, but can require ample resources; it scans all pages; it returns all statistics.
    • This DMF returns a large number of columns and we'll only review a subset of them in P211. The query in Listing 5.14
    • will return fragmentation information for each index in the current database, where the average_fragmentation_in_percent column is greater than 15% and where the page count is greater than 500. On a busy system, this can be a resource-intensive query. The rule of thumb for index reorganization is 15–30% fragmentation. The often recommended process for indexes with fragmentation greater than 30% is a rebuild of the index.
    Referencehttp://www.amazon.com/Performance-Tuning-Server-Dynamic-Management/dp/1906434476

    No comments:

    Post a Comment