Monday, August 24, 2015

SQL Server 2005 Waits and Queues SQL Server Best Practices Article --- Summary

  • A session process rotates between or falls in the following statuses: Running (only one session can be running or executing, per scheduler), Runnable (sessions waiting for CPU), or Suspended.  SPIDs with suspended statuses are placed in Waiter List until the requested resources are available (e.g. If a running session needs a data page that is not in cache, or needs a page that is blocked by another user’s lock)
  • The current runnable queue is found in sys.dm_exec_requests where the status is “runnable”. The total time that is spent waiting in sys.dm_os_waiting_tasks is found in the column wait_time_ms and the time that is spent waiting for CPU in the runnable queue is called signal_wait _time_ms. Resource waits can be computed by subtracting signal_wait_time_ms from wait_time_ms. The difference between resource and signal waits shows the extent of CPU pressure, if any, on overall performance. A low signal (where signal is less than 25% of the total waits) to resource wait ratio indicates there is little CPU pressure.
  • Common scenarios to avoid in OLTP:
    • High Frequency queries having a high number of table joins (>4).
    • Frequently updated tables having # indexes(>3).
    • Table Scans and Range Scans.
    • Unused Indexes.
    • Signal wait >25%
    • Plan reuse <90%
    • Parallelism: Cxpacket waits >5%
    • Page life expectancy < Total RAM /4*300
    • Memory Grants Pending >1
    • SQL cache hit ratio <90
    • Average Disk sec/read >20
    • Average Disk sec/write >20
    • If Top 2 values for wait stats are any of the following: ASYNCH_IO_COMPLETION, IO_COMPLETION, LOGMGR,WRITELOG, PAGEIOLATCH_x, there is IO bottleneck.
    • Block percentage >2%
    • *Average Row Lock Waits >100ms
    • *If Top 2 values for wait stats are any of the following: LCK_M_BU, LCK_M_IS, LCK_M_IU, LCK_M_IX, LCK_M_RIn_NL, LCK_M_RIn_S, LCK_M_RIn_U, LCK_M_RIn_X, LCK_M_RS_S, LCK_M_RS_U, LCK_M_RX_S, LCK_M_RX_U, LCK_M_RX_X,LCK_M_S, LCK_M_SCH_M, LCK_M_SCH_S, LCK_M_SIU, LCK_M_SIX, LCK_M_U, LCK_M_UIX, LCK_M_X
    • High number of deadlocks >5 per hour 
  • Common Scenarios to avoid with DataWarehousing:
    • Excessive fragmentation: Average fragmentation_in_percent >25%
    • Table scans and ranges >=1
    • Signal waits >25% 
    • Plan reuse >25%  Data warehouse has fewer transactions than OLTP, each with significantly bigger IO. Therefore, having the correct plan is more important than reusing a plan. Unlike OLTP, data warehousequeries are not identical.
    • Parallelism: Cxpacket wait < 10%
    • Memory grants pending > 1
    • Average Disk sec/read > 20 ms
    • Average Disk sec/write > 20 ms
    • If Top 2 values for wait stats are any of the following: ASYNCH_IO_COMPLETION, IO_COMPLETION, LOGMGR, WRITELOG, PAGEIOLATCH_x, then there is IO bottleneck.
    • Block percentage >2% 
    • Average Row Lock Waits >100ms 
    • If Top 2 values for wait stats are any of the following: LCK_M_BU, LCK_M_IS, LCK_M_IU, LCK_M_IX, LCK_M_RIn_NL, LCK_M_RIn_S, LCK_M_RIn_U, LCK_M_RIn_X, LCK_M_RS_S, LCK_M_RS_U, LCK_M_RX_S, LCK_M_RX_U, LCK_M_RX_X,LCK_M_S, LCK_M_SCH_M, LCK_M_SCH_S, LCK_M_SIU, LCK_M_SIX, LCK_M_U, LCK_M_UIX, LCK_M_X 
  • Latencies caused by disk to memory transfers frequently surface as PageIOLatch waits.  Memory pressure or disk IO subsystem issues can also increase PageIOLatch waits. Consequently, high values for both PageIOLatch_ex and PageIOLatch_sh wait types can indicate IO subsystem issues. Pertinent performance counters include Physical disk: disk seconds/read and Physical disk: disk seconds/write and SQL Server Buffer Manager: Page Life Expectancy.
  • If IO_stalls is inordinately high for one or more files, it is possible that there is either a disk bottleneck or that high reads and writes areoccurring on one drive. This should be corroborated with Performance Monitor countersPhysical Disk: Average Disk Seconds/Read and Average Disk Seconds/Write. The script in page 16 computes the Average Disk Seconds/Read and Average Disk Seconds/Write.
  • OLTP applications should have high plan reuse, > 90%. To compute that:Initial Compilations = SQL Compilations/sec – SQL Re-Compilations/secPlan reuse = (Batch requests/sec – Initial Compilations/sec) / Batch requests/sec.Memory pressure can cause query plans to be discarded and therefore result in reduced plan reuse.

 ASYNC_IO_COMPLETION: 

Occurs when a task is waiting for asynchronous I/Os to finish.
Identify disk bottlenecks, by using PerfmonCounters, Profiler, sys.dm_io_virtual_file_stats and SHOWPLAN.
Any of the following reducesthese waits:
1. Adding additional IO bandwidth.
12. Balancing IO across other drives.
13. Reducing IO with appropriateindexing. 
14. Check for bad query plans.
15. Check for memory pressure.  
See PERFMON Physical Disk performancecounters: 
1. Disk sec/read
16. Disk sec/write
17. Disk queues
See PERFMONSQLServer:Buffer Manager performancecounters for memory pressure:
1. Page Life Expectancy
18. Checkpoint pages/sec
19. Lazy writes/sec
See PERFMONSQLServer:Access Methods for correct indexing:
1. Full Scans/sec
20. Index seeks/sec
SQL Profiler can be used to identify which Transact-SQL statements do scans. Select the scans event class and events scan:started and scan:completed. Include the object Id data column. Save the profiler trace to a trace table, and then search for the scans event. The scan:completed event providesassociated IO so that you can also search for high reads, writes, and duration.
Check SHOWPLAN for bad query plans.

No comments:

Post a Comment