Monday, August 24, 2015

Summary for Performance Tuning Using SQL Server DMVs Book / Chapter 7: OS and Hardware Interaction

Chapter 7: OS and Hardware Interaction


  • signal_wait_time_ms is a value returned by sys.dm_os_wait_stats DMV and represents the time a thread is waiting in the queue for the CPU. wait_time_ms (total waits) is another value returned by the previous query and represents the total time spent waiting for ALL resources (not just CPU). The key metric, with regard to potential CPU pressure, is the signal wait as a percentage of the total waits. A high percentage signal is a sign of CPU pressure. The literature tends to quote "high" as more than about 25%, but it depends on your system … P263-264.
  • The values provided by sys.dm_os_wait_stats DMV are running totals, accumulated across all sessions since the server was last restarted or the statistics were manually reset using the DBCC SQLPERF command shown in Listing 7.1.
  • Listing 7.2: The most common waits.
  • OLEDB wait type explanation in … P267.
  • CXPACKET wait type explanation in … P268.
  • ASYNC_NETWORK_IO wait type explanation in … P269.
  • SOS_SCHEDULER_YIELD wait type explanation in … P270.
  • The script in Listing 7.3 will help determine on which resources SQL Server is spending the most time waiting, as a percentage of the total amount of time spent waiting on any wait_type that doesn't appear in the exclusion list.
  • If you feel that excessive locking may be the root cause of a performance issue, the query shown in Listing 7.4 filters on all locking wait types and order by the number of occurrence. 
  • This query is useful to help confirm CPU pressure. Since signal waits are time waiting for a CPU to service a thread, if you record total signal waits above roughly 10–15%, this is a pretty good indicator of CPU pressure. These wait stats are cumulative since SQL Server was last restarted, so you need to know what your baseline value for signal waits is, and watch the trend over time.
  • sys.dm_os_performance_counters DMV returns only SQL Server counters are represented in the DMV (not any Windows or other counters).
  • Listing 7.8: Monitoring changes in the size of the transaction log. The query there returns data related to ‘Log Growths’ and ‘Log Shrinks’. Shrinking the log (i.e. removing any unused space) is, as a general habit, a bad idea as it just means that you are likely to see a log growth event very soon. The best practice is to have log space pre-allocated and not to adjust it; free log space is not harmful to performance, but a full log is.
  • With the query in listing 7.11 you can returning the current value for the buffer cache hit ratio.
  • The sys.dm_os_sys_ info DMV returns a single row, summarizing the characteristics of a given machine, in terms of static configuration, such as the number of processors, as well as how long a server has been running. Columns represented in the view are described in … P289-290. Listing 7.14 (information about CPU) and listing 7.15 (info about memory).
  • In order to identify or rule out CPU pressure as a potential cause of slow execution times, we can use sys.dm_os_schedulers to investigate the kind of load being applied to our CPUs. It provides information on the current activity of the schedulers that are being used to apportion work to CPUs, along with a few cumulative counters that reveal how often certain events, such as context switches, have occurred since the computer was last started. The query in Listing 7.16 calculates the average number of tasks being managed, and tasks waiting in the runnable queue, across all available schedulers. High, sustained values for the current_tasks_count column usually indicate a blocking issue. It can also be a secondary indicator of I/O pressure. High, sustained values for the runnable_tasks_count column are usually a very good indicator of CPU pressure, since this means that many tasks are waiting for CPU time. The longer the queue, and the greater the number of schedulers with requests waiting, the more stressed is the CPU subsystem. For most systems, a sustained runnable task queue length of more than about 10–20 is a cause for concern. For individual schedulers, we can employ the query shown in Listing 7.17, which uses various status columns to investigate potential CPU pressure or, by a simple variation of the WHERE clause, to indicate whether or not I/O pressure is the root cause of the system slowdown.
  • If the system isn't experiencing CPU pressure, is not I/O bound, and there aren't too many other types of waits on the system, then you may need to investigate the possibility that there are simply too few threads available to the server. The query in Listing 7.18 investigates the possible need to adjust the server threads system configuration, by checking the average value of work_queue_count. In the absence of CPU pressure, values of more than one for the average of the work_ queue_count column can mean that you need to increase the number of threads allowable to the server.
  • Context switching occurs when a process fails to complete in a reasonable time, and the process has to give up control of the scheduler (again, representing a CPU) to let another process complete. This is not ideal as it increases load on the CPU. Listing 7.19 gets the counts that represents context switching, but those a accumulative and may include some night activities that where such context switching does not affect us; so in order to detect excessive context switching during normal OLTP operations caused, we need to take a baseline measurement and then track and compare the results over time. As noted during the earlier discussion on common wait types, hyper-threading (parallel execution) in OTLP systems can lead to context switching, and to CXPACKET and SOS_SCHEDULER_YIELD waits. if hyper-threading is enabled and you are seeing more than 5000 * (Number of Processors) context switches per second, it suggests that the scheduler is context switching between the same processor, yet thinking it is swapping to a different one, and you should consider turning hyper-threading off and retesting performance. A large number of idle switches, indicating periods when the processor was idle with no requests to process, can indicate a poor distribution of workload across available processors. A low value on one scheduler, coupled with very high utilization on another one, might indicate that large operations that really could benefit from using parallel operations are not doing so. For example, if you have MAXDOP set to 1 on a server with 8 CPUs, then your CPU-intensive nightly reports will run on only one of the CPUs, while all the others sit idle.
  • Listing 7.22 uses sys.dm_os_sys_ memory to get an overview of how much RAM is available to the operating system in total, and how much of it is currently not in use.
  • sys.dm_os_sys_memory DMV: gives you an overview of how the server is managing memory
  • sys.dm_os_process_memory DMV: gives you an overview of how the SQL Server process is doing for memory.
  • sys.dm_os_memory_cache_counters – stats on how the cache memory is being utilized, including how much memory is actively being used 
  • sys.dm_os_latch_stats – stats on the low-level resource locks that SQL Server uses to lock physical pages in memory, and more.

No comments:

Post a Comment