Tuesday, May 17, 2016

"Operating system error 112(There is not enough space on the disk.) encountered." Error When Starting SQL Server Service After Changing The Initial Size or Location of [tempdb] Data File.

You've changed the location or the initial size of the [tempdb] data file, then restarted the SQL Server service, but the service failed to start, with errors like the following in the Event Viewer:

The operating system returned error 38(Reached the end of the file.) to SQL Server during a read at offset 0000000000000000 in file '[some path]\tempDB.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

and

[some path]\tempDB.mdf: Operating system error 112(There is not enough space on the disk.) encountered.


Solution:

  1. Go to SQL Server Configuration Manager and change the SQL Server services account to [NT AUTHORITY\NetworkService].
  2. Make sure you are logged into Windows as a machine administrator.
  3. Run CMD as administrator.
  4. Execute the following command:
     net start mssqlserver /f
     This starts the SQL Server service with a minimal configuration and in single-user mode, which allows only one administrator to connect.
  5. Immediately after that, run the following command:
     osql -E
     This connects to SQL Server from the command line (through CMD) using the current Windows user, which should be a machine administrator.
  6. Now change the initial size to a size that fits the free space available on the drive where the tempdb data file is located. For example, the following changes the initial size of the tempdb data file to 1 GB:
    alter database tempdb modify file ( name=tempdev ,size=1000MB);
    go
  7. If the previous step completed successfully, go back to SQL Server Configuration Manager, change the account that runs the SQL Server service back to the one you want, and try to start the service again.
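If the problem is the file's location rather than its size, a similar ALTER DATABASE statement can point tempdb at a drive with more space; the path below is a placeholder, and the change takes effect only after the next service restart.

```sql
-- Hypothetical example: move the tempdb data file to a different drive.
-- 'D:\SQLData\tempdb.mdf' is a placeholder path; tempdb is recreated from
-- this metadata on the next service restart, so no data needs to be copied.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'D:\SQLData\tempdb.mdf');
GO
```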

Saturday, October 24, 2015

SQL Server Storage Monitoring

Use the following query (by Paul Randal) to break down the I/O workload by drive:

SELECT
    --virtual file latency
    ReadLatency =
        CASE WHEN num_of_reads = 0
            THEN 0 ELSE (io_stall_read_ms / num_of_reads) END,
    WriteLatency =
        CASE WHEN num_of_writes = 0
            THEN 0 ELSE (io_stall_write_ms / num_of_writes) END,
    Latency =
        CASE WHEN (num_of_reads = 0 AND num_of_writes = 0)
            THEN 0 ELSE (io_stall / (num_of_reads + num_of_writes)) END,
    --avg bytes per IOP
    AvgBPerRead =
        CASE WHEN num_of_reads = 0
            THEN 0 ELSE (num_of_bytes_read / num_of_reads) END,
    AvgBPerWrite =
        CASE WHEN num_of_writes = 0 -- guard against divide-by-zero on writes
            THEN 0 ELSE (num_of_bytes_written / num_of_writes) END,
    AvgBPerTransfer =
        CASE WHEN (num_of_reads = 0 AND num_of_writes = 0)
            THEN 0 ELSE
                ((num_of_bytes_read + num_of_bytes_written) /
                (num_of_reads + num_of_writes)) END,
    LEFT (mf.physical_name, 2) AS Drive,
    DB_NAME (vfs.database_id) AS DB,
    vfs.*,
    mf.physical_name
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON vfs.database_id = mf.database_id
    AND vfs.file_id = mf.file_id
--WHERE vfs.file_id = 2 -- log files
--ORDER BY Latency DESC
--ORDER BY ReadLatency DESC
ORDER BY WriteLatency DESC;



PerfMon counters:
  • Physical Disk Object: Avg. Disk Queue Length represents the average number of physical read and write requests that were queued on the selected physical disk during the sampling period. If your disk queue length frequently exceeds a value of 2 during peak usage of SQL Server, then you might have an I/O bottleneck.
  • Avg. Disk Sec/Read is the average time, in seconds, of a read of data from the disk. As a guideline, any value:
    • Less than 10 ms = good performance
    • Between 10 ms and 20 ms = slow performance
    • Between 20 ms and 50 ms = poor performance
    • Greater than 50 ms = significant performance problem.
  • Avg. Disk Sec/Write is the average time, in seconds, of a write of data to the disk. Please refer to the guideline in the previous bullet.
  • Physical Disk: %Disk Time is the percentage of elapsed time that the selected disk drive was busy servicing read or write requests. A general guideline is that if this value is greater than 50 percent, it represents an I/O bottleneck.
  • Avg. Disk Reads/Sec is the rate of read operations on the disk. You need to make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.
  • Avg. Disk Writes/Sec is the rate of write operations on the disk. Make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.

Monday, August 24, 2015

SQL Server 2005 Waits and Queues SQL Server Best Practices Article --- Summary

  • A session process rotates between or falls in the following statuses: Running (only one session can be running or executing, per scheduler), Runnable (sessions waiting for CPU), or Suspended. SPIDs with suspended status are placed in the Waiter List until the requested resources are available (e.g. if a running session needs a data page that is not in cache, or needs a page that is blocked by another user’s lock).
  • The current runnable queue is found in sys.dm_exec_requests where the status is “runnable”. The total time spent waiting is found in the wait_time_ms column of sys.dm_os_waiting_tasks, and the time spent waiting for CPU in the runnable queue is called signal_wait_time_ms. Resource waits can be computed by subtracting signal_wait_time_ms from wait_time_ms. The difference between resource and signal waits shows the extent of CPU pressure, if any, on overall performance. A low signal-to-resource wait ratio (signal less than 25% of the total waits) indicates there is little CPU pressure.
  • Common scenarios to avoid in OLTP:
    • High Frequency queries having a high number of table joins (>4).
    • Frequently updated tables having a high number of indexes (>3).
    • Table Scans and Range Scans.
    • Unused Indexes.
    • Signal wait >25%
    • Plan reuse <90%
    • Parallelism: Cxpacket waits >5%
    • Page life expectancy < (Total RAM / 4) * 300
    • Memory Grants Pending >1
    • SQL cache hit ratio <90
    • Average Disk sec/read >20 ms
    • Average Disk sec/write >20 ms
    • If Top 2 values for wait stats are any of the following: ASYNCH_IO_COMPLETION, IO_COMPLETION, LOGMGR, WRITELOG, PAGEIOLATCH_x, then there is an I/O bottleneck.
    • Block percentage >2%
    • *Average Row Lock Waits >100ms
    • *If Top 2 values for wait stats are any of the following: LCK_M_BU, LCK_M_IS, LCK_M_IU, LCK_M_IX, LCK_M_RIn_NL, LCK_M_RIn_S, LCK_M_RIn_U, LCK_M_RIn_X, LCK_M_RS_S, LCK_M_RS_U, LCK_M_RX_S, LCK_M_RX_U, LCK_M_RX_X, LCK_M_S, LCK_M_SCH_M, LCK_M_SCH_S, LCK_M_SIU, LCK_M_SIX, LCK_M_U, LCK_M_UIX, LCK_M_X
    • High number of deadlocks >5 per hour 
  • Common scenarios to avoid in data warehousing:
    • Excessive fragmentation: Average fragmentation_in_percent >25%
    • Table scans and ranges >=1
    • Signal waits >25% 
    • Plan reuse >25%. A data warehouse has fewer transactions than OLTP, each with significantly bigger I/O; therefore, having the correct plan is more important than reusing a plan. Unlike OLTP, data warehouse queries are not identical.
    • Parallelism: Cxpacket wait < 10%
    • Memory grants pending > 1
    • Average Disk sec/read > 20 ms
    • Average Disk sec/write > 20 ms
    • If Top 2 values for wait stats are any of the following: ASYNCH_IO_COMPLETION, IO_COMPLETION, LOGMGR, WRITELOG, PAGEIOLATCH_x, then there is IO bottleneck.
    • Block percentage >2% 
    • Average Row Lock Waits >100ms 
    • If Top 2 values for wait stats are any of the following: LCK_M_BU, LCK_M_IS, LCK_M_IU, LCK_M_IX, LCK_M_RIn_NL, LCK_M_RIn_S, LCK_M_RIn_U, LCK_M_RIn_X, LCK_M_RS_S, LCK_M_RS_U, LCK_M_RX_S, LCK_M_RX_U, LCK_M_RX_X, LCK_M_S, LCK_M_SCH_M, LCK_M_SCH_S, LCK_M_SIU, LCK_M_SIX, LCK_M_U, LCK_M_UIX, LCK_M_X
  • Latencies caused by disk-to-memory transfers frequently surface as PageIOLatch waits. Memory pressure or disk I/O subsystem issues can also increase PageIOLatch waits. Consequently, high values for both PageIOLatch_ex and PageIOLatch_sh wait types can indicate I/O subsystem issues. Pertinent performance counters include Physical Disk: Disk Seconds/Read, Physical Disk: Disk Seconds/Write, and SQL Server Buffer Manager: Page Life Expectancy.
  • If IO_stalls is inordinately high for one or more files, it is possible that there is either a disk bottleneck or that high reads and writes are occurring on one drive. This should be corroborated with the Performance Monitor counters Physical Disk: Average Disk Seconds/Read and Average Disk Seconds/Write. The script on page 16 computes the Average Disk Seconds/Read and Average Disk Seconds/Write.
  • OLTP applications should have high plan reuse (> 90%). To compute it:
    Initial Compilations = SQL Compilations/sec – SQL Re-Compilations/sec
    Plan reuse = (Batch requests/sec – Initial Compilations/sec) / Batch requests/sec
    Memory pressure can cause query plans to be discarded and therefore result in reduced plan reuse.
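The signal-wait ratio described above can be checked with a quick query against sys.dm_os_wait_stats; this is a common community formulation of the calculation, not a script from the article:

```sql
-- Signal waits (time spent waiting for CPU) as a percentage of total waits.
-- Values above ~25% suggest CPU pressure. Counters are cumulative since the
-- last restart or manual reset, so watch the trend rather than one reading.
SELECT
    SUM(signal_wait_time_ms) * 100.0 /
        NULLIF(SUM(wait_time_ms), 0) AS SignalWaitPct,
    SUM(wait_time_ms - signal_wait_time_ms) * 100.0 /
        NULLIF(SUM(wait_time_ms), 0) AS ResourceWaitPct
FROM sys.dm_os_wait_stats;
```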

ASYNC_IO_COMPLETION:

Occurs when a task is waiting for asynchronous I/Os to finish.
Identify disk bottlenecks by using PerfMon counters, Profiler, sys.dm_io_virtual_file_stats, and SHOWPLAN.
Any of the following reduces these waits:
  1. Adding additional I/O bandwidth.
  2. Balancing I/O across other drives.
  3. Reducing I/O with appropriate indexing.
  4. Checking for bad query plans.
  5. Checking for memory pressure.
See the PerfMon Physical Disk performance counters:
  1. Disk sec/read
  2. Disk sec/write
  3. Disk queues
See the PerfMon SQL Server:Buffer Manager performance counters for memory pressure:
  1. Page Life Expectancy
  2. Checkpoint pages/sec
  3. Lazy writes/sec
See the PerfMon SQL Server:Access Methods performance counters for correct indexing:
  1. Full Scans/sec
  2. Index Searches/sec
SQL Profiler can be used to identify which Transact-SQL statements do scans. Select the Scans event class and the Scan:Started and Scan:Completed events. Include the Object ID data column. Save the Profiler trace to a trace table, and then search for the scan events. The Scan:Completed event provides the associated I/O so that you can also search for high reads, writes, and duration.
Check SHOWPLAN for bad query plans.

Summary for SQL Server on VMware Best Practices Guide

The following is a summary for SQL Server on VMware Best Practices Guide:
  • VMware Capacity Planner™ collects all essential performance metrics (processor, disk, memory, and network statistics, as well as SQL Server-specific metrics) on the existing database servers in the environment, and analyzes this data to recommend server consolidation opportunities. This exercise helps you understand what resources your current physical SQL Servers use, making it easier to create a short list of SQL Server instances to virtualize and to determine the order in which to virtualize them.
  • We strongly recommend using a VMware enterprise-class hypervisor, vSphere, to deploy virtualized SQL Server instances, even for development and test environments.

CPU

  • When maximizing performance is the primary goal for the workload, provide CPU resources by maintaining a 1:1 ratio of physical cores to vCPUs. For example, if the workload runs on a 4-core machine in the physical environment, allocate 4 vCPUs to the virtual machine on the vSphere host.
  • For small SQL Server virtual machines, allocate virtual machine CPUs equal to or less than the number of cores in each physical NUMA node. For wide SQL Server virtual machines, size virtual machine CPUs to align with physical NUMA boundaries. Configure vNUMA to enable SQL Server NUMA optimization to take advantage of managing memory locality.
  • Hyperthreading Sharing on the Properties tab of a virtual machine provides control of whether a virtual machine should be scheduled to share a physical processor if hyperthreading is enabled on the host. Choose one of the following settings:
    • Any – This is the default setting. The vCPUs of this virtual machine can freely share cores with other virtual CPUs of this or other virtual machines.
    • None – The vCPUs of this virtual machine have exclusive use of a processor core whenever they are scheduled to it. Selecting None in effect disables hyperthreading for your virtual machine; consider it when maximizing performance is the primary goal for the workload.
    • Internal – This option is similar to None. Virtual CPUs from this virtual machine cannot share cores with virtual CPUs from other virtual machines. They can share cores with the other virtual CPUs from the same virtual machine.

Memory

  • To avoid performance latency resulting from remote memory accesses, you should size an SQL Server virtual machine’s memory so it is less than the amount available per NUMA node.
  • Large pages can significantly improve the performance of this workload compared to running it with small pages. So, if the virtual machine is dedicated to SQL Server (or all major services on it can benefit from large pages), then ❶ enable large pages in Windows and ❷ enable large pages in SQL Server. This is possible when the following conditions are met:
    • You are using SQL Server Enterprise Edition.
    • The computer has 8 GB or more of physical RAM.
    • The Lock Pages in Memory privilege is granted to the service account. (I personally see Lock Pages in Memory as a dangerous option to play with; it should be granted only after very careful consideration.)
  • When calculating the amount of memory to provision for the virtual machine, use the following formulas:
    VM Memory = SQL Max Server Memory + ThreadStack + OS Mem + VM Overhead
    ThreadStack = SQL Max Worker Threads * ThreadStackSize
    The value for 'max worker threads' can be found in the max_workers_count column of sys.dm_os_sys_info. ThreadStackSize = 1 MB on x86, 2 MB on x64, 4 MB on IA64. OS Mem: 1 GB for each 4 GB of RAM installed from 4–16 GB, and then 1 GB for every 8 GB of RAM installed above 16 GB.
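As a sketch of the ThreadStack term in the formula above, the worker-thread count can be read straight from the DMV; the 2 MB stack size below assumes an x64 instance:

```sql
-- Read the configured worker thread count and estimate the ThreadStack term
-- of the VM memory formula (assuming x64, where ThreadStackSize = 2 MB).
SELECT max_workers_count,
       max_workers_count * 2 AS ThreadStackMB
FROM sys.dm_os_sys_info;
```

For example, an x64 instance reporting 576 workers would need roughly 1.2 GB for thread stacks on top of max server memory, OS memory, and VM overhead.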

Storage

  • A single Iometer thread running in a virtual machine can saturate the bandwidth of the respective networks for all four storage protocols (NFS, SW iSCSI, HW iSCSI, and Fibre Channel), for both read and write. Fibre Channel throughput performance is higher because of the higher bandwidth of the Fibre Channel link. For the IP-based protocols, there is no significant throughput difference for most block sizes.
  • It is preferable to deploy virtual machine files on shared storage to take advantage of vSphere vMotion, vSphere HA, and vSphere DRS. This is considered a best practice for mission-critical SQL Server deployments, which are often installed on third-party, shared-storage management solutions. VMware recommends that you set up a minimum of four paths from a vSphere host to a storage array, which means each host requires at least two HBA ports.
  • Create VMFS partitions from within VMware vCenter; they are aligned by default. Align the data disk for heavy I/O workloads using diskpart; with Windows Server 2008, the disk is automatically aligned to a 1 MB boundary.
  • If you are deploying a SQL Server with a heavy workload, VMware recommends that you consider using eagerzeroedthick disks for SQL Server data, transaction log, and tempdb files. An eagerzeroedthick disk can be configured by selecting the Thick Provision Eager Zeroed option in vSphere 5.x.
  • Place SQL Server binary, log, and data files into separate VMDKs (Virtual Machine Disks). In addition to the performance advantage, separating the SQL Server binaries from data and log files also provides better flexibility for backup: the OS/SQL Server binary VMDK can be backed up with snapshot-based backups, such as VMware Data Recovery, while the SQL Server data and log files can be backed up through traditional database backup solutions.
  • Maintain a 1:1 mapping between VMDKs and LUNs. When this is not possible, group VMDKs and SQL Server files with similar I/O characteristics on common LUNs.
  • Use multiple vSCSI adapters. Placing SQL Server binary, data, and log files onto separate vSCSI adapters optimizes I/O by distributing the load across multiple target devices.

Networking:

  • Use two physical NICs per vSwitch, and if possible, uplink the physical NICs to separate physical switches.
  • Use separate physical NICs for management traffic (vSphere vMotion, FT logging, VMkernel) and virtual machine traffic.
  • If using iSCSI, the network adapters should be dedicated to either network communication or iSCSI, but not both.

Summary for Performance Tuning Using SQL Server DMVs Book / Chapter 7: OS and Hardware Interaction


  • signal_wait_time_ms is a value returned by the sys.dm_os_wait_stats DMV and represents the time a thread waits in the queue for the CPU. wait_time_ms (total waits) is another value returned by the same DMV and represents the total time spent waiting for ALL resources (not just CPU). The key metric, with regard to potential CPU pressure, is the signal wait as a percentage of the total waits. A high signal percentage is a sign of CPU pressure. The literature tends to quote "high" as more than about 25%, but it depends on your system … P263-264.
  • The values provided by the sys.dm_os_wait_stats DMV are running totals, accumulated across all sessions since the server was last restarted or the statistics were manually reset using the DBCC SQLPERF command shown in Listing 7.1.
  • Listing 7.2: The most common waits.
  • OLEDB wait type explanation in … P267.
  • CXPACKET wait type explanation in … P268.
  • ASYNC_NETWORK_IO wait type explanation in … P269.
  • SOS_SCHEDULER_YIELD wait type explanation in … P270.
  • The script in Listing 7.3 will help determine on which resources SQL Server is spending the most time waiting, as a percentage of the total amount of time spent waiting on any wait_type that doesn't appear in the exclusion list.
  • If you feel that excessive locking may be the root cause of a performance issue, the query shown in Listing 7.4 filters on all locking wait types and orders by the number of occurrences.
  • The signal-waits query is useful to help confirm CPU pressure. Since signal waits are time spent waiting for a CPU to service a thread, if you record total signal waits above roughly 10–15%, this is a pretty good indicator of CPU pressure. These wait stats are cumulative since SQL Server was last restarted, so you need to know what your baseline value for signal waits is, and watch the trend over time.
  • The sys.dm_os_performance_counters DMV returns only SQL Server counters (not any Windows or other counters).
  • Listing 7.8: Monitoring changes in the size of the transaction log. The query there returns data related to ‘Log Growths’ and ‘Log Shrinks’. Shrinking the log (i.e. removing any unused space) is, as a general habit, a bad idea, as it just means that you are likely to see a log growth event very soon. The best practice is to have log space pre-allocated and not to adjust it; free log space is not harmful to performance, but a full log is.
  • With the query in Listing 7.11 you can return the current value for the buffer cache hit ratio.
  • The sys.dm_os_sys_info DMV returns a single row summarizing the characteristics of a given machine, in terms of static configuration, such as the number of processors, as well as how long the server has been running. The columns represented in the view are described in … P289-290. Listing 7.14 (information about CPU) and Listing 7.15 (information about memory).
  • In order to identify or rule out CPU pressure as a potential cause of slow execution times, we can use sys.dm_os_schedulers to investigate the kind of load being applied to our CPUs. It provides information on the current activity of the schedulers that are being used to apportion work to CPUs, along with a few cumulative counters that reveal how often certain events, such as context switches, have occurred since the computer was last started. The query in Listing 7.16 calculates the average number of tasks being managed, and tasks waiting in the runnable queue, across all available schedulers. High, sustained values for the current_tasks_count column usually indicate a blocking issue; they can also be a secondary indicator of I/O pressure. High, sustained values for the runnable_tasks_count column are usually a very good indicator of CPU pressure, since this means that many tasks are waiting for CPU time. The longer the queue, and the greater the number of schedulers with requests waiting, the more stressed the CPU subsystem is. For most systems, a sustained runnable task queue length of more than about 10–20 is a cause for concern. For individual schedulers, we can employ the query shown in Listing 7.17, which uses various status columns to investigate potential CPU pressure or, by a simple variation of the WHERE clause, to indicate whether or not I/O pressure is the root cause of the system slowdown.
  • If the system isn't experiencing CPU pressure, is not I/O bound, and there aren't too many other types of waits on the system, then you may need to investigate the possibility that there are simply too few threads available to the server. The query in Listing 7.18 investigates the possible need to adjust the server threads system configuration, by checking the average value of work_queue_count. In the absence of CPU pressure, values of more than one for the average of the work_queue_count column can mean that you need to increase the number of threads allowable to the server.
  • Context switching occurs when a process fails to complete in a reasonable time and has to give up control of the scheduler (again, representing a CPU) to let another process complete. This is not ideal, as it increases load on the CPU. Listing 7.19 gets the counts that represent context switching, but these are cumulative and may include activity (such as nightly jobs) where context switching does not affect us; so, in order to detect excessive context switching during normal OLTP operations, we need to take a baseline measurement and then track and compare the results over time. As noted during the earlier discussion on common wait types, hyper-threading in OLTP systems can lead to context switching, and to CXPACKET and SOS_SCHEDULER_YIELD waits. If hyper-threading is enabled and you are seeing more than 5000 * (number of processors) context switches per second, it suggests that the scheduler is context switching on the same processor, yet thinking it is swapping to a different one, and you should consider turning hyper-threading off and retesting performance. A large number of idle switches, indicating periods when the processor was idle with no requests to process, can indicate a poor distribution of workload across available processors. A low value on one scheduler, coupled with very high utilization on another, might indicate that large operations that really could benefit from parallel execution are not using it. For example, if you have MAXDOP set to 1 on a server with 8 CPUs, then your CPU-intensive nightly reports will run on only one of the CPUs, while all the others sit idle.
  • Listing 7.22 uses sys.dm_os_sys_memory to get an overview of how much RAM is available to the operating system in total, and how much of it is currently not in use.
  • sys.dm_os_sys_memory DMV: gives you an overview of how the server is managing memory.
  • sys.dm_os_process_memory DMV: gives you an overview of how the SQL Server process is doing for memory.
  • sys.dm_os_memory_cache_counters – stats on how the cache memory is being utilized, including how much memory is actively being used.
  • sys.dm_os_latch_stats – stats on the low-level resource locks that SQL Server uses to lock physical pages in memory, and more.
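The scheduler checks described above can be approximated with a query like the following; it is a minimal sketch in the spirit of Listing 7.16, not the book's exact listing:

```sql
-- Average task load across the regular (non-hidden) schedulers. Sustained
-- high current_tasks_count suggests blocking; sustained high
-- runnable_tasks_count (roughly 10-20+) suggests CPU pressure.
SELECT AVG(current_tasks_count)  AS AvgCurrentTasks,
       AVG(runnable_tasks_count) AS AvgRunnableTasks,
       AVG(work_queue_count)     AS AvgWorkQueue
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255; -- exclude hidden/internal schedulers
```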

Summary for Performance Tuning Using SQL Server DMVs Book / Chapter 6: Physical Disk Statistics and Utilization

  • It is vital that, before even installing SQL Server, you gain a good understanding of the I/O capacity of your disk subsystem, using tools such as SQLIO.
  • sys.dm_db_partition_stats helps us find the largest tables and indexes, especially those that are subject to heavy updates, monitor the effectiveness of their partitioning scheme, or investigate the need to upgrade/configure the disk I/O subsystem appropriately. It returns one row per table partition; if the table is not partitioned, it returns one row for the table. Note that when querying this view, we filter out system objects, which number around 100 for each database.
  • The query in Listing 6.1 will provide the total number of rows in all clustered indexes and heaps on a given SQL Server instance. System views such as sys.dm_db_partition_stats are updated asynchronously, for performance reasons, so the counts may not be completely up to date.
  • Rebuilding indexes can bloat transaction logs and, when using differential backups, it means that all the pages in the index/tables will have changed since the last full backup … P230.
    https://social.msdn.microsoft.com/Forums/en-US/5146912a-395c-4129-8aeb-4f82b949bdfc/does-reorganizing-indexes-bloat-transaction-logs-and-differential-backups-as-rebuilding-them-the?forum=sqlkjmanageability
  • The occurrence of page splits (which happen when new records are inserted into full data pages, causing fragmentation) can be minimized to some degree, though not avoided altogether, by setting an appropriate fill factor for the clustered table (e.g. 90%), thus allowing space for new data on each page. In fact, a common cause of fragmentation is rebuilding clustered tables and indexes and forgetting to set the fill factor appropriately. By default, the fill factor will be 0 (meaning zero spare space). This can end up causing a lot more subsequent fragmentation than was resolved by rebuilding! … P233.
  • Many designers persist in the habit of using GUIDs for surrogate keys, and clustering on them. GUIDs are random in nature and tend not to be created sequentially; as a result, insertions of data into the middle of the table are common, which causes page splits and fragmentation. P233-235 contain examples that illustrate this.
  • With heaps, the storage engine inserts all rows at the end of the table, in order of arrival. This makes inserting into a heap super-fast. As such, many people use heaps as a place to drop rows (for instance, when logging operations, and even when loading data using bulk copy), while avoiding the performance impact of index maintenance. Reading from a fragmented heap, on the other hand, is a performance nightmare.
  • Diagnosing I/O bottlenecks: start with PerfMon counters such as Physical Disk Object: Avg. Disk Queue Length and Avg. Disk Reads/Sec, which can help you work out the number of I/Os per disk, per second, and how many physical I/O requests are being queued on a given disk. Also use the sys.dm_os_wait_stats DMV and look for PAGEIOLATCH_EX or PAGEIOLATCH_SH among the top waits; this indicates that many sessions are experiencing delays in obtaining a latch for a buffer, since the buffer is involved in physical I/O requests … P239.
  • The sys.dm_io_virtual_file_stats DMF gives cumulative physical I/O statistics, indicating how frequently the file has been used by the database for reads and writes since the server was last rebooted. It also provides a very useful metric in the form of the "I/O stall" time, which indicates the total amount of time that user processes have waited for I/O to be completed on the file in question. Ultimately, high stall rates could simply indicate that the disk I/O subsystem is inadequate to handle the required I/O throughput. Note that this DMF measures physical I/O only; logical I/O operations that read from cached data will not show up here. Explanations of the columns it returns are in P240-241.
  • Investigating physical I/O and I/O stalls: the writers store in a temporary table the result of the query in Listing 6.13, which captures the baseline disk I/O statistics from sys.dm_io_virtual_file_stats, then wait for 10 seconds (or longer, depending on the workload on your server), then run the query in Listing 6.15, which compares the current statistics with the baseline. They suggest that, in any event, it is certainly worrying to see stall times on a drive that are substantially greater than the elapsed time on that same drive … P243-246.
  • The sys.dm_io_pending_io_requests DMV returns a row for each currently pending I/O request at the file level. More than two or three pending requests could indicate an issue. If you regularly observe a high number of pending I/O requests on a single drive, you should consider moving some of the files onto a separate drive, on a different access channel. Using the query in Listing 6.16, we can view the file name, the status, and how long the operation has been waiting.
  • In an optimized system the read:write ratio should ideally be close to 50:50. In reality there are almost always more reads than writes. At a ratio higher than around 80:20, start to suspect non-optimal queries, or insufficient cache memory to avoid physical disk access. If you find that the read:write ratio is 50:50 based on counts (the NUMBER of read operations versus write operations), but 99:1 based on data (the SIZE or amount of data read versus written), this indicates that you are reading a lot of data to write a little data, which could be caused by inefficient database code, perhaps allowing users to search in a very inefficient manner, resulting in table scans. Bear in mind that this includes only actual reads from, and writes to, the files, and will not reflect data read from the cache, or written to the cache and not yet flushed to disk. Listing 6.17 calculates this based on the AMOUNT (size) of data. Listing 6.18 does the same, but also slices and groups on the drive letter. If those queries return something to worry about, the next step is to obtain the read:write ratios for the database in terms of the number of read and write operations; if this reveals a ratio much closer to 50:50, then we know that reads are reading a disproportionately high amount of data. You can find this using the query in Listing 6.19. Listing 6.21 gets the number of reads and writes at the table level, but note that the returned counts include logical operations as well.
  • Listing 6.22 demonstrates how to get an overview of tempdb utilization.
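The size-based read:write ratio discussed above can be sketched against sys.dm_io_virtual_file_stats; this is a minimal approximation of the idea behind Listing 6.17, not the book's listing:

```sql
-- Instance-wide read:write ratio by AMOUNT of data since the last restart.
-- A ratio far above ~80:20 can point at non-optimal queries or too little
-- cache memory; compare it with the count-based ratio to interpret it.
SELECT
    SUM(num_of_bytes_read)    AS TotalBytesRead,
    SUM(num_of_bytes_written) AS TotalBytesWritten,
    SUM(num_of_bytes_read) * 100.0 /
        NULLIF(SUM(num_of_bytes_read) + SUM(num_of_bytes_written), 0) AS ReadPct
FROM sys.dm_io_virtual_file_stats(NULL, NULL);
```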
Reference: http://www.amazon.com/Performance-Tuning-Server-Dynamic-Management/dp/1906434476

Summary for Performance Tuning Using SQL Server DMVs Book / Chapter 5: Indexing Strategy and Maintenance


        • sys.indexes indexing catalog view provides metadata at the index level, such as index identifiers (e.g. index name) and configuration settings (e.g. fill factor).
        • You can either create clustered index on a key that naturally reflects the way the data will be queried, or you can cluster on a narrow, ever-increasing integer key (to minimize subsequent fragmentation) and use non-clustered indexes for query efficiency. Most of the advice out there points to the latter approach … P180.
        • The ratio of unique values within a key column is referred to as index selectivity. The more unique the values, the higher the selectivity, which means that a unique index has the highest possible selectivity. The query engine loves highly selective key columns, especially if those columns are referenced in the WHERE clause of your frequently run queries. The higher the selectivity, the faster the query engine can reduce the size of the result set. If you've chosen a low selectivity column for the index key (i.e. where each key value matches many rows), then the optimizer may decide to simply perform a table scan to return a piece of data. Table scans have a bad reputation, but this is because they often mean reading a huge number of rows; in small tables, scanning all the rows is sometimes quicker than reading the data from the leaf levels of an index. In creating an index, the leading (first) column should be selective. However, this does not mean each index should start with the PK column; it must be a column that is likely to get searched on… P181
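Selectivity as described above is easy to measure directly. A hedged illustration (the table and column names here are made up for the example):

```sql
-- Selectivity = distinct key values / total rows; closer to 1.0 is better.
-- dbo.SomeTable and SomeColumn are hypothetical names for illustration.
SELECT COUNT(DISTINCT SomeColumn) * 1.0 / NULLIF(COUNT(*), 0) AS selectivity
FROM dbo.SomeTable;
```

A unique column returns 1.0; a low value (each key value matching many rows) marks a poor candidate for the leading index column.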
        • It is a balancing act: having a huge number of single-column (narrow) indexes is a bad idea, as there will be too many indexes to maintain. Wide indexes are also bad, as few entries will fit on a data page, the index will take up a lot of space, and scanning it will be inefficient… P182.
        • The sys.dm_db_index_usage_stats DMV is used to obtain statistics on how indexes have been used to resolve queries. Data in this DMV is cumulative, and is reset when the server is restarted, or when the index is dropped and recreated, rebuilt, or reorganized. Listing 5.3 is an example of how to use it, and the following are some of its important columns:
          • user_seeks – the number of times the index has been used in a seek operation (to find a specific row)
          • user_scans – number of times the index has been used by scanning the leaf pages of the index for data.
          • user_lookups – for clustered indexes only, this is the number of times the index has been used in a "bookmark lookup" to fetch the full row; this is because non-clustered indexes use the clustered index's key as the pointer to the base row.
          • user_updates – number of times the index has been modified due to a change in the table's data.
          • user reads – the total number of times the index has been used to resolve a query (the sum of user_seeks, user_scans, and user_lookups) … P182-184
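A sketch in the spirit of Listing 5.3 (not the book's exact query) that pulls these columns for the current database, busiest readers first:

```sql
-- Hedged sketch: index usage counts for the current database.
SELECT OBJECT_NAME(s.[object_id]) AS table_name,
       i.[name]                   AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates,
       s.user_seeks + s.user_scans + s.user_lookups AS user_reads
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i
  ON i.[object_id] = s.[object_id] AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY user_reads DESC;
```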
        • Identify indexes that have never been accessed: Listing 5.4 finds those indexes that have had no reads or writes, in other words those indexes that do not show up in sys.dm_db_index_usage_stats. Those indexes can potentially be dropped, after some further investigation ... P185
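The idea behind that listing can be sketched as an anti-join: nonclustered indexes on user tables with no row at all in the usage-stats DMV (this is an approximation of Listing 5.4, not the book's exact query):

```sql
-- Hedged sketch: indexes with no recorded reads or writes since restart.
SELECT OBJECT_NAME(i.[object_id]) AS table_name, i.[name] AS index_name
FROM sys.indexes i
WHERE i.index_id > 1  -- skip heaps (0) and clustered indexes (1)
  AND OBJECTPROPERTY(i.[object_id], 'IsUserTable') = 1
  AND NOT EXISTS (SELECT 1
                  FROM sys.dm_db_index_usage_stats s
                  WHERE s.[object_id] = i.[object_id]
                    AND s.index_id    = i.index_id
                    AND s.database_id = DB_ID());
```

Remember the caveat from the text: absence from the DMV only covers the period since the stats were last cleared.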
        • Listing 5.5 identifies indexes that are being maintained but not used. You can add the following to the SELECT statement to generate a DROP INDEX command for each such index:
          'DROP INDEX [' + i.[name] + '] ON [' + su.[name] + '].[' + o.[name]+ '] WITH ( ONLINE = OFF )' AS [drop_command]
        • It is always advisable to first check how recently the usage stats were cleared, by running the query in Listing 5.6, and to drop indexes only after adequate testing in a non-production environment… P188.
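Since the usage stats survive only since the last service restart, a quick check of instance uptime gives the window the stats cover (a sketch of the idea behind Listing 5.6; sqlserver_start_time assumes SQL Server 2008 or later):

```sql
-- How long has the instance been collecting usage stats?
SELECT sqlserver_start_time,
       DATEDIFF(DAY, sqlserver_start_time, GETDATE()) AS days_of_stats
FROM sys.dm_os_sys_info;
```

An "unused" index after two days of stats means far less than one after six months.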
        • Listing 5.7: Finding rarely-used inefficient indexes.
        • sys.dm_db_index_operational_stats is a DMF that provides index usage statistics at a more detailed level than the sys.dm_db_index_usage_stats DMV; it offers more detailed information about how the index is used at a physical level, via columns such as leaf_insert_count, leaf_update_count and leaf_delete_count, as well as the nonleaf_* equivalents, for modifications above the leaf level. It can also provide evidence of potential lock or latch contention on the objects, or of excessive I/O being issued by the object. The data returned by this DMF exists only as long as the metadata cache object that represents the heap or index is available. Listing 5.8 shows an application of this … P190-193.
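A minimal sketch of using the DMF's leaf-level columns (in the spirit of Listing 5.8, not the book's exact query):

```sql
-- Hedged sketch: physical-level maintenance counts per index,
-- for the current database (NULLs = all objects/indexes/partitions).
SELECT OBJECT_NAME(o.[object_id]) AS table_name,
       i.[name]                   AS index_name,
       o.leaf_insert_count, o.leaf_update_count, o.leaf_delete_count
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) o
JOIN sys.indexes i
  ON i.[object_id] = o.[object_id] AND i.index_id = o.index_id
ORDER BY o.leaf_insert_count + o.leaf_update_count + o.leaf_delete_count DESC;
```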
        • Listing 5.9 uses sys.dm_db_index_operational_stats to return records that relate to locking and blocking at the row level for the indexes of the active database.
        • Listing 5.10 highlights which of our indexes are encountering latch contention, using the page_io_latch_wait_count and page_io_latch_wait_in_ms columns. I/O latching occurs on disk-to-memory transfers, and high I/O latch counts could be a reflection of a disk subsystem issue, particularly when you see average latch wait times of over 15 milliseconds.
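The average-wait calculation can be sketched as follows (an approximation of the Listing 5.10 idea, not the book's exact query):

```sql
-- Hedged sketch: indexes with high average I/O latch waits.
-- Averages over ~15 ms may point at the disk subsystem.
SELECT OBJECT_NAME(o.[object_id]) AS table_name,
       i.[name]                   AS index_name,
       o.page_io_latch_wait_count,
       o.page_io_latch_wait_in_ms / NULLIF(o.page_io_latch_wait_count, 0)
           AS avg_io_latch_wait_ms
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) o
JOIN sys.indexes i
  ON i.[object_id] = o.[object_id] AND i.index_id = o.index_id
WHERE o.page_io_latch_wait_count > 0
ORDER BY avg_io_latch_wait_ms DESC;
```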
        • The query in Listing 5.11 uses sys.dm_db_index_operational_stats DMV to provide information on the count of attempts made by SQL Server to escalate row and page locks to table locks for a specific object, and the percentage success in performing the escalation.
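The escalation counts live in the same DMF; a sketch of the calculation (not the book's exact Listing 5.11):

```sql
-- Hedged sketch: lock escalation attempts and success rate per object.
SELECT OBJECT_NAME(o.[object_id]) AS table_name,
       o.index_lock_promotion_attempt_count,
       o.index_lock_promotion_count,
       o.index_lock_promotion_count * 100.0 /
           NULLIF(o.index_lock_promotion_attempt_count, 0) AS pct_escalated
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) o
WHERE o.index_lock_promotion_attempt_count > 0
ORDER BY pct_escalated DESC;
```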
        • Listing 5.12 identifies indexes associated with lock contention; it is a good next step when querying the sys.dm_os_wait_stats DMV (discussed in Chapter 7) points to locking problems.
        • Finding Missing Indexes: information about missing indexes is exposed via the following four DMOs: sys.dm_db_missing_index_details, sys.dm_db_missing_index_columns, sys.dm_db_missing_index_group_stats, and sys.dm_db_missing_index_groups. The data stored by each of these DMOs is reset on a server restart. You can find a brief explanation of these DMOs and a review of their most significant columns in P199-204. You should never just blindly add every index that these DMOs suggest; instead, you need to examine the results of the query carefully and manually filter out results that are not part of your regular workload. Listing 5.13 puts those DMOs to good use to find the most beneficial missing indexes. The following is a brief explanation of the returned columns:
          • Index_advantage: overall benefit of a suggested index, or the importance of creating the suggested index. The higher this value is, the more important the index is.
          • Statement: the table or view name.
          • equality_columns: the columns that would have been useful, based on an equality predicate.
          • inequality_columns: the columns that would have been useful, based on an inequality predicate (i.e. any comparison other than "column = value")
          • included_columns: columns that, if included, would have been useful to cover the query.
          • unique_compiles: the number of plans that have been compiled that might have used the index.
          • user_seeks: the number of seek operations in user queries that might have used the index.
          • avg_total_user_cost: average cost saving for the queries that could have been helped by the index in the group
          • avg_user_impact: the estimated percentage by which the average query cost would drop, for the queries that could use this index.
        • When creating the suggested indexes (after careful consideration), list the equality columns first (leftmost in the column list), then list the inequality columns after the equality columns (to the right of equality columns listed). Be warned that the order in which these DMOs list missing columns does not accurately suggest the correct column order for an index key. To determine an effective order for the equality columns, order them based on their selectivity; that is, list the most selective columns first… P205-207.
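The join between the four DMOs can be sketched as below. This is an approximation of the Listing 5.13 approach, not the book's exact query; the index_advantage formula here is one commonly used weighting of seeks, cost, and impact, assumed rather than taken from the book:

```sql
-- Hedged sketch: rank missing-index suggestions by an "advantage" figure.
SELECT (gs.user_seeks * gs.avg_total_user_cost *
        (gs.avg_user_impact * 0.01))        AS index_advantage,
       d.[statement], d.equality_columns, d.inequality_columns,
       d.included_columns, gs.unique_compiles, gs.user_seeks
FROM sys.dm_db_missing_index_group_stats gs
JOIN sys.dm_db_missing_index_groups g
  ON g.index_group_handle = gs.group_handle
JOIN sys.dm_db_missing_index_details d
  ON d.index_handle = g.index_handle
ORDER BY index_advantage DESC;
```

As the text stresses, treat each row as a candidate to investigate, not a command to execute.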
        • Finding Index Fragmentation: The sys.dm_db_index_physical_stats DMF provides invaluable information regarding the state of your index partitions, including the type of index, index depth, and degree of fragmentation. This DMF accepts a number of parameters, one of which, mode, determines the thoroughness with which the statistics are collected. Possible values are:
          • LIMITED: the default and the least costly; it does not scan the leaf level of the indexes, and the data pages of heaps are not scanned.
          • SAMPLED: returns statistics based upon a 1% sample of all pages of the indexes and heaps in the scope of the function call; if any index or heap within that scope has fewer than 10,000 pages, DETAILED is automatically used.
          • DETAILED: provides the most complete result set of the three, but can require considerable resources; it scans all pages and returns all statistics.
        • This DMF returns a large number of columns; P211 reviews a subset of them. The query in Listing 5.14 will return fragmentation information for each index in the current database where the avg_fragmentation_in_percent column is greater than 15% and the page count is greater than 500. On a busy system, this can be a resource-intensive query. The rule of thumb is to reorganize indexes with 15–30% fragmentation; the often-recommended process for indexes with fragmentation greater than 30% is a rebuild of the index.
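A sketch of that kind of query, folding in the 15%/30% rule of thumb from the text (an approximation of Listing 5.14, not the book's exact query):

```sql
-- Hedged sketch: fragmented indexes in the current database, with a
-- suggested action based on the 15-30% reorganize / >30% rebuild rule.
SELECT OBJECT_NAME(ps.[object_id]) AS table_name,
       i.[name]                    AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count,
       CASE WHEN ps.avg_fragmentation_in_percent > 30
            THEN 'REBUILD' ELSE 'REORGANIZE' END AS suggested_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
JOIN sys.indexes i
  ON i.[object_id] = ps.[object_id] AND i.index_id = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 15
  AND ps.page_count > 500
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

The actual maintenance commands would then be ALTER INDEX … REORGANIZE or ALTER INDEX … REBUILD, run outside busy periods.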
        Reference: http://www.amazon.com/Performance-Tuning-Server-Dynamic-Management/dp/1906434476