Several months ago a poster at SqlServerCentral.com posed a problem that is all too common in SQL Server SAN environments:
Hi,
I have an application in use at a customer site that uses SQL Server 2000 and the customer is complaining about slow performance. The normal issues I receive is that a particular part of the application is running slow. So I request a profiler trace and identify the slow performing query and resolve the problem. This customer says its all slow. I have requested Profiler Traces and examined them and at times I can see queries that normally run in less than 1 second taking 20-30 seconds to complete.
I have also asked the customer for perfmon logs and examined these. The logs show that the database disk (not the log disk) has sometimes queues of 100's of I/O's. The Avg. Disk Sec/Reads show response times for the disk varying from less than 20ms upto 2 seconds.
I believe the issue is with the poor I/O performance, but the customer says its the application.
can anyone advise on how to proceed with an issue like this.
Ronnie
I gave the following response:
Although there are a huge number of possible reasons for this behavior, in my experience it is almost always due to one of two probable influences:
1) Irregularities in the optimizer/statistics facilities. SQL Server 2000 had a number of issues in this area which you can research at Microsoft, however, it is usually easy enough to tell if this may be the cause. Compare the query plan of 20 sec responses to the 2 sec instances. If they are significantly different then that is your likely cause. The easiest Quick & Dirty short-term fix is to add query hints to force them to choose the plan you know they should use.
However, the fact that CPU & Read IO's are the same is an indication that this is probably not what is happening. Which leaves you with...
2) Contention. Contention is a broad category that can take many forms, both obvious and subtle. The high queue size on your data disk makes it likely that you are experiencing physical contention for that disk (and not contention for other resources, such as CPU and not logical application contention such as Locking).
At this point, the possible sources of contention break down as follows:
A) Application Internal: In other words, you have too many users trying to do the same thing at the same time. That disk has become the bottleneck that everyone is backing up behind. You need to either put less demand on that disk or give that disk more IO capacity (ie., optimize the App, or the queries/sProcs, tune the tables/indexes, move the tables to less used disks, OR reconfigure the disk (see below)).
B) SQL Server internal: other usages of the SQL server are putting load on that disk that are contending with the app's usage of it. You need to eliminate, reduce or redirect that other SQL usage.
C) Server Internal: non-SQL usages of that disk, but from the same server, are competing for that disk. You need to eliminate, reduce or redirect that other usage.
D) SAN-concealed: SAN's are commonly carved up into separate Logical volumes that are nonetheless competing for the same physical resources. So for instance, your Log disk and your Data disk may actually be sharing the same physical devices (this is a BAD thing, but much more common than you might think). More likely, some completely unrelated server(s) have heavily-used logical volumes that are using the same physical drives as your data disk. Solution: reconfigure your data disk on the SAN.
More at www.sqlservercentral.com/Forums/Topic466275-65-1.aspx…