September 03, 2010

"For successful technology, reality must take precedence over public relations, for nature cannot be fooled."    --  Richard Feynman
Moving Sql Server Code
Author:RBarryYoungCreated:8/31/2008 6:34 PM
Comprehensive Management of Performance problems, including Performance Planning

I saw this over a month ago and I can't beleive that I almost forgot to mention it...

SQL Server 2005 has a system trace running by default.  Recently Jack Corbett showed me a way (here) to not only find where these default trace files are, but also how to query & retrieve them into the running system!  For example, here is Jacks code that demonstrates how to find Logon events that used a Windows Group to Logon:

SELECT DISTINCT
  
I.NTUserName,
  
I.loginname,
  
I.SessionLoginName,
  
I.databasename,
  
S.*
FROM
  
sys.traces T CROSS Apply
  
::fn_trace_gettable(T.path, 5) I LEFT JOIN
  
sys.syslogins S ON
      
CONVERT(VARBINARY(MAX), I.loginsid) = S.sid    
WHERE
  
S.sid IS NULL

Obviously, this gets reset after every reboot.  Is this seriously cool or what?!?

The answer to this question seems obvious at first: “Managing Performance to acceptable levels.” And indeed that is true.  However, how that is done is the real key, and the answer to that is different based on the context. 

In essence the practice of Performance Management can be broken down in the following ways:

PERFORMANCE MANAGEMENT:
1.       Performance Goal Management
a.       Tracking
b.      Monitoring
c.       Reporting
d.      General Goaling
e.      Service-Levels
f.       
2.       Performance Problem Management
a.       Problem Investigation
b.      Troubleshooting
c.      
3.       Performance Planning
a.       Server Sizing
b.      Application Sizing
c.       Capacity Planning
d.      Service Growth Planning
e.      ...
4.       Application Performance
a.       Application Performance Management
b.      Software Performance Engineering
c.      

This outline is not complete, and I will be filling it in over the next few months, as well as explaining the individual items.

Several months ago a poster at SqlServerCentral.com posed a problem that is all too common in SQL Server SAN environments:

Hi,

I have an application in use at a customer site that uses SQL Server 2000 and the customer is complaining about slow performance. The normal issues I receive is that a particular part of the application is running slow. So I request a profiler trace and identify the slow performing query and resolve the problem. This customer says its all slow. I have requested Profiler Traces and examined them and at times I can see queries that normally run in less than 1 second taking 20-30 seconds to complete.

I have also asked the customer for perfmon logs and examined these. The logs show that the database disk (not the log disk) has sometimes queues of 100's of I/O's. The Avg. Disk Sec/Reads show response times for the disk varying from less than 20ms upto 2 seconds.

I believe the issue is with the poor I/O performance, but the customer says its the application.

can anyone advise on how to proceed with an issue like this.

Ronnie

I gave the following response:

Although there are a huge number of possible reasons for this behavior, in my experience it is almost always due to one of two probable influences:

1) Irregularities in the optimizer/statistics facilities.  SQL Server 2000 had a number of issues in this area which you can research at Microsoft, however, it is usually easy enough to tell if this may be the cause.  Compare the query plan of 20 sec responses to the 2 sec instances.  If they are significantly different then that is your likely cause.  The easiest Quick & Dirty short-term fix is to add query hints to force them to choose the plan you know they should use.

However, the fact that CPU & Read IO's are the same is an indication that this is probably not what is happening.  Which leaves you with...

2) Contention.  Contention is a broad category that can take many forms, both obvious and subtle.  The high queue size on your data disk makes it likely that you are experiencing physical contention for that disk (and not contention for other resources, such as CPU and not logical application contention such as Locking).

At this point, the possible sources of contention break down as follows:

A) Application Internal:  In other words, you have too many users trying to do the same thing at the same time.  That disk has become the bottleneck that everyone is backing up behind.  You need to either put less demand on that disk or give that disk more IO capacity (ie., optimize the App, or the queries/sProcs, tune the tables/indexes, move the tables to less used disks, OR reconfigure the disk (see below)).

B) SQL Server internal: other usages of the SQL server are putting load on that disk that are contending with the app's usage of it.  You need to eliminate, reduce or redirect that other SQL usage.

C) Server Internal: non-SQL usages of that disk, but from the same server, are competing for that disk.  You need to eliminate, reduce or redirect that other usage.

D) SAN-concealed: SAN's are commonly carved up into separate Logical volumes that are nonetheless competing for the same physical resources.  So for instance, your Log disk and your Data disk may actually be sharing the same physical devices (this is a BAD thing, but much more common than you might think).  More likely, some completely unrelated server(s) have heavily-used logical volumes that are using the same physical drives as your data disk.  Solution: reconfigure your data disk on the SAN.

More at www.sqlservercentral.com/Forums/Topic466275-65-1.aspx

 

Copyright 2008 by R. Barry Young
 RBarryYoung.net  |  Terms Of Use  |  Privacy Statement