March 11, 2010

"For successful technology, reality must take precedence over public relations, for nature cannot be fooled."    --  Richard Feynman
Moving Sql Server Code
Author:RBarryYoungCreated:8/31/2008 8:49 AM
Sql Server Performance

Last night I premiered my presentation "There Must Be 15 Ways To Lose Your Cursors" to the South Jersey division of the Philadelphia SQL Server Users Group.

As promised, I have uploaded the presentation and all of the SQL Scripts as a ZIP it to this site.  You can download it here.

Just a quick note on string concatenation:

 

  Concatenation(appending) a table of many small strings together into one big string is easy in SQL Server, but slow.  Fixng it is hard for the reasons that I list here.  I beleive that I have "solved" this, reducing the O(n2) operation to effectively O(n*Log(n)).

A preliminary version of this technique is demonstrated here.  The approach seen in these forum posts can be greatly improved by "stacking" the 2k roll-ups concatenating all of the lower level strings together at once, instead of "cacading" them up one level at a time, as demonstrated.

I will post a more complete version later...

There are a lot of reasons that I could give for the evilness of Cursors, but all of these specific instances and various details can be summarized in the following four problems:

  1. Performance: they are slooow.

  2. Impact on others: they impact other’s performance too.

  3. Power: Cursors have a low effort to impact ratio.

  4. Clarity: Hard to read, hard to understand.

So if Cursors (and other Loops) are so bad, why do SQL developers still use them? Here are the causes that I have observed:

  1. Ignorance: they don’t realize they’re bad. This we can fix.

  2. Familiarity: OO/3GL paradigm, Loops they know; Sets, not so much.

  3. Design-less Coding: just write a big Loop to process each input row and then insert a line/block for each requirement. The essence of bad programming, mediocre developers have been getting away with this for years.

  4. Applicability/Scope: Loops are easier to apply to more problems. Sure, everything after that is harder, but this way I can get say that I finished my task and leave those problems for another day/person.

More later …


A pseudocursor is a type of external implicit cursor which means that it is an externally visible way of serializing a dataset by taking advantage of implicit behavior in Transact-SQL.  Specifically, a pseudocursor leverages the ability of SELECT and UPDATE to assign to variables on a per-output row basis.  The following is an example of a simple pseudocursor that counts the number of rows in a Cross Join (approximately 350,000):

Declare @count int

Set @count = 0

 

Select @count = @count + 1

 From sys.columns c1, sys.columns c2

 

Print @count

The Select statement simply adds one to the counter for each output record that it would have produced.  Note that the SELECT statement cannot both produce output and assign to a variable in the same statement.

A pseudocursor runs very fast compared to a true Cursor, but not nearly as fast as the best pure Set-based T-SQL, when that is possible.  For comparison, I tested this routine on my against two other functionally identical procedures.  First a true Cursor routine:

Declare @dummy int

Declare @count int

Set @count = 0

 

Declare CountCursor Cursor LOCAL FORWARD_ONLY For

       Select 1

        From sys.columns c1, sys.columns c2

 

OPEN CountCursor

 

FETCH NEXT FROM CountCursor Into @Dummy

WHILE @@Fetch_Status = 0

 Begin

       Set @count = @count + 1

       FETCH NEXT FROM CountCursor Into @Dummy

 End

Print @count

Secondly, a pure Set-based T-SQL using the built-in Count() aggregate function:

Declare @count int

 

Select @count=count(*) From sys.columns c1, sys.columns c2

Print @count

The results were as follows:

Method

Seconds

Pseudocursor

  0.121

True Cursor

30.823

Pure Set-based

  0.004


What this shows is that when there are functionally identical T-SQL scripts of the same apparent operational Order, then Pseudocursors will typically beat explicit Cursors, but they will also typically lose to pure Set-based SQL.  A look at the number of lines and obvious complexity differences between these three approaches reveals their other two characteristic differences: expressive Power and code Clarity, which follow the same precedence as performance: pure Set-based, then Pseudocursors and finally true Cursors.

What all of this presumes, however, is that these equivalent algorithms for these three approaches exist and this is not always so.  This is because of the final attribute or comparison: scope of Applicability.  Because Cursors can incorporate the attributes of unbound loops, they gain full Turing-completeness which means that they can be successfully applied to any programming problem that can be solved algorithmically.  Set-based T-SQL, on the other hand is implicitly bound by its source tables and is thus unable to applied to certain algorithms.  Theoretically, this only applies to problems whose halting is not predictable or certain; however, in practice it can become unusable well before that.  For instance, a Hailstone Function generator would be either impractical or impossible to implement in pure Set-based SQL (depending on the rules imposed).

As with performance, power and clarity, Pseudocursors again occupy a very useful middle ground between these two.  Although Pseudocursors are also not Turing-complete, they can implement a very large set of algorithms, beyond what is possible with pure Set-based SQL alone, though still within certain input bounds.  For instance, you could very easily implement a Hailstone Function generator with Pseudocursors, and although technically it would still be bound by its source tables, it could easily be made to run for longer than any of us will be alive.

More to come in Part 2 …

When discussions about Cursors get hot and heavy, one thing that comes out is that there is a lot of confusions about what a cursor (or "Cursor") really is.  In general a cursor is "a moving placement or pointer that indicates a position."1  More specifically for databases, a cursor can be formally defined as "a control structure for the successive traversal (and potential processing) of records in a result set."2  Less formally, a cursor is any method (structures and processes) that serializes a data set, that is sequences (imposes an order on) it and then processes it one record at a time, in order, keeping track or the current position with the sequence.

Given that definition, cursors are in a lot more places than we might initially assume and these more general cursors are not what we usually mean when we are talking about the Evils of Cursors.  For that reason, I distinguish betweengeneral "cursors" (lower case) and the explicit user written SQL code constructs called "Cursors".  This naturally raises two important questions;

  1. What's so bad about Explicit Cursors?  and,...
  2. Are these other cursors OK then?

I will deal with both of these issues in a future post...

Recently an article and discussion on SqlServerCentral.com turned to the popular opinion that "Cursors are OK somtimes."  Well, I couldn't let that go:

Wow. This obtuse defense of cursors and RBAR has more hand-waving than a Michael Jackson concert. Practically every other post here has statements to the the effect that "sometimes cursors are the best solution", starting with the original article itself, and yet none of these claims has been backed up with an example, or a pointer of a cursor-based SQL procedure that is the best solution for SQL Server in its current release.

Not one.

Hmm, makes me think of one of my own maxims:

When everyone says that something is true, but no one can provide an example, then it probably isn't true.

And let me be clear about what I mean by cursors being bad. Technically, anything in SQL that serializes a data stream and can keep positional context is a cursor. However, when we say "cursors are bad" we do not mean client-side cursors, internal cursors, implicit cursors, etc. We mean explicit Transact-SQL server-side cursors. The kind that people explicitly write in procedures all the time (or try to hide in another stored procedure) and that use the CURSOR datatype.

Since the advent of the new features in SQL Server 2005 (particularly Varchar(MAX) and CTE's), I have not seen one instance where a cursor was the best solution for a problem. The best that can be said for it is that there are some (very few) cases where a cursor was no worse than some other solutions. But real-world problems addressing situations that reasonably belong in the database-tier where cursors are the clear best solution? Not one.

More here: www.sqlservercentral.com/Forums/Topic550718-263-3.aspx

 

Copyright 2008 by R. Barry Young
 RBarryYoung.net  |  Terms Of Use  |  Privacy Statement