A pseudocursor is a type of external implicit cursor which means that it is an externally visible way of serializing a dataset by taking advantage of implicit behavior in Transact-SQL. Specifically, a pseudocursor leverages the ability of SELECT and UPDATE to assign to variables on a per-output row basis. The following is an example of a simple pseudocursor that counts the number of rows in a Cross Join (approximately 350,000):
Declare @count int
Set @count = 0
Select @count = @count + 1
From sys.columns c1, sys.columns c2
Print @count
The Select statement simply adds one to the counter for each output record that it would have produced. Note that the SELECT statement cannot both produce output and assign to a variable in the same statement.
A pseudocursor runs very fast compared to a true Cursor, but not nearly as fast as the best pure Set-based T-SQL, when that is possible. For comparison, I tested this routine on my against two other functionally identical procedures. First a true Cursor routine:
Declare @dummy int
Declare @count int
Set @count = 0
Declare CountCursor Cursor LOCAL FORWARD_ONLY For
Select 1
From sys.columns c1, sys.columns c2
OPEN CountCursor
FETCH NEXT FROM CountCursor Into @Dummy
WHILE @@Fetch_Status = 0
Begin
Set @count = @count + 1
FETCH NEXT FROM CountCursor Into @Dummy
End
Print @count
Secondly, a pure Set-based T-SQL using the built-in Count() aggregate function:
Declare @count int
Select @count=count(*) From sys.columns c1, sys.columns c2
Print @count
The results were as follows:
|
Method
|
Seconds
|
|
Pseudocursor
|
0.121
|
|
True Cursor
|
30.823
|
|
Pure Set-based
|
0.004
|
What this shows is that when there are functionally identical T-SQL scripts of the same apparent operational Order, then Pseudocursors will typically beat explicit Cursors, but they will also typically lose to pure Set-based SQL. A look at the number of lines and obvious complexity differences between these three approaches reveals their other two characteristic differences: expressive Power and code Clarity, which follow the same precedence as performance: pure Set-based, then Pseudocursors and finally true Cursors.
What all of this presumes, however, is that these equivalent algorithms for these three approaches exist and this is not always so. This is because of the final attribute or comparison: scope of Applicability. Because Cursors can incorporate the attributes of unbound loops, they gain full Turing-completeness which means that they can be successfully applied to any programming problem that can be solved algorithmically. Set-based T-SQL, on the other hand is implicitly bound by its source tables and is thus unable to applied to certain algorithms. Theoretically, this only applies to problems whose halting is not predictable or certain; however, in practice it can become unusable well before that. For instance, a Hailstone Function generator would be either impractical or impossible to implement in pure Set-based SQL (depending on the rules imposed).
As with performance, power and clarity, Pseudocursors again occupy a very useful middle ground between these two. Although Pseudocursors are also not Turing-complete, they can implement a very large set of algorithms, beyond what is possible with pure Set-based SQL alone, though still within certain input bounds. For instance, you could very easily implement a Hailstone Function generator with Pseudocursors, and although technically it would still be bound by its source tables, it could easily be made to run for longer than any of us will be alive.
More to come in Part 2 …