This month's TSQL Tuesday is kindly hosted by Steve Jones, a legend in the SQL community who I have read and learnt many things over many years. Steve is asking about our use cases/experiences for windows functions, as they have been around for a long time now. So long in fact, that I actually said "Really?" at my screen when reading the original invite... I guess i'm just old.
I often find myself using windows functions in order to group data in wierd and wonderful ways that a simple GROUP BY can't do... however, the example below is one I came across quite recently on my travels, I was asked to have a look at some code for optimization opportunities. And in this case, thankfully there were some.
The original code looked something like this. Oh look, a cursor. Happy Happy Joy Joy.
Data Example
Given some data like this....
The analyst wanted deltas for each period, so they did something similar to the below...
Cursing a Cursor
DECLARE DataMovements CURSOR FOR
select [GroupOrder], [Account], [Customer], [TaxYear], [AccountingPeriod], [Incurred_Position], [OverRider_Position] from Base_Data
order by [Account], [Customer], [TaxYear], [AccountingPeriod]
OPEN DataMovements
FETCH NEXT FROM DataMovements
INTO @GroupOrder, @Account, @Customer, @TaxYear, @AccountingPeriod, @INC_Position, @OVR_Position
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN TRY
IF @GroupOrder = 0
BEGIN
SET @Inc_Movement =@Inc_Position
SET @OVR_Movement =@OVR_Position
END
ELSE
BEGIN
SET @Inc_Movement=@Inc_Position-@LastIncPosition
SET @OVR_Movement=@OVR_Position-@LastOVRPosition
END
SET @LastIncPosition=@Inc_Position
SET @LastOVRPosition=@OVR_Position
insert into [Results].[Final_Movements] ([GroupOrder], [AccountPeriod], [Account], [Customer], [TaxYear], [Inc_Position], [Inc_Movement], [OVR_Position], [OVR_Movement])
VALUES
(@GroupOrder, @AccountingPeriod, @Account, @Customer, @TaxYear, @Inc_Position, @Inc_Movement, @OVR_Position, @OVR_Movement)
FETCH NEXT FROM DataMovements INTO @Order, @Account, @Customer, @TaxYear, @AccountingPeriod, @INC_Position, @OVR_Position
END;
CLOSE DataMovements
DEALLOCATE DataMovements
So, what was this chunk of code doing? I've amended it heavily for this post, but essentially:
Looping
If the Order was 0, then it was the first entry in that group, so the position value was to be used as the Delta (movement).
If not, subtracting the previous run (position) from the current run to obtain the Delta (movement) value.
Inserting this data into a final table.
Hmmm, simple enough, but that cursor looks suspicious... hang on, haven't we got a function look back at rows for us? LAG.
(Looking back at the original code in its entirity now after many months, just spotted that the GroupOrder column data could instead be defined using another Window Function ROW_NUMBER() instead. These window functions are the gift that just keeps giving.)
Howay the LAGs
Looking back and one quick rewrite later.
INSERT INTO [Results].[Final_Movements]
([GroupOrder], [AccountPeriod], [Account], [Customer], [TaxYear], [Inc_Position], [Inc_Movement], [OVR_Position], [OVR_Movement])
SELECT [GroupOrder], [AccountingPeriod], [Account], [Customer], [TaxYear], [Incurred_Position],
CASE WHEN [GroupOrder]=0 THEN [Incurred_Position]
ELSE [Incurred_Position]-LAG([Incurred_Position], 1) OVER(PARTITION BY [Account],[Customer],[TaxYear] ORDER BY [Account],[AccountingPeriod], [Customer], [TaxYear])
END AS [Inc_Movement],
Overrider_Position,
CASE WHEN [GroupOrder]=0 THEN [OverRider_Position]
ELSE [OverRider_Position]-LAG([OverRider_Position], 1) OVER(PARTITION BY [Account],[Customer],[TaxYear] ORDER BY [Account],[AccountingPeriod], [Customer], [TaxYear])
END AS [OVR_Movement]
FROM Base_Data
Because the order of the data was originally grouped and sorted to define the order, this still needed to happen to make sure the same rows were compared against each other as previously. In order to do this, another window function to partition the data into the groups and row order required was added. I'm using an offset of 1, as I just wanted the previous row value for that group, allowing the quick calculation for the Delta.
Quick display of the data, we see in row 1 and 2, Incurred_position goes from 1200, to 1400, so Inc movement for row 2 is 200. Inc_movement for row 1 is set at the same as Incurred_position as per our logic. You can add a default value in here directly into the LAG statement if you so wish.
Performance Gains?
End result? a 30x reduction in runtime and the death of a cursor. Splendid.
I've used a similar technique lots now, especially when capturing performance data that is cumulative. (Our beloved DMVs - sys.dm_os_waitstats anyone?). A quick use of LAG and its stablemate LEAD means timestamp ordered data can be VERY easily carved up to give us meaningful insights.
New versions even have options to handle NULL values differently as well, i've not had a situation to make use of that functionality yet, but it will be one less headache to worry about in future i'm sure.
Thanks for reading.
Windows Rod 3.1
good article. respect for the header "Howay the LAGs", probably lost on many of your readers :)