top of page

T-SQL Tuesday #168 – Mature Window Functions




This month's TSQL Tuesday is kindly hosted by Steve Jones, a legend in the SQL community who I have read and learnt many things over many years. Steve is asking about our use cases/experiences for windows functions, as they have been around for a long time now. So long in fact, that I actually said "Really?" at my screen when reading the original invite... I guess i'm just old.


I often find myself using windows functions in order to group data in wierd and wonderful ways that a simple GROUP BY can't do... however, the example below is one I came across quite recently on my travels, I was asked to have a look at some code for optimization opportunities. And in this case, thankfully there were some.


The original code looked something like this. Oh look, a cursor. Happy Happy Joy Joy.


Data Example

Given some data like this....

The analyst wanted deltas for each period, so they did something similar to the below...


Cursing a Cursor

 DECLARE DataMovements CURSOR FOR
 		select [GroupOrder], [Account], [Customer], [TaxYear],  [AccountingPeriod], [Incurred_Position], [OverRider_Position] from Base_Data
 		order by [Account], [Customer], [TaxYear],  [AccountingPeriod]

 OPEN DataMovements
 	FETCH NEXT FROM DataMovements
 	INTO @GroupOrder, @Account, @Customer, @TaxYear,  @AccountingPeriod, @INC_Position, @OVR_Position

 WHILE @@FETCH_STATUS = 0
 	BEGIN
 		BEGIN TRY
 		IF @GroupOrder = 0
 			BEGIN
 				SET @Inc_Movement	=@Inc_Position
 				SET @OVR_Movement	=@OVR_Position
 			END
 		ELSE
 			BEGIN
 				SET @Inc_Movement=@Inc_Position-@LastIncPosition
 				SET @OVR_Movement=@OVR_Position-@LastOVRPosition
 			END
 		SET @LastIncPosition=@Inc_Position
 		SET @LastOVRPosition=@OVR_Position

 insert into [Results].[Final_Movements] ([GroupOrder], [AccountPeriod], [Account], [Customer], [TaxYear],  [Inc_Position], [Inc_Movement], [OVR_Position], [OVR_Movement])

 VALUES
 (@GroupOrder, @AccountingPeriod, @Account, @Customer, @TaxYear,  @Inc_Position, @Inc_Movement, @OVR_Position, @OVR_Movement)

 FETCH NEXT FROM DataMovements INTO @Order, @Account, @Customer, @TaxYear,  @AccountingPeriod, @INC_Position, @OVR_Position

 END;

 CLOSE DataMovements
 DEALLOCATE DataMovements

So, what was this chunk of code doing? I've amended it heavily for this post, but essentially:

  • Looping

  • If the Order was 0, then it was the first entry in that group, so the position value was to be used as the Delta (movement).

  • If not, subtracting the previous run (position) from the current run to obtain the Delta (movement) value.

  • Inserting this data into a final table.

Hmmm, simple enough, but that cursor looks suspicious... hang on, haven't we got a function look back at rows for us? LAG.


(Looking back at the original code in its entirity now after many months, just spotted that the GroupOrder column data could instead be defined using another Window Function ROW_NUMBER() instead. These window functions are the gift that just keeps giving.)


Howay the LAGs

Looking back and one quick rewrite later.

INSERT INTO [Results].[Final_Movements]
([GroupOrder], [AccountPeriod], [Account], [Customer], [TaxYear],  [Inc_Position], [Inc_Movement], [OVR_Position], [OVR_Movement])

SELECT [GroupOrder], [AccountingPeriod], [Account], [Customer], [TaxYear], [Incurred_Position],
CASE WHEN [GroupOrder]=0 THEN [Incurred_Position]
 	ELSE [Incurred_Position]-LAG([Incurred_Position], 1) OVER(PARTITION BY [Account],[Customer],[TaxYear]  ORDER BY [Account],[AccountingPeriod], [Customer], [TaxYear]) 
END AS [Inc_Movement],
Overrider_Position,
CASE WHEN [GroupOrder]=0 THEN [OverRider_Position]
   ELSE [OverRider_Position]-LAG([OverRider_Position], 1) OVER(PARTITION BY [Account],[Customer],[TaxYear]  ORDER BY [Account],[AccountingPeriod], [Customer], [TaxYear]) 
END AS [OVR_Movement]
FROM Base_Data


Because the order of the data was originally grouped and sorted to define the order, this still needed to happen to make sure the same rows were compared against each other as previously. In order to do this, another window function to partition the data into the groups and row order required was added. I'm using an offset of 1, as I just wanted the previous row value for that group, allowing the quick calculation for the Delta.


Quick display of the data, we see in row 1 and 2, Incurred_position goes from 1200, to 1400, so Inc movement for row 2 is 200. Inc_movement for row 1 is set at the same as Incurred_position as per our logic. You can add a default value in here directly into the LAG statement if you so wish.




Performance Gains?

End result? a 30x reduction in runtime and the death of a cursor. Splendid.


I've used a similar technique lots now, especially when capturing performance data that is cumulative. (Our beloved DMVs - sys.dm_os_waitstats anyone?). A quick use of LAG and its stablemate LEAD means timestamp ordered data can be VERY easily carved up to give us meaningful insights.


New versions even have options to handle NULL values differently as well, i've not had a situation to make use of that functionality yet, but it will be one less headache to worry about in future i'm sure.


Thanks for reading.

Windows Rod 3.1

תגובה אחת

דירוג של 0 מתוך 5 כוכבים
אין עדיין דירוגים

הוספת דירוג
אורח
18 בנוב׳ 2023

good article. respect for the header "Howay the LAGs", probably lost on many of your readers :)

נערכה
לייק
bottom of page