sql server – Interquartile Range – Lower, Upper and Median

Posted by: admin March 9, 2020

Question:

I’m trying to work out the interquartile range based on an array of numbers which can be any length e.g.

1,  1,  5,  6,  7,  8,  2,  4,  7,  9,  9,  9,  9

The values that I need to work out from this interquartile range are:

  • Upper Quartile
  • Median
  • Lower Quartile

If I dump the above array of numbers into row 1 of a Microsoft Excel sheet (cells A1:M1), then I can use the following formulas:

  • =QUARTILE.INC(A1:M1,1)
  • =QUARTILE.INC(A1:M1,2)
  • =QUARTILE.INC(A1:M1,3)

To get my answers of:

  • 4
  • 7
  • 9

I now need to work out these 3 values in either SQL Server or VB.NET. I can get the array values in any format or object in either of these languages, but I can’t find any functions that exist like the QUARTILE.INC function that Excel has.

Does anyone know how this could be achieved in either SQL Server or VB.NET?

Answer:

There might be an easier way, but to get quartiles you can use NTILE (Transact-SQL), which the documentation describes as follows:

Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.

So for your data:

SELECT  1 Val
INTO    #temp
UNION ALL
SELECT  1
UNION ALL
SELECT  5
UNION ALL
SELECT  6
UNION ALL
SELECT  7
UNION ALL
SELECT  8
UNION ALL
SELECT  2
UNION ALL
SELECT  4
UNION ALL
SELECT  7
UNION ALL
SELECT  9
UNION ALL
SELECT  9
UNION ALL
SELECT  9
UNION ALL
SELECT  9

-- NTILE(4) specifies you require 4 partitions (quartiles)
SELECT  NTILE(4) OVER ( ORDER BY Val ) AS Quartile ,
        Val
INTO #tempQuartiles
FROM    #temp

SELECT * 
FROM #tempQuartiles

DROP TABLE #temp
DROP TABLE #tempQuartiles

This would produce:

Quartile    Val
1           1
1           1
1           2
1           4
2           5
2           6
2           7
3           7
3           8
3           9
4           9
4           9
4           9

From this you can work out what you’re after.

So, by replacing the SELECT * above (before the temp tables are dropped), you can do this:

SELECT Quartile, MAX(Val) MaxVal
FROM #tempQuartiles
WHERE Quartile <= 3
GROUP BY Quartile

To produce:

Quartile    MaxVal
1           4
2           7
3           9
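The grouping above can be reproduced outside the database to see why the maxima come out as 4, 7 and 9. The following is a minimal Python sketch, not T-SQL; the helper name ntile is made up for illustration and assumes the documented NTILE behaviour that earlier groups receive the extra rows when the row count does not divide evenly. Because this approach always returns a value that is present in the data, it cannot reproduce the interpolated results that QUARTILE.INC gives for some other inputs.

```python
def ntile(sorted_values, buckets):
    """Mimic NTILE: split ordered rows into `buckets` groups, giving the
    earlier groups one extra row each when the count does not divide evenly.
    (Hypothetical helper, for illustration only.)"""
    size, extra = divmod(len(sorted_values), buckets)
    groups, start = [], 0
    for i in range(buckets):
        end = start + size + (1 if i < extra else 0)
        groups.append(sorted_values[start:end])
        start = end
    return groups


# The values from the question, ordered as ORDER BY Val would order them.
values = sorted([1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9])

groups = ntile(values, 4)
# MAX(Val) per group, for groups 1 to 3 only (WHERE Quartile <= 3).
maxima = [max(g) for g in groups[:3] if g]

print(groups)  # [[1, 1, 2, 4], [5, 6, 7], [7, 8, 9], [9, 9, 9]]
print(maxima)  # [4, 7, 9]
```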

Answer:

We created a user-defined table type so that the values can be passed to a function as a parameter, and then used it as follows.

Our implementation uses the same calculations as the Excel Percentile function.

CREATE TYPE [dbo].[floatListType] AS TABLE (
    [value] FLOAT NOT NULL
);

GO

CREATE FUNCTION [dbo].[getPercentile]
(
    @data floatListType readonly,
    @percentile float
)
RETURNS float
AS
BEGIN
    declare @values table
    (
        value float,
        idx   int
    );

    insert into @values
    select value, ROW_NUMBER() OVER (order by value) - 1 as idx
    from @data;

    declare @cnt int = (select count(*) from @values)
        , @n float = (@cnt - 1) * @percentile + 1
        , @k int = FLOOR(@n)
        , @d float = @n - @k;

    if (@k = 0)
        return (select value from @values where idx = 0)
    if (@k = @cnt)
        return (select value from @values where idx = @cnt - 1)
    if (@k > 0 AND @k < @cnt)
        return (select value from @values where idx = @k - 1)
            + @d * ((select value from @values where idx = @k)
            - (select value from @values where idx = @k - 1))

    return null;
END

You can use it like this to get the median and the quartiles (Q1 is simply the 0.25 percentile, the median the 0.5 percentile, and Q3 the 0.75 percentile), for example:

declare @values floatListType;

insert into @values
select value from #mytable

-- scalar user-defined functions must be called with their schema name
select dbo.getPercentile(@values, 0.25) as Q1,
    dbo.getPercentile(@values, 0.5) as median,
    dbo.getPercentile(@values, 0.75) as Q3
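To check the arithmetic without a database, the same rank-and-interpolate calculation can be sketched in a few lines. This is a Python illustration rather than T-SQL, and the name percentile_inc is an assumption made for this example. It follows the function above: the fractional rank is n = (count - 1) * p + 1, k is its integer part, d its fractional part, and the result interpolates between the k-th and (k+1)-th ordered values.

```python
import math


def percentile_inc(values, p):
    """Inclusive percentile with linear interpolation between ranks.
    (Hypothetical helper mirroring the getPercentile function above.)"""
    ordered = sorted(values)
    count = len(ordered)
    if count == 0:
        return None
    n = (count - 1) * p + 1   # 1-based fractional rank
    k = math.floor(n)         # integer part of the rank
    d = n - k                 # fractional part used for interpolation
    if k <= 0:
        return ordered[0]
    if k >= count:
        return ordered[-1]
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])


data = [1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9]
q1, median, q3 = (percentile_inc(data, p) for p in (0.25, 0.5, 0.75))
print(q1, median, q3)  # 4.0 7.0 9.0
```

For the 13 values in the question the ranks are whole numbers (4, 7 and 10), so no interpolation takes place and the results are the 4, 7 and 9 that Excel reports.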

Answer:

Apologies if I misunderstood you, but this could be done using NTILE() followed by ROW_NUMBER().

SQL Code:

;WITH FirstStep (NT, N)
AS (
    SELECT NTILE(3) OVER (ORDER BY T.column1), T.column1
    FROM dbo.GetTableFromList_Int('1,  1,  5,  6,  7,  8,  2,  4,  7,  9,  9,  9,  9', ',') AS T
),
SecondStep (RN, NT, N)
AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY T.NT ORDER BY T.N DESC), NT, T.N
    FROM FirstStep AS T
)
SELECT N
FROM SecondStep
WHERE RN = 1

Explanation:

  • The dbo.GetTableFromList_Int() TVF splits my string into rows (DISTINCT ones).
  • NTILE(3) splits these rows into three groups, ordered by value (IIRC you need to order your list to get correct values).
  • ROW_NUMBER(), ordered descending within each group, then picks the highest value in each group (RN = 1).

In your scenario it returns the expected results.

If this is not what you needed, it can be modified to produce the correct output.
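The three steps above can be traced with a short sketch. It is written in Python rather than T-SQL, the function name top_of_each_tile is invented for the illustration, and it assumes, as the first bullet states, that the list is reduced to distinct values before NTILE(3) is applied.

```python
def top_of_each_tile(values, tiles=3, distinct=True):
    """Order the values, split them into `tiles` groups the way NTILE does,
    and return the highest value of each non-empty group.
    (Hypothetical helper, for illustration only.)"""
    ordered = sorted(set(values)) if distinct else sorted(values)
    size, extra = divmod(len(ordered), tiles)
    tops, start = [], 0
    for i in range(tiles):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            tops.append(ordered[end - 1])  # last row of an ascending group
        start = end
    return tops


data = [1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9]
print(top_of_each_tile(data))                  # [4, 7, 9]
print(top_of_each_tile(data, distinct=False))  # [5, 8, 9]
```

With duplicates removed, the eight distinct values fall into groups of 3, 3 and 2, whose highest values are 4, 7 and 9. Without de-duplication the same data gives 5, 8 and 9, so the result depends on the splitting function returning distinct rows.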

Answer:

If you want a SQL Server solution, a couple of years ago I posted an Interquartile Range procedure on my blog. It’s based on dynamic SQL, so you can plug any columns you have access to into it. It’s not well-tested: I was still learning the ropes back then and the code is a little old now. Still, it could meet your needs out of the box, or at least provide a starting point for coding your own solution. Here’s the gist of the code; follow the link to my blog for an in-depth discussion.

CREATE PROCEDURE [Calculations].[InterquartileRangeSP]
@DatabaseName as nvarchar(128) = NULL, @SchemaName as nvarchar(128), @TableName as nvarchar(128),@ColumnName AS nvarchar(128), @PrimaryKeyName as nvarchar(400), @OrderByCode as tinyint = 1, @DecimalPrecision AS nvarchar(50)
AS
SET @DatabaseName = @DatabaseName + '.'
DECLARE @SchemaAndTableName nvarchar(400)
SET @SchemaAndTableName = ISNull(@DatabaseName, '') + @SchemaName + '.' + @TableName
DECLARE @SQLString nvarchar(max)

SET @SQLString = 'DECLARE @OrderByCode tinyint,
@Count bigint,
@LowerPoint bigint,
@UpperPoint bigint,
@LowerRemainder decimal(38,37), -- use the maximum precision and scale for these two variables to make the procedure flexible enough to handle large datasets; I suppose I could use a float
@UpperRemainder decimal(38,37),
@LowerQuartile decimal(' + @DecimalPrecision + '),
@UpperQuartile decimal(' + @DecimalPrecision + '),
@InterquartileRange decimal(' + @DecimalPrecision + '),
@LowerInnerFence decimal(' + @DecimalPrecision + '),
@UpperInnerFence decimal(' + @DecimalPrecision + '),
@LowerOuterFence decimal(' + @DecimalPrecision + '),
@UpperOuterFence decimal(' + @DecimalPrecision + ')

SET @OrderByCode = ' + CAST(@OrderByCode AS nvarchar(50)) + ' SELECT @Count=Count(' + @ColumnName + ')
FROM ' + @SchemaAndTableName +
' WHERE ' + @ColumnName + ' IS NOT NULL

SELECT @LowerPoint = (@Count + 1) / 4, @LowerRemainder =  ((CAST(@Count AS decimal(' + @DecimalPrecision + ')) + 1) % 4) /4,
@UpperPoint = ((@Count + 1) *3) / 4, @UpperRemainder =  (((CAST(@Count AS decimal(' + @DecimalPrecision + ')) + 1) *3) % 4) / 4; --multiply by 3 for the left side on the upper point to get 75 percent

WITH TempCTE
(' + @PrimaryKeyName + ', RN, ' + @ColumnName + ')
AS (SELECT ' + @PrimaryKeyName + ', ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY ' + @ColumnName + ' ASC) AS RN, ' + @ColumnName + '
FROM ' + @SchemaAndTableName + '
WHERE ' + @ColumnName + ' IS NOT NULL),
TempCTE2 (QuartileValue)
AS (SELECT TOP 1 ' + @ColumnName + ' + ((Lead(' + @ColumnName + ', 1) OVER (ORDER BY ' + @ColumnName + ') - ' + @ColumnName + ') * @LowerRemainder) AS QuartileValue
FROM TempCTE
WHERE RN BETWEEN @LowerPoint AND @LowerPoint + 1

UNION


SELECT TOP 1 ' + @ColumnName + ' + ((Lead(' + @ColumnName + ', 1) OVER (ORDER BY ' + @ColumnName + ') - ' + @ColumnName + ') * @UpperRemainder) AS QuartileValue
FROM TempCTE
WHERE RN BETWEEN @UpperPoint AND @UpperPoint + 1)

SELECT @LowerQuartile = (SELECT TOP 1 QuartileValue
 FROM TempCTE2 ORDER BY QuartileValue ASC), @UpperQuartile = (SELECT TOP 1 QuartileValue
 FROM TempCTE2 ORDER BY QuartileValue DESC)

SELECT @InterquartileRange = @UpperQuartile - @LowerQuartile
SELECT @LowerInnerFence = @LowerQuartile - (1.5 * @InterquartileRange), @UpperInnerFence = @UpperQuartile + (1.5 * @InterquartileRange), @LowerOuterFence = @LowerQuartile - (3 * @InterquartileRange), @UpperOuterFence = @UpperQuartile + (3 * @InterquartileRange)

--SELECT @LowerPoint AS LowerPoint, @LowerRemainder AS LowerRemainder, @UpperPoint AS UpperPoint, @UpperRemainder AS UpperRemainder
-- uncomment this line to debug the inner calculations

SELECT @LowerQuartile AS LowerQuartile, @UpperQuartile AS UpperQuartile, @InterquartileRange AS InterQuartileRange,@LowerInnerFence AS LowerInnerFence, @UpperInnerFence AS UpperInnerFence,@LowerOuterFence AS LowerOuterFence, @UpperOuterFence AS UpperOuterFence


SELECT ' + @PrimaryKeyName + ', ' + @ColumnName + ', OutlierDegree
FROM  (SELECT ' + @PrimaryKeyName + ', ' + @ColumnName + ',
       ''OutlierDegree'' =  CASE WHEN (' + @ColumnName + ' < @LowerInnerFence AND ' + @ColumnName + ' >= @LowerOuterFence) OR (' +
@ColumnName + ' > @UpperInnerFence
 AND ' + @ColumnName + ' <= @UpperOuterFence) THEN 1
       WHEN ' + @ColumnName + ' < @LowerOuterFence OR ' + @ColumnName + ' > @UpperOuterFence THEN 2
       ELSE 0 END
       FROM ' + @SchemaAndTableName + '
       WHERE ' + @ColumnName + ' IS NOT NULL) AS T1
      ORDER BY CASE WHEN @OrderByCode = 1 THEN ' + @PrimaryKeyName + ' END ASC,
CASE WHEN @OrderByCode = 2 THEN ' + @PrimaryKeyName + ' END DESC,
CASE WHEN @OrderByCode = 3 THEN ' + @ColumnName + ' END ASC,
CASE WHEN @OrderByCode = 4 THEN ' + @ColumnName + ' END DESC,
CASE WHEN @OrderByCode = 5 THEN OutlierDegree END ASC,
CASE WHEN @OrderByCode = 6 THEN OutlierDegree END DESC'

--SELECT @SQLString -- uncomment this to debug string errors
EXEC (@SQLString)
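The fence and OutlierDegree logic at the end of the procedure is easier to follow in isolation. The sketch below is Python rather than T-SQL, and the function name outlier_degree is made up for the illustration. It takes the quartiles as inputs instead of computing them, and the probe values 7, 20 and 30 are hypothetical, chosen only to land in each band. The quartiles 4 and 9 are the Excel results from the question, not output from the procedure.

```python
def outlier_degree(value, q1, q3):
    """Classify a value against the interquartile fences:
    0 = inside the inner fences, 1 = between inner and outer fences,
    2 = beyond the outer fences. (Hypothetical helper, for illustration.)"""
    iqr = q3 - q1
    lower_inner, upper_inner = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lower_outer, upper_outer = q1 - 3 * iqr, q3 + 3 * iqr
    if value < lower_outer or value > upper_outer:
        return 2
    if value < lower_inner or value > upper_inner:
        return 1
    return 0


# Quartiles from the question (Q1 = 4, Q3 = 9), so the IQR is 5, the inner
# fences are -3.5 and 16.5, and the outer fences are -11 and 24.
q1, q3 = 4, 9
degrees = [outlier_degree(v, q1, q3) for v in (7, 20, 30)]  # hypothetical probes
print(degrees)  # [0, 1, 2]
```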
