Translation

The oldest posts, are written in Italian. If you are interested and you want read the post in English, please use Google Translator. You can find it on the right side. If the translation is wrong, please email me: I'll try to translate for you.

mercoledì, luglio 21, 2021

Analytic Functions preserve the "PARTITION BY" clause (Part 6): LAST_VALUE explained

<< Part05

LAST_VALUE is an analytic function. Here there is the documentation.

"LAST_VALUE is an analytic function that is useful for data densification. It returns the last value in an ordered set of values. 

If the last value in the set is null, then the function returns NULL unless you specify IGNORE NULLS. If you specify IGNORE NULLS, then LAST_VALUE returns the last non-null value in the set, or NULL if all values are null.

The default is RESPECT NULLS."

The first thing, is to show how LAST_VALUE works using RESPECT NULLS (for simplicity I'm going to use only 3 columns)

with
-----------
fix_1_ as (
-----------
SELECT
    elaboration_day,
    machine_id,
    lag(location_type) over (partition by machine_id order by elaboration_day) next_r,
    case
        when lag(location_type) over (partition by machine_id order by elaboration_day) is null
        then elaboration_day
        when lag(location_type) over (partition by machine_id order by elaboration_day) != location_type
        then elaboration_day
        else null
    end fix,
    location_type,
    commercial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy'))
-------------
, fix_2_ as (
-------------
SELECT
    elaboration_day,
    LAST_VALUE (fix) RESPECT NULLS
            OVER (PARTITION BY machine_id ORDER BY elaboration_day
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) fix
,
    location_type
FROM
    fix_1_)
----------------------
select
    *
 from fix_2_
 order by elaboration_day;

 

By definition from the documentation


Because I partitioned by "machine_id", in this simple case, the set are all rows from 01//Jan to 15/Jan.

The first value NOT NULL is the 01/Jan/20. The second value NOT NULL is the 04/Jan/20. The 3th, the 10/Jan/20 and the 4th, 13/Jan/20.

All others rows are NULLS.

Using the IGNORE NULLS option (always by definition):

with
-----------
fix_1_ as (
-----------
SELECT
    elaboration_day,
    machine_id,
    lag(location_type) over (partition by machine_id order by elaboration_day) next_r,
    case
        when lag(location_type) over (partition by machine_id order by elaboration_day) is null
        then elaboration_day
        when lag(location_type) over (partition by machine_id order by elaboration_day) != location_type
        then elaboration_day
        else null
    end fix,
    location_type,
    commercial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy'))
-------------
, fix_2_ as (
-------------
SELECT
    elaboration_day,
    LAST_VALUE (fix) IGNORE NULLS
            OVER (PARTITION BY machine_id ORDER BY elaboration_day
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) fix,
    location_type
FROM
    fix_1_)
----------------------
select
    *
 from fix_2_
 order by elaboration_day;

 

The key here is the "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" option. In fact

  • On 2/Jan/20 the not null LAST_VALUE from 01/Jan (UNBOUNDED PRECEDING) to 02/Jan  (CURRENT ROW), is 01/Jan/20. On 03/Jan, the LAST_VALUE NOT NULL from 01/Jan (UNBOUNDED PRECEDING) to 03/Jan (CURRENT ROW), is 01/Jan/20.
  • On 04/Jan, the not null LAST_VALUE from 01/Jan (UNBOUNDED PRECEDING) to 04/Jan  (CURRENT ROW), is 04/Jan/20. On 05/Jan, the LAST_VALUE NOT NULL from 01/Jan (UNBOUNDED PRECEDING) to 05/Jan (CURRENT ROW), is 04/Jan/20.
  • On 10/Jan, the not null LAST_VALUE from 01/Jan (UNBOUNDED PRECEDING) to 10/Jan  (CURRENT ROW), is 10/Jan/20. On 11/Jan, the LAST_VALUE NOT NULL from 01/Jan (UNBOUNDED PRECEDING) to 11/Jan (CURRENT ROW), is 10/Jan/20.
  •  On 13/Jan, the not null LAST_VALUE from 01/Jan (UNBOUNDED PRECEDING) to 13/Jan  (CURRENT ROW), is 13/Jan/20. On 15/Jan, the LAST_VALUE NOT NULL from 01/Jan (UNBOUNDED PRECEDING) to 15/Jan (CURRENT ROW), is 13/Jan/20
<< Part05

Nessun commento: