Translation

The oldest posts, are written in Italian. If you are interested and you want read the post in English, please use Google Translator. You can find it on the right side. If the translation is wrong, please email me: I'll try to translate for you.

mercoledì, luglio 21, 2021

Analytic Functions preserve the "PARTITION BY" clause (Part 6): LAST_VALUE explained

<< Part05

LAST_VALUE is an analytic function. Here there is the documentation.

"LAST_VALUE is an analytic function that is useful for data densification. It returns the last value in an ordered set of values. 

If the last value in the set is null, then the function returns NULL unless you specify IGNORE NULLS. If you specify IGNORE NULLS, then LAST_VALUE returns the last non-null value in the set, or NULL if all values are null.

The default is RESPECT NULLS."

The first thing, is to show how LAST_VALUE works using RESPECT NULLS (for simplicity I'm going to use only 3 columns)

with
-----------
fix_1_ as (
-----------
SELECT
    elaboration_day,
    machine_id,
    lag(location_type) over (partition by machine_id order by elaboration_day) next_r,
    case
        when lag(location_type) over (partition by machine_id order by elaboration_day) is null
        then elaboration_day
        when lag(location_type) over (partition by machine_id order by elaboration_day) != location_type
        then elaboration_day
        else null
    end fix,
    location_type,
    commercial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy'))
-------------
, fix_2_ as (
-------------
SELECT
    elaboration_day,
    LAST_VALUE (fix) RESPECT NULLS
            OVER (PARTITION BY machine_id ORDER BY elaboration_day
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) fix
,
    location_type
FROM
    fix_1_)
----------------------
select
    *
 from fix_2_
 order by elaboration_day;

 

By definition from the documentation


Because I partitioned by "machine_id", in this simple case, the set are all rows from 01//Jan to 15/Jan.

The first value NOT NULL is the 01/Jan/20. The second value NOT NULL is the 04/Jan/20. The 3th, the 10/Jan/20 and the 4th, 13/Jan/20.

All others rows are NULLS.

Using the IGNORE NULLS option (always by definition):

with
-----------
fix_1_ as (
-----------
SELECT
    elaboration_day,
    machine_id,
    lag(location_type) over (partition by machine_id order by elaboration_day) next_r,
    case
        when lag(location_type) over (partition by machine_id order by elaboration_day) is null
        then elaboration_day
        when lag(location_type) over (partition by machine_id order by elaboration_day) != location_type
        then elaboration_day
        else null
    end fix,
    location_type,
    commercial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy'))
-------------
, fix_2_ as (
-------------
SELECT
    elaboration_day,
    LAST_VALUE (fix) IGNORE NULLS
            OVER (PARTITION BY machine_id ORDER BY elaboration_day
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) fix,
    location_type
FROM
    fix_1_)
----------------------
select
    *
 from fix_2_
 order by elaboration_day;

 

The key here is the "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" option. In fact

  • On 2/Jan/20 the not null LAST_VALUE from 01/Jan (UNBOUNDED PRECEDING) to 02/Jan  (CURRENT ROW), is 01/Jan/20. On 03/Jan, the LAST_VALUE NOT NULL from 01/Jan (UNBOUNDED PRECEDING) to 03/Jan (CURRENT ROW), is 01/Jan/20.
  • On 04/Jan, the not null LAST_VALUE from 01/Jan (UNBOUNDED PRECEDING) to 04/Jan  (CURRENT ROW), is 04/Jan/20. On 05/Jan, the LAST_VALUE NOT NULL from 01/Jan (UNBOUNDED PRECEDING) to 05/Jan (CURRENT ROW), is 04/Jan/20.
  • On 10/Jan, the not null LAST_VALUE from 01/Jan (UNBOUNDED PRECEDING) to 10/Jan  (CURRENT ROW), is 10/Jan/20. On 11/Jan, the LAST_VALUE NOT NULL from 01/Jan (UNBOUNDED PRECEDING) to 11/Jan (CURRENT ROW), is 10/Jan/20.
  •  On 13/Jan, the not null LAST_VALUE from 01/Jan (UNBOUNDED PRECEDING) to 13/Jan  (CURRENT ROW), is 13/Jan/20. On 15/Jan, the LAST_VALUE NOT NULL from 01/Jan (UNBOUNDED PRECEDING) to 15/Jan (CURRENT ROW), is 13/Jan/20
<< Part05

martedì, luglio 20, 2021

Analytic Functions preserve the "PARTITION BY" clause (Part 5)

<< Part04     Part06 >>

Now, using the LAST_VALUE analytic function, I can "fill the blank"

with
-----------
fix_1_ as (
-----------
SELECT
    elaboration_day,
    machine_id,
    lag(location_type) over (partition by machine_id order by elaboration_day) next_r,
    case
        when lag(location_type) over (partition by machine_id order by elaboration_day) is null
        then elaboration_day
        when lag(location_type) over (partition by machine_id order by elaboration_day) != location_type
        then elaboration_day
        else null
    end fix,
    location_type,
    commercial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy'))
-------------
, fix_2_ as (
-------------
SELECT
    elaboration_day,
    machine_id,
    LAST_VALUE (fix) IGNORE NULLS
            OVER (PARTITION BY machine_id ORDER BY elaboration_day
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) fix,
    location_type,
    commercial_id,
    warehouse_id
FROM
    fix_1_)
select * from fix_2_
    order by elaboration_day;

 

 

I'll explain why the LAST_VALUE "fill the blank" in a different post.

At this point I can separate the 4 interval. To show this I use again COUNT and ROW_NUMBER analytic function

with
-----------
fix_1_ as (
-----------
SELECT
    elaboration_day,
    machine_id,
    lag(location_type) over (partition by machine_id order by elaboration_day) next_r,
    case
        when lag(location_type) over (partition by machine_id order by elaboration_day) is null
        then elaboration_day
        when lag(location_type) over (partition by machine_id order by elaboration_day) != location_type
        then elaboration_day
        else null
    end fix,
    location_type,
    commercial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy'))
-------------
, fix_2_ as (
-------------
SELECT
    elaboration_day,
    machine_id,
    LAST_VALUE (fix) IGNORE NULLS
            OVER (PARTITION BY machine_id ORDER BY elaboration_day
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) fix,

    location_type,
    commercial_id,
    warehouse_id
FROM
    fix_1_)
----------------------
select
    elaboration_day,
    machine_id,
    count(location_type) over (partition by machine_id, location_type, fix) nrd,
    row_number() over (partition by machine_id, location_type, fix order by elaboration_day) rn,

    location_type,
    commercial_id,
    warehouse_id
from fix_2_
    order by elaboration_day;


 

How you can see, now COUNT function (NRD column) count only rows in it a local range and not over all. The same is true for ROW_NUMBER function (NR column) that restart each time LOCATION_TYPE change.

<< Part04     Part06 >>

 

Analytic Functions preserve the "PARTITION BY" clause (Part 4)

<< Part03     Part05>>

Now the question is: is it exists a way to not preserve the partitioning?

There are a lot of solutions to this question. In the next posts I'm going to show you the one I used.

First of all, I have to evaluate the previous row: if the location change, then there is a "point of change" as in the following picture

SELECT
    elaboration_day,
    lag(location_type) over (partition by machine_id order by elaboration_day) next_r,
    location_type,
    commercial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy')
    order by elaboration_day;

 


 So, there are 2 different conditions to evaluate the "point of change":

  1. WHEN next "location_type" is NULL (true only for the first row)
  2. WHEN next "location_type" != actual "location_type"

 IF one of these 2 conditions are true THEN I use the elaboration_day, ELSE NULL

 SELECT
    elaboration_day,
    lag(location_type) over (partition by machine_id order by elaboration_day) next_r,
    case
        when lag(location_type) over (partition by machine_id order by elaboration_day) is null
        then elaboration_day
        when lag(location_type) over (partition by machine_id order by elaboration_day) != location_type
        then elaboration_day
        else null
    end fix,

    location_type,
    commercial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy')
    order by elaboration_day;

 

<< Part03     Part05 >>

lunedì, luglio 19, 2021

Analytic Functions preserve the "PARTITION BY" clause (Part 3)

<< Part02     Part04 >>

In order to understand why, I use the ROW_NUMBER analytic function

SELECT
    elaboration_day,
    row_number() over (partition by machine_id, location_type order by elaboration_day) rn,
    location_type,
    commercial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy')
    order by elaboration_day;


How you can see, the ROW_NUMBER, preserve the PARTITIONING and numerate in ordered way the days inside the partition.

It could be strange, but it's the (right) way the analytic function works.

<< Part02     Part04 >>

 

Analytic Functions preserve the "PARTITION BY" clause (Part 2)

<< Part01     Part03 >>

Question. I want to know how many continues days the slot-machine was in specific location

Start using minimum columns:

SELECT
    elaboration_day,
    location_type,
    commercial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy')
    order by elaboration_day;


 If I count the days by LOCATION_TYPE:

SELECT
    elaboration_day,
    count(location_type) over (partition by machine_id, location_type) nrd,
    location_type,
    comme+rcial_id,
    warehouse_id
FROM
    awp_machine
    where elaboration_day >= to_date('01/01/2020', 'dd/mm/yyyy')
    order by elaboration_day;

I don't have 3 + 3 + 6 + 3 days. Instead I have 6 + 9 days as shown in the following picture

This is because the COUNT analytic function, count the days where the machine was during the interval I use (15 days in this case: 01/Jan - 15/Jan)

 
 

Analytic Functions preserve the "PARTITION BY" clause (Part 1)

Part02 >>

The Analytic Functions are very powerful and I use them every day. What I learned is that some behaviors could seems strange or bugs, but it's the normal way how to the Analytic Function works.

This is the scenario: a slot-machine has a counter that identify how much money is inserted until now. For example if on 03/Jan at 23:59:59, the counter is 10, then 10€ are inserted into machine until 03/Jan.

On  04/Jan at 23:59:59, the counter is 11. This means that 11€ are inserted into machine until 04/Jan.

On 04/Jan the machine gained 1€ (11 - 10).

In order to know how much money a slot machine has gained in one day I need the counters of the machine, day by day.

A slot machine can be in a Bar or in Warehouse if it needs repair. If it is in the Bar then people can play, otherwise, if the machine is in Warehouse, it isn't used and gain 0 (zero). Anyway, because the machine is "ACTIVE" (potentially it could gain money), by default we assign 560€ as "FLAT" value (it is a convention used by the govern).

So, the CNTTOTIN is the counter IN and represent the money the people insert into machine. At the same way, there is the CNTTOTOT and represent the money the people win (coin OUT from the machine).

When FLAT is used, it means that the machine is not read, then the previous CNTTOTIN is used as counter. In this way, the TAXABLE report the FLAT value, meanwhile the WINNINGS is set to 0 (zero).

 
COLUMN NAME MEANING
ELABORATION_DAY Day of elaboration
MACHINE_ID Machine identifier
CNTTOTIN IN counter
CNTTOTOT OUT counter
COMMERCIAL_ID Location identifier
WAREHOUSE_ID Warehouse identifier
WINNINGS How much money the people wins in a day (2)
TAXABLE How much money the machine gained in a day (flat value if "active" and in warehouse) (1)(2)
LOCATION_TYPE CME for COMMERCIAL LOCATION; WRH for WAREHOUSE
CONTRACT_BASE_ID Contract identifier if the machine is in WRH, NULL if in CME
CONTRACT_COMMERCIAL_ID Contract identifier if the machine is in CME, NULL if in WRH
OPERATION_TYPE "Read" if the cpunter is detected, "Flat" otherwise
(1) The column name reflect the fact that from the value, we don't have yet substracted taxe
(2) The values are in €cent
 
Here, how to create the table and popolate it.
 
CREATE TABLE AWP_MACHINE (
    ELABORATION_DAY          DATE, 
    MACHINE_ID               NUMBER(15),
    CNTTOTOT                 NUMBER(15),
    CNTTOTIN                 NUMBER(15),
    COMMERCIAL_ID            NUMBER(15),
    WAREHOUSE_ID             NUMBER(15),
    WINNINGS                 NUMBER(15),
    TAXABLE                  NUMBER(15),
    LOCATION_TYPE            VARCHAR2(3 BYTE),
    CONTRACT_BASE_ID         NUMBER(15),
    CONTRACT_COMMERCIAL_ID   NUMBER(15),
    OPERATION_TYPE           VARCHAR2(8 BYTE) );


Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('01/01/2020', 'DD/MM/YYYY'), 1205770, 73535500, 55272700, 542703, NULL,
    200, 1000, 'CME', 2656069, 'Read');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('02/01/2020', 'DD/MM/YYYY'), 1205770, 73537900, 55274700, 542703, NULL,
    2000, 2400, 'CME', 2656069, 'Read');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('03/01/2020', 'DD/MM/YYYY'), 1205770, 73547000, 55285000, 542703, NULL,
    10300, 9100, 'CME', 2656069, 'Read');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('04/01/2020', 'DD/MM/YYYY'), 1205770, 73547000, 55285000, NULL, 943382,
    0, 56000, 'WRH', 2669766, 'Flat');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('05/01/2020', 'DD/MM/YYYY'), 1205770, 73547000, 55285000, NULL, 943382,
    0, 56000, 'WRH', 2669766, 'Flat');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('06/01/2020', 'DD/MM/YYYY'), 1205770, 73547000, 55285000, NULL, 943382,
    0, 56000, 'WRH', 2669766, 'Flat');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('07/01/2020', 'DD/MM/YYYY'), 1205770, 73547000, 55285000, NULL, 943382,
    0, 56000, 'WRH', 2669766, 'Flat');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('08/01/2020', 'DD/MM/YYYY'), 1205770, 73547000, 55285000, NULL, 943382,
    0, 56000, 'WRH', 2669766, 'Flat');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('09/01/2020', 'DD/MM/YYYY'), 1205770, 73547000, 55285000, NULL, 943382,
    0, 56000, 'WRH', 2669766, 'Flat');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('10/01/2020', 'DD/MM/YYYY'), 1205770, 73556400, 55288700, 542703, NULL,
    3700, 9400, 'CME', 2656069, 'Read');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('11/01/2020', 'DD/MM/YYYY'), 1205770, 73558800, 55289100, 542703, NULL,
    400, 2400, 'CME', 2656069, 'Read');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('12/01/2020', 'DD/MM/YYYY'), 1205770, 73560200, 55290100, 542703, NULL,
    1000, 1400, 'CME', 2656069, 'Read');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('13/01/2020', 'DD/MM/YYYY'), 1205770, 73560200, 55290100, NULL, 943382,
    0, 56000, 'WRH', 2669766, 'Flat');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('14/01/2020', 'DD/MM/YYYY'), 1205770, 73560200, 55290100, NULL, 943382,
    0, 56000, 'WRH', 2669766, 'Flat');
Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('15/01/2020', 'DD/MM/YYYY'), 1205770, 73560200, 55290100, NULL, 943382,
    0, 56000, 'WRH', 2669766, 'Flat');
COMMIT;

In order to calculate TAXABLE and WINNINGS for the first day, you need an extra insert

Insert into AWP_MACHINE
   (ELABORATION_DAY, MACHINE_ID, CNTTOTIN, CNTTOTOT, COMMERCIAL_ID, WAREHOUSE_ID,
    WINNINGS, TAXABLE, LOCATION_TYPE, CONTRACT_COMMERCIAL_ID, OPERATION_TYPE)
 Values
   (TO_DATE('31/12/2019', 'DD/MM/YYYY'), 1205770, 73534500, 55272500, 542703, NULL,
    NULL, NULL, 'CME', 2656069, 'Read');

Part02 >>