Translation

The oldest posts, are written in Italian. If you are interested and you want read the post in English, please use Google Translator. You can find it on the right side. If the translation is wrong, please email me: I'll try to translate for you.

venerdì, giugno 01, 2007

Il Costo è Tempo?

Jonathan Lewis nel suo libro Cost-Based Oracle Fundamentals, nel Cap1 pagina 4 dice:

According to the CPU costing model:
Cost = (
#SRds * sreadtim +
#MRds * mreadtim +
#CPUCycles / cpuspeed
) / sreadtim

where

#SRDs - number of single block reads
#MRDs - number of multi block reads
#CPUCycles - number of CPU Cycles
sreadtim - single block read time
mreadtim - multi block read time
cpuspeed - CPU cycles per second

Translated, this says the following:

The cost is the time spent on single-block reads, plus the time spent on multiblock reads, plus the CPU time required, all divided by the time it takes to do a single-block read. Which means the cost is the total predicted execution time for the statement, expressed in units of the single-block read time.

L’affermazione che il “costo è il tempo” viene ribadito anche sul suo sito, dove aggiunge note al suo libro. La sezione “Cost is time (30th Dec 2005)”, termina infatti con: “Cost is Time – but the units are a bit funny”.

In realtà, come poi Lewis stesso ammette, le cose sono un pò diverse da come sembrano. In un thread del suo Blog infatti, si corregge con la seguente affermazione:

[…] I should have said “resource consumption” rather than cost […]

Di fatto, non possiamo utilizzare il costo di uno statement per dire se è buono o meno. Il motivo di tale affermazione è il risultato di un thread sul sito di Kyte che devo dire essere estremamente interessante.

Il mio personale consiglio è ovviamente di leggerlo. Qui riporto la risposta che Wolfgang Breitling in quel thread da ad una domanda fatta da un lettore:

To those who cling to the illusion that the "cost" of an explained sql is in any way related to the performance, read Tom's lips: "It is not".
The cost is used during the parse to pick one plan among all the ones considered (the one with the lowest "cost"). Once it has served that purpose and a plan is chosen, it is meaningless. In particular, it can NOT be compared to the cost of a another explain. If everything is the same between the two explain, the resulting plans and their costs will be the same. If they are not, then something is different and you can not compare the results anymore.
In pre-Oracle9, the "cost" is roughly equal to the estimated number of logical reads (db blocks needed to find the answer). That estimate of the number of blocks to be visited can be different from reality for a number of reasons, only a few of which could be considered bugs. A lot of the reasons for wrong estimates are due to data distribution and dependencies. That is what Tom refers to a human knowledge about the data which goes far beyond what the optimizer can discern from the statistics. For a presentation I am giving at the Hotsos Performance Symposium (IOUG wasn't interested) I have prepared testcases where the same query on tables with virtually identical statistics returns vastly different numbers of rows to show that it is impossible for the optimizer to come up with accurate estimates and therefore the best plan in all cases.

Cost and execution time are NOT related. There are plenty of examples that prove that they are not. The comparison of costs can only be made by the optimizer while it is parsing a sql and is evaluating different access paths. Once an access path is chosen it is invalid to compare its cost number to that of a different plan and draw any conclusion from it.
When parsing a sql the optimizer, in addition to using the gathered statistics - table as well as system in 9i - the optimizer has to make many assumptions about the data (uniform distribution for example in the absence of histogram data) and if the reality differs from the assumption the plan that came out with the lowest cost based on the assumptions can very well be far inferior to a plan that has a higher cost number (based on the assumptions) but is better suited to the reality and hence performs better. It is a rather far stretch to call the inability of the optimizer to accurately estimate certain characteristics of the data in the database, given limited information, a bug.

Nessun commento: