6.2 An Example of Query Optimization
Let us look at a query being evaluated in two different ways to see the dramatic effect of query optimization.
Consider the following query.
Select ORDDATE, ITEM#, QTY
from ORDTBL, ORD_ITEMS
where ORDTBL.ORD# = ORD_ITEMS.ORD#
and ITEM# = 'HW3';
Assumptions:
- There are 100 records in ORDTBL
- There are 10,000 records in ORD_ITEMS
- There are 50 order items with item# 'HW3'
Query Evaluation – Method 1
T1 = ORDTBL X ORD_ITEMS
(Perform the Product operation as the first step towards joining the two tables)
- 10000 X 100 tuple reads (1000000 tuple reads -> generates 1000000 tuples as intermediate result)
- 1000000 tuples written to disk (Assuming that 1000000 tuples in the intermediate result cannot be held in the memory. 1000000 tuple writes to a temporary space in the disk.)
T2 = ORDTBL.ORD# = ORD_ITEMS.ORD# & ITEM# 'HW3'(T1)
(Apply the two conditions in the query on the intermediate result obtained after the first step)
- 1000000 tuples read into memory (1000000 tuple reads)
- 50 selected (those tuples satisfying both the conditions. 50 held in the memory itself)
T3 = ORDDATE,ITEM#,QTY (T2)
(Projection performed as the final step. No more tuple i/o s)
- 50 tuples (final result)
Total no. of tuple i/o s = 1000000 reads + 1000000 writes + 1000000 reads
= 3000000 tuple i/o s
Query Evaluation – Method 2
T1 = ITEM#='HW3' (ORD_ITEMS) (Perform the Select operation on ORD_ITEMS as the first step)
- 10000 tuple reads (10000 tuple reads from ORD_ITEMS)
- 50 tuples selected; no disk writes (50 tuples satisfy the condition in Select. No disk writes assuming that the 50 tuples forming the intermediate result can be held in the memory)
T2 = ORDTBL JOIN T1
- 100 tuple reads (100 tuple reads from ORDTBL)
- resulting relation with 50 tuples
T3 = ORDDATE, ITEM#, QTY(T2)
(Projection performed as the final step. No more tuple i/o s)
- 50 tuples (final result)
Total no. of tuple i/o s = 10000 reads + 100 reads
= 10100 tuple i/o's
Comparison of the two Query Evaluation Methods
10,100 tuple I/O's (of Method 2) v/s 3,000,000 tuple I/O's (of Method 1) !
Thus by sequencing the operations differently a dramatic difference can be made in the performance of queries.
Here it needs to be noted that in the Method 2 of evaluation, the first operation to be performed was a 'Select' which filters out 50 tuples from the 10,000 tuples in the ORD_ITEMS table. Thus this operation causes elimination of 9950 tuples. Thus elimination in the initial steps would help optimization.
Some more examples:
1. | select CITY, COUNT(*) from CUSTTBL where CITY != 'BOMBAY' group by CITY; | v/s | select CITY, COUNT(*) from CUSTTBL group by CITY having CITY != 'BOMBAY'; |
2. | select * from ORDTBL | v/s | select * from ORDTBL where ORDDATE = to_date('11-08-94', 'dd-mm-yy'); |
Here the second version is faster. In the first form of the query, a function to_char is applied on an attribute and hence needs to be evaluated for each tuple in the table. The time for this evaluation will be thus proportional to the cardinality of the relation. In the second form, a function to_date is applied on a constant and hence needs to be evaluated just once, irrespective of the cardinality of the relation. Moreover, if the attribute ORDDATE is indexed, the index will not be used in the first case, since the attribute appears in an expression and its value is not directly used.
0 comments:
Post a Comment