E learning: March 2007

Course Contents

Database Management systems

1. Introduction:

1.1 Approach to Data Management
1.2 Advantages of Database Systems
1.3 Functions of DBMS
1.4 Role of a Database Administrator
1.5 Types of Database Systems

2. The Internal Level
2.1 Clustering
2.2 Indexing
2.3 Hashing

3. The Relational Model
3.1 Relational Databases: Terminology
3.2 Properties of Relations
3.3 Integrity Rules
3.4 Relational Algebra Operators

4. SQL
4.1 Overview
4.2 Data Manipulation Language
4.3 Data Definition Language
4.4 Data Control Language

5. Recovery and Concurrency
5.1 Transaction
5.2 Recovery from System Failures
5.3 Recovery: An Example
5.4 Concurrency
5.5 Locking
5.6 Deadlocks

6. Query Optimization
6.1 Overview
6.2 An Example of Query Optimization
6.3 The Query Optimization Process
6.4 Query Optimization in Oracle

6.4 Query Optimization in Oracle

Query Optimization in Oracle

Some of the query optimization measures used in Oracle are the following:

–Indexes unnecessary for small tables. i.e., if the size of the actual data record is not much larger than the index record, the search time in the index table and the data table will be comparable. Hence indexes will not make much difference in the performance of queries.

–Indexes/clusters when retrieving less than 25% of rows. The overhead of searching in the index file will be more when retrieving more rows.

–Multiple column WHERE clauses

–evaluations causing largest number of eliminations performed first

–JOIN-columns should be indexed. JOIN columns or Foreign Key columns may be indexed since queries based on these columns can be expected to be very frequent.

–Index not used in queries containing NULL / NOT NULL. Index tables will not have NULL / NOT NULL entries. Hence need not search for these in the index table.

6.3 The Query Optimization Process

The steps of query optimization are explained below.

a) Cast into some Internal Representation – This step involves representing each SQL query into some internal representation which is more suitable for machine manipulation. The internal form typically chosen is a query tree as shown below.

Query Tree for the SELECT statement discussed above:

b)Convert to Canonical Form – In this second step, the optimizer makes use of some transformation laws or rules for sequencing the internal operations involved. Some examples are given below.
(Note: In all these examples the second form will be more efficient irrespective of the actual data values and physical access paths that exist in the stored database. )

Rule 1:

(A JOIN B) WHERE restriction_A AND restriction_B

(A WHERE restriction_A) JOIN (B WHERE restriction_B)
Restrictions when applied first, cause eliminations and hence better performance.

Rule 2:

(A WHERE restriction_1) WHERE restriction_2

A WHERE restriction_1 AND restriction_2
Two restrictions applied as a single compound one instead applying the two individual restrictions separately.

Rule 3:

(A[projection_1])[projection_2]

A[projection_2]

If there is a sequence of successive projections applied on the same relation, all but the last one can be ignored. i.e., The entire operation is equivalent to applying the last projection alone.

Rule 4:

(A[projection]) WHERE restriction

(A WHERE restriction)[projection]
Restrictions when applied first, cause eliminations and hence better performance.

Reference [1] gives more such general transformation laws.
c)Choose Candidate Low-level Procedures – In this step, the optimizer decides how to execute the transformed query. At this stage factors such as existence of indexes or other access paths, physical clustering of records, distribution of data values etc. are considered.

The basic strategy here is to consider the query expression as a set of low-level implementation procedures predefined for each operation. For eg., there will be a set of procedures for implementing the restriction operation: one (say, procedure 'a') for the case where the restriction attribute is indexed, one (say, procedure 'b') where the restriction attribute is hashed and so on.

Each such procedure has and associated cost measure indicating the cost, typically in terms of disk I/Os.

The optimizer chooses one or more candidate procedures for each low-level operations in the query. The information about the current state of the database (existence of indexes, current cardinalities etc.) which is available from the system catalog will be used to make this choice of candidate procedures.

d)Generate Query Plans and Choose the Cheapest – In this last step, query plans are generated by combining a set of candidate implementation procedures. This can be explained with the following example(A trivial one but illustrative enough).

Assume that there is a query expression comprising a restriction, a join and a projection. Some examples, of implementation procedures available for each of these operations can be assumed as given in the table below.

Operation	Condition Existing	Implementation Procedure
Restriction	Restriction attribute is indexed	a
Restriction	Restriction attribute is hashed	b
Restriction	Restriction attribute is neither indexed nor hashed	c
Join		d
Join		e
Projection		f
Projection		g

Now the various query plans for the original query expression can be generated by making permutations of implementation procedures available for different operations. Thus the query plans can be

– adf
- adg
– aef
– aeg
– bdf
...
...

It has to be noted that in reality, the number of such query plans possible can be too many and hence generating all such plans and then choosing the cheapest will be expensive by itself. Hence a heuristic reduction of search space rather than exhaustive search needs to be done. Considering the above example, one such heuristic method can be as follows:

If the system knows that the restriction attribute is neither indexed nor hashed, then the query plans involving implementation procedure 'c ' alone (and not 'a' and 'b') need to be considered and the cheapest plan can be chosen from the reduced set of query plans.

6.2 An Example of Query Optimization

Let us look at a query being evaluated in two different ways to see the dramatic effect of query optimization.

Consider the following query.

Select ORDDATE, ITEM#, QTY
from ORDTBL, ORD_ITEMS
where ORDTBL.ORD# = ORD_ITEMS.ORD#
and ITEM# = 'HW3';

Assumptions:

There are 100 records in ORDTBL
There are 10,000 records in ORD_ITEMS
There are 50 order items with item# 'HW3'

Query Evaluation – Method 1

T1 = ORDTBL X ORD_ITEMS
(Perform the Product operation as the first step towards joining the two tables)

- 10000 X 100 tuple reads (1000000 tuple reads -> generates 1000000 tuples as intermediate result)
- 1000000 tuples written to disk (Assuming that 1000000 tuples in the intermediate result cannot be held in the memory. 1000000 tuple writes to a temporary space in the disk.)

T2 = ORDTBL.ORD# = ORD_ITEMS.ORD# & ITEM# 'HW3'(T1)
(Apply the two conditions in the query on the intermediate result obtained after the first step)

- 1000000 tuples read into memory (1000000 tuple reads)
- 50 selected (those tuples satisfying both the conditions. 50 held in the memory itself)

T3 = ORDDATE,ITEM#,QTY (T2)
(Projection performed as the final step. No more tuple i/o s)

- 50 tuples (final result)

Total no. of tuple i/o s = 1000000 reads + 1000000 writes + 1000000 reads
= 3000000 tuple i/o s

Query Evaluation – Method 2

T1 = _ITEM#='HW3'(ORD_ITEMS) (Perform the Select operation on ORD_ITEMS as the first step)

- 10000 tuple reads (10000 tuple reads from ORD_ITEMS)
- 50 tuples selected; no disk writes (50 tuples satisfy the condition in Select. No disk writes assuming that the 50 tuples forming the intermediate result can be held in the memory)

T2 = ORDTBL JOIN T1

- 100 tuple reads (100 tuple reads from ORDTBL)
- resulting relation with 50 tuples

T3 = _{ORDDATE, ITEM#, QTY}(T2)
(Projection performed as the final step. No more tuple i/o s)

- 50 tuples (final result)

Total no. of tuple i/o s = 10000 reads + 100 reads
= 10100 tuple i/o's

Comparison of the two Query Evaluation Methods

10,100 tuple I/O's (of Method 2) v/s 3,000,000 tuple I/O's (of Method 1) !

Thus by sequencing the operations differently a dramatic difference can be made in the performance of queries.

Here it needs to be noted that in the Method 2 of evaluation, the first operation to be performed was a 'Select' which filters out 50 tuples from the 10,000 tuples in the ORD_ITEMS table. Thus this operation causes elimination of 9950 tuples. Thus elimination in the initial steps would help optimization.

Some more examples:

select CITY, COUNT(*) from CUSTTBL
where CITY != 'BOMBAY'
group by CITY;

v/s

select CITY, COUNT(*) from CUSTTBL
group by CITY
having CITY != 'BOMBAY';

select * from ORDTBL
where to_char(ORDDATE,'dd-mm-yy')
= '11-08-94';

v/s

select * from ORDTBL
where ORDDATE = to_date('11-08-94',
'dd-mm-yy');

Here the second version is faster. In the first form of the query, a function to_char is applied on an attribute and hence needs to be evaluated for each tuple in the table. The time for this evaluation will be thus proportional to the cardinality of the relation. In the second form, a function to_date is applied on a constant and hence needs to be evaluated just once, irrespective of the cardinality of the relation. Moreover, if the attribute ORDDATE is indexed, the index will not be used in the first case, since the attribute appears in an expression and its value is not directly used.

6.1 Query Optimization Overview

Overview :

When compared to other database systems, query optimization is a strength of the relational systems. It can be said so since relational systems by themselves do optimization to a large extent unlike the other systems which leave optimization to the programmer. Automatic optimization done by the relational systems will be much more efficient than manual optimization due to several reasons like :

uniformity in optimization across programs irrespective of the programmer's expertise in optimizing the programs.
system's ability to make use of the knowledge of internal conditions (eg: volume of data at the time of querying) for optimization. For the same query, such conditions may be different at different times of querying. (In a manual system, this knowledge can be utilised only if the query is re-written each time, which is not practically possible.)
system's ability to evaluate large number of alternatives to find the most efficient query evaluation method.

In this chapter we shall look into the process of automatic query optimization done by the relational systems.

5.6 Deadlocks

Deadlocks

Locking can be used to solve the problems of concurrency. However, locking can also introduce the problem of deadlock as shown in the example below.

Deadlock is a situation in which two or more transactions are in a simultaneous wait state, each of them waiting for one of the others to release a lock before it can proceed.

If a deadlock occurs, the system may detect it and break it. Detecting involves detecting a cycle in the “Wait-For Graph” (a graph which shows 'who is waiting for whom'). Breaking a deadlock implies choosing one of the deadlocked transactions as the victim and rolling it back, thereby releasing all its locks. This may allow some other transaction(s) to proceed.

Deadlock prevention can be done by not allowing any cyclic-waits.

5.5 Locking

Locking

Locking: A solution to problems arising due to concurrency.

Locking of records can be used as a concurrency control technique to prevent the above mentioned problems. A transaction acquires a lock on a record if it does not want the record values to be changed by some other transaction during a period of time. The transaction releases the lock after this time.

Locks are of two types

shared (S lock)
and exclusive (X Lock).

A transaction acquires a shared (read) lock on a record when it wishes to retrieve or fetch the record.
An exclusive (write) lock is acquired on a record when a transaction wishes to update the record. (Here update means INSERT, UPDATE or DELETE.)

The following figure shows the Lock Compatibility matrix.

Normally, locks are implicit. A FETCH request is an implicit request for a shared lock whereas an UPDATE request is an implicit request for an exclusive lock.
Explicit lock requests need to be issued if a different kind of lock is required during an operation. For example, if an X lock is to acquired before a FETCH it has to be explicitly requested for.

5.4 Concurrency

Concurrency

Concurrency refers to multiple transactions accessing the same database at the same time. In a system which allows concurrency, some kind of control mechanism has to be in place to ensure that concurrent transactions do not interfere with each other.

Three typical problems which can occur due to concurrency are explained here.

a) Lost Update Problem

(To understand the above situation, assume that

there is a record R, with a field, say Amt, having value 1000 before time t1.
- Both transactions A & B fetch this value at t1 and t2 respectively.
- Transaction A updates the Amt field in R to 800 at time t3.
- Transaction B updates the Amt field in R to 1200 at time t4.

Thus after time t4, the Amt value in record R has value 1200. Update by Transaction A at time t3 is over-written by the Transaction B at time t4.)

b) Uncommitted Dependency Problem

(To understand the above situation, assume that

there is a record R, with a field, say Amt, having value 1000 before time t1.
- Transaction B fetches this value and updates it to 800 at time t1.
- Transaction A fetches R with Amt field value 800 at time t2.
- Transaction B rolls back and its update is undone at time t3. The Amt field takes the initial value 1000 during rollback.

Transaction A continues processing with Amt field value 800 without knowing about B's rollback.)

c) Inconsistent Analysis Problem

5.3 Recovery: An Example

Recovery : An Example

At the time of restart, T3 and T5 must be undone and T2 and T4 must be redone.

T1 does not enter the recovery procedure at all since it updates were all written to the database at time tc as part of the checkpoint proces

5.2 Recovery from System Failures

Recovery from System Failures

System failures (also called soft crashes) are those failures like power outage which affect all transactions in progress, but do not physically damage the database.

During a system failure, the contents of the main memory are lost. Thus the contents of the database buffers which contain the updates of transactions are lost. (Note: Transactions do not directly write on to the database. The updates are written to database buffers and, at regular intervals, transferred to the database.) At restart, the system has to ensure that the ACID properties of transactions are maintained and the database remains in a consistent state. To attain this, the strategy to be followed for recovery at restart is as follows:

Transactions which were in progress at the time of failure have to be undone at the time of restart. This is needed because the precise state of such a transaction which was active at the time of failure is no longer known and hence cannot be successfully completed.
Transactions which had completed prior to the crash but could not get all their updates transferred from the database buffers to the physical database have to redone at the time of restart.

This recovery procedure is carried out with the help of

• An online logfile or journal – The logfile maintains the before- and after-images of the tuples updated during a transaction. This helps in carrying out the UNDO and REDO operations as required. Typical entries made in the logfile are :

Start of Transaction Marker
Transaction Identifier
Record Identifier
Operations Performed
Previous Values of Modified Data (Before-image or Undo Log)
Updated Values of Modified Records (After-image or Redo Log)
Commit / Rollback Transaction Marker

• Taking a checkpoint at specific intervals – This involves the following two operations:
a) physically writing the contents of the database buffers out to the physical database. Thus during a checkpoint the updates of all transactions, including both active and committed transactions, will be written to the physical database.
b)physically writing a special checkpoint record to the physical log. The checkpoint record has a list of all active transactions at the time of taking the checkpoint.

5.1 Transaction

Recovery and Concurrency

Recovery and Concurrency in a DBMS are part of the general topic of transaction management. Hence we shall begin the discussion by examining the fundamental notion of a transaction.

1 Transaction

A transaction is a logical unit of work.

Consider the following example:

The procedure for transferring an amount of Rs. 100/- from the account of one customer to another is given.

 
 EXEC SQL  WHENEVER SQLERROR GOTO
          UNDO
EXEC SQL  UPDATE DEPOSIT
      SET BALANCE=BALANCE-100
   WHERE CUSTID=from_cust;
EXEC SQL  UPDATE DEPOSIT
      SET BALANCE=BALANCE+100
   WHERE CUSTID=to_cust:
EXEC SQL  COMMIT;
      GOTO FINISH
UNDO:
EXEC SQL  ROLLBACK;
FINISH:
RETURN;

Here, it has to be noted that the single operation “amount transfer” involves two database updates – updating the record of from_cust and updating the record of to_cust. In between these two updates the database is in an inconsistent (or incorrect in this example) state. i.e., if only one of the updates is performed, one cannot say by seeing the database contents whether the amount transfer operation has been done or not. Hence to guarantee database consistency it has to be ensured that either both updates are performed or none are performed. If, after one update and before the next update, something goes wrong due to problems like a system crash, an overflow error, or a violation of an integrity constraint etc., then the first update needs to be undone.

This is true with all transactions. Any transaction takes the database from one consistent state to another. It need not necessarily preserve consistency of database at all intermediate points. Hence it is important to ensure that either a transaction executes in its entirety or is totally cancelled. The set of programs which handles this forms the transaction manager in the DBMS. The transaction manager uses COMMIT and ROLLBACK operations for ensuring atomicity of transactions.

COMMIT – The COMMIT operation indicates successful completion of a transaction which means that the database is in a consistent state and all updates made by the transaction can now be made permanent. If a transaction successfully commits, then the system will guarantee that its updates will be permanently installed in the database even if the system crashes immediately after the COMMIT.

ROLLBACK – The ROLLBACK operation indicates that the transaction has been unsuccessful which means that all updates done by the transaction till then need to be undone to bring the database back to a consistent state. To help undoing the updates once done, a system log or journal is maintained by the transaction manager. The before- and after-images of the updated tuples are recorded in the log.

The properties of transaction can be summarised as ACID properties - ACID standing for atomicity, consistency, isolation and durability.

Atomicity: A transaction is atomic. Either all operations in the transaction have to be performed or none should be performed.

Consistency: Transactions preserve database consistency. i.e., A transaction transforms a consistent state of the database into another without necessarily preserving consistency at all intermediate points.

Isolation: Transactions are isolated from one another. i.e., A transaction's updates are concealed from all others until it commits (or rolls back).

Durability: Once a transaction commits, its updates survive in the database even if there is a subsequent system crash.

4.4 Data Control Language

DCL – GRANT and REVOKE statements.

DCL statements are those which are used to control access permissions on the tables, indexes, views and other elements of the DBMS.

Granting & Revoking Privileges

Query:

GRANT ALL <------------------

ON customers

TO ashraf;

Grants all permissions on the table customers to the user who logs in as 'ashraf'.

Query:

GRANT SELECT <--------------

ON customers

TO sunil;

Grants SELECT permission on the table customers to the user 'sunil'. User 'sunil' does not have permission to insert, update, delete or perform any other operation on customers table.

Query:

GRANT SELECT

ON customers

TO sunil

WITH GRANT OPTION; <---------

Enables user 'sunil' to give SELECT permission on customers table to other users.

Query:

REVOKE DELETE <-------------

ON customers

FROM ashraf;

Takes away DELETE permission on customers table from user 'ashraf'.

4.3 Data Definition Language

4.3 DDL – CREATE, ALTER, and DROP statements.

DDL statements are those which are used to create, modify and drop the definitions or structures of various tables, views, indexes and other elements of the DBMS.

The CREATE TABLE statement

Creates a new table.

General form:

CREATE TABLE

(*);

* - table element may be attribute with its data-type and size or any integrity constraint on attributes.

Some CREATE TABLE statements on the Case Example

Query:

CREATE TABLE customers

( cust# NUMBER(6) NOT NULL,

custname CHAR(30) ,

city CHAR(20));

- This query Creates a table CUSTOMERS with 3 fields - cust#, custname and city. Cust# cannot be null

Query:

CREATE TABLE ord_sep <-------------------

AS SELECT * from ord_aug;

Creates a new table ord_sep, which has the same structure of ord_aug. The data in ord_aug is copied to the new table ord_sep.

- This query Creates table ORD_SEP as a cpy of ORD-AUG. Copies structure as well as data.

Query:

CREATE TABLE ord_sep <------------------

AS SELECT * from ord_aug

WHERE 1 = 2;

Creates a new table ord_sep, which has the same structure of ord_aug. No data in ord_aug is copied to the new table since there is no row which satisfies the 'always false' condition 1 = 2.

- This query Creates table ORD_SEP as a copy of ORD_AUG, but does not copy any data as the WHERE clause is never satisfied.

The ALTER TABLE statement

Alters the structure of an existing table.

General form:

ALTER TABLE

ADD | MODIFY (

Examples of ALTER TABLE statement.

Query:

ALTER TABLE customers

MODIFY custname CHAR(35); <-------------

Modifies the data type/size of an attribute in the table

- This query changes the custname field to a character field of length 35. Used for modifying field lengths and attributes.

Query:

ALTER TABLE customers

ADD (phone number(8), <------------------

credit_rating char(1));

Adds two new attributes to the Customers table. Here, for existing tuples (if any), the new attribute will take NULL values since no DEFAULT value is mentioned for the attribute.

- This query adds two new fields - phone & credit_rating to the customers table.

The DROP TABLE statement

DROPS an existing table.

General form:

DROP TABLE ;

Example:

Query:

DROP TABLE ord_sep;

- The above query drops table ORD_SEP from the database

Creating & Dropping Views

A view is a virtual relation created with attributes from one or more base tables.

SELECT * FROM myview1; at any given time will evaluate the view-defining query in the CREATE VIEW statement and display the result.

Query:

CREATE VIEW myview1

AS SELECT

ord#, orddate, ord_aug.cust#, custname

FROM ord_aug, customers

WHERE ord_aug.cust# = customers.cust#;

- This query defines a view consisting of ord#, cust#, and custname using a join of ORD_AUG and CUSTOMERS tables.

Query:

CREATE VIEW myview2 (ItemNo, Quantity)

AS SELECT item#, qty

FROM ord_items;

- This query defines a view with columns item# and qty from the ORD_ITEMS table, and renames these columns as ItemNo. and Quantity respectively.

Query:

CREATE VIEW myview3

AS SELECT item#, descr, price

FROM items

WHERE price <>

WITH CHECK OPTION; <-------------------

WITH CHECK OPTION in a CREATE VIEW statement indicates that INSERTs or UPDATEs on the view will be rejected if they violate any integrity constraint implied by the view-defining query.

- This query defines the view as defined. WITH CHECK OPTION ensures that if this view is used for updation, the updated values do not cause the row to fall outside the view.

Query:

DROP VIEW myview1; <---- To drop a view

- this query drops the view MYVIEW1

Creating & Dropping Indexes

Query:

CREATE INDEX i_city <--------------------

ON customers (city);

Creates a new index named i_city. The new index file(table) will have the values of city column of Customers table

Query:

CREATE UNIQUE INDEX i_custname <------

ON customers (custname);

Creates an index which allows only unique values for custnames

Query:

CREATE INDEX i_city_custname <---------

ON customers (city, custname);

Creates an index based on two fields : city and custname

Query:

DROP INDEX i_city; <--------------------

Drops index i_city

4.2.4 The DELETE statement

Deletes one or more tuples in a table according to given conditions

General form:

DELETE FROM

[ WHERE ];

Sample DELETE statements from the Case Example

Query 40: Deletes Customer record with Customer Number 004

DELETE FROM customers

WHERE cust# = 004;

DELETE FROM Ord_Items; <-------------------

Deletes all rows in Ord_Items Table. The table remains empty after the DELETE operation.

4.2.2 The INSERT statement

The INSERT statement

Inserts one or more tuples in a table.

General forms:

To insert a single tuple

INSERT INTO []

VALUES ;

To insert multiple tuples

INSERT INTO []

SELECT [ ALL | DISTINCT ]

FROM *
[ WHERE ];

* - list of existing tables
Sample INSERT statements from the Case Example
Query 34: Insert all values for a new row

INSERT INTO customers <-------------------

VALUES (006, 'Krishnan', 'Madras');

Inserts a single row in Customers Table. Attribute list need not be mentioned if values are given for all attributes in the tuple.

Query 35: Insert values of item# & descr columns for a new row

INSERT INTO items (item#, descr) <----------

VALUES ('HW4', '132-DMPrinter');

Attribute list mentioned since values are not given for all attributes in the tuple. Here Price column for the newly inserted tuple takes NULL value.

Query 36: Inserts a new row which includes a date field

INSERT INTO ord_aug

VALUES(106, '31-AUG-94', 005);

Query 37: Inserts a new row with the date field being specified in non DD-MON-YY format

INSERT INTO ord_aug

VALUES (106, TO_DATE('310894','DDMMYY'), 005);

4.2.1.6 Character Expressions & Functions

|| - Concatenate operator

Query 29:

SELECT custname || ' - ' || city

FROM customers;

Query 29 - Result:

CustName||'-'|| City

Shah - Bombay

Srinivasan - Madras

Gupta - Delhi

Banerjee - Calcutta

Apte - Bombay

Examples of Character Functions:

INITCAP(string)

UPPER(string)

LOWER(string)

SUBSTR(string,start,no. of characters)

Group Functions

Group functions are functions which act on the entire column of selected rows.

Query 30:

SELECT SUM(qty), AVG(qty) <---------------

FROM ord_items

WHERE item#='SW1';

SUM and AVG are examples of Group Functions. They compute the sum/average of qty values of all rows where item#='SW1'.

Query 30 - Result:

SUM(QTY)	AVG(QTY)
250	125

Examples of Group Functions:

SUM

AVG

COUNT

MAX

MIN

Query 31:

SELECT item#, SUM(qty)

FROM ord_items

GROUP BY item#; <-------------------------

GROUP BY clause used to group rows according to the value of item# in the result. SUM function acts individually on each group of rows.

Query 31 - Result:

Item #	SUM(Qty)
HW1	100
HW2	35
HW3	200
SW1	250

Query 32:

SELECT item#, SUM(qty)

FROM ord_items

GROUP BY item#

HAVING SUM(qty)>100; <------------------

HAVING clause used to apply the condition to be applied on the grouped rows and display the final result.

Query 32 - Result:

Item#	SUM(QTY)
HW3	200
SW1	250

Query 33:

SELECT item#, SUM(qty)

FROM ord_items

GROUP BY item#

HAVING COUNT(*)>2;

Query 33 - Result:

Item#	SUM(Qty)
HW3	200

4.2.3 The UPDATE statement

The UPDATE statement

Updates values of one or more attributes of one or more tuples in a table.

General form:

UPDATE

SET

[ WHERE ];

Sample UPDATE statements from the Case Example

Query 38: changes price of itmem SW1 to 6000

UPDATE items

SET price = 6000

WHERE item# ='SW1';

Query 39: Changes a wrongly entered item# from HW2 to SW2

UPDATE ord_items

SET item# = 'SW2'

WHERE ord#=104 AND item# = 'HW2';

4.2.1.5 Date Functions

Date Functions

MONTHS_BETWEEN(date1, date2)

ADD_MONTHS(date, no. of months)

SYSDATE

Returns system date.

Query 27:

SELECT ord#,

MONTHS_BETWEEN(SYSDATE,orddate)

FROM ord_aug;

Query 27 - Result:

Ord#	MONTHS_BETWEEN(SYSDATE,OrdDate)
101	4.92190562
102	4.63158303
103	4.30900239
104	4.08319594
105	4

Query 28:

SELECT TO_CHAR(orddate,' DD/MM/YYYY') <---

FROM ord_aug;

Converts the value of the date field orddate to character string of the format DD/MM/YYYY

Query 28 - Result:

TO_CHAR(OrdDate,'DD/MM/YYYY')

02/08/1994

11/08/1994

21/08/1994

28/08/1994

30/08/1994

Note:

DD - day of month (1-31)

D - day of week (1-7)

DAY - name of day

MM - month (01-12)

MONTH - name of month

MON - abbreviated name of month

HH:MI:SS - hours:minutes:seconds

fm - fill mode : suppress blank padding

4.2.1.4 Numeric Functions

Query 24:

SELECT qty, ROUND(qty/2,0) "qty supplied"

FROM ord_items

WHERE item#='HW2';

Query 24 - Result:

Qty	ROUND(Qty/2,0)
10	5
25	13

Query 25:

SELECT qty, TRUNC(qty/2,0) "qty supplied"

FROM ord_items

WHERE item#='HW2';

Query 25 - Result:

Qty	TRUNC(Qty/2,0)
10	5
25	12

Examples of Numeric Functions

MOD(n,m)

SQRT(n)

ROUND(n,m)

TRUNC(n,m)

'm' indicates the number of digits after decimal points in the result.

4.2.1.3 Date Arithemetic Arithmetic Expressions

Arithmetic Expressions

+

-

*

/

( )

Arithmetic functions are allowed in SELECT and WHERE clauses.

Query 22:

SELECT descr, price, price*0.1 "discount"

FROM items

WHERE price >= 4000

ORDER BY 3;

Query 22 - Result:

Descr	Price	Discount
Power Supply	4000	400
MS-DOS 6.0	5000	500
MS-Word 6.0	8000	800

Query 23:

SELECT descr

FROM items, ord_items

WHERE price*qty > 250000

and items.item# = ord_items.item#;

Query 23 - Result:

Descr

Power Supply

MS-DOS 6.0

MS-Word 6.0

Date Arithemetic

Date + No. of days

Date - No. of days

Date – Date

Query 26:

SELECT ord#, orddate+15 "Supply by"

FROM ord_aug;

Query 26 - Result:

Ord#	Supply by
101	17-AUG-94
102	26-AUG-94
103	05-SEP-94
104	12-SEP-94
105	14-SEP-94

4.2.1.2 Nested SELECT statements

Nested SELECT statements

SQL allows nesting of SELECT statements. In a nested SELECT statement the inner SELECT is evaluated first and is replaced by its result to evaluate the outer SELECT statement.

Query 20:

SELECT item#, descr, price <---------------------------

FROM items

WHERE price > (SELECT AVG(price) FROM items); <------

Outer SELECT statement

Inner SELECT statement

Query 20 - Result:

Item#	Descr	Price
HW1	Power Supply	4000
SW1	MS-DOS 6.0	5000
SW2	MS-Word 6.0	8000

Query 21:

SELECT cust#, custname <------------------

FROM customers

WHERE city = ( SELECT city FROM customers

WHERE custname='Shah');

Result :

SELECT cust#, custname <------------------

FROM customers

WHERE city = ( SELECT city FROM customers

WHERE custname='Shah');

Here the outer SELECT is evaluated as
SELECT cust#, custname
FROM customers
WHERE city = "BOMBAY"

4.2.1.1 General SELECT statements

The SELECT statement

Retrieves rows from one or more tables according to given conditions.

General form:

SELECT [ ALL | DISTINCT ]

FROM
[ WHERE ]
[ ORDER BY [DESC]
[ GROUP BY ]
[ HAVING ]

Query 1:
Some SELECT statements on the Case Example

SELECT * <-----------------

FROM items;

* -denotes all attributes in the table

Result

Query 2:

SELECT cust#,custname

FROM customers;

Result

Query 3:

SELECT DISTINCT item#

FROM ord_items;

Result

Query 4:

SELECT ord# "Order ", orddate "Ordered On" <----

FROM ord_aug;

In the result set the column headings will appear as “Order” and “Ordered On” instead of ord# and orddate.

Result

Query 5:

SELECT item#, descr

FROM items

WHERE price>2000;

Result

Query 6:

SELECT custname

FROM customers

WHERE city<>'Bombay';

Result

Query 7:

SELECT custname

FROM customers

WHERE UPPER(city)<>'BOMBAY';

Result

Query 8:

SELECT *

FROM ord_aug

WHERE orddate > '15-AUG-94'; <-----------

Illustrates the use of 'date' fields. In SQL, a separate datatype (eg: date, datetime etc.) is available to store data which is of type date.

Result

Query 9:

SELECT *

FROM ord_items

WHERE qty BETWEEN 100 AND 200;

Result

Query 10:

SELECT custname

FROM customers

WHERE city IN ('Bombay', 'Madras'); <-------

The conditional expression evaluates to TRUE for those records for which the value of city field is in the list ('Bombay, 'Madras')

Result

Query 11:

SELECT custname

FROM customers

WHERE custname LIKE 'S%' ; <------------

LIKE 'S%' - 'S' followed by zero or more characters

Result

Query 12:

SELECT *

FROM ord_items

WHERE qty>100 AND item# LIKE 'SW%';

Result

Query 13:

SELECT custname

FROM customers

WHERE city='Bombay' OR city='Madras';

Result

Query 14:

SELECT *

FROM customers

WHERE city='Bombay'

ORDER BY custname; <--------------------

Records in the result set is displayed in the ascending order of custname

Result

Query 15:

SELECT *

FROM ord_items

ORDER BY item#, qty DESC; <-------------

Display the result set in the ascending order of item#. If there are more than one records with the same item# , they will be displayed in the descending order of qty

Result

Query 16:

SELECT descr, price

ORDER BY 2
FROM items

ORDER BY 2; <----------------------------

ORDER BY the 2nd attribute (price) in the attribute list of the SELECT clause

Result

Query 17:

SELECT ord#, ord_aug.cust#, custname <----------------

FROM ord_aug, customers

WHERE city='Delhi'

AND ord_aug.cust# = customers.cust#; <----------------

SELECT statement implementing JOIN operation.

JOIN condition

Result

Query 18:

SELECT ord#, customers.cust#, city

FROM ord_aug, customers

WHERE ord_aug.cust# = customers.cust#;

Result

Query 19:

SELECT ord#, customers.cust#, city

FROM ord_aug, customers

WHERE ord_aug.cust# = customers.cust# (+); <-----------

(+) indicates outer join. Here it is a right outer join as indicated by the (+) after the right side field.

Result

4.2.1Data Manipulation Language-SELECT

DML – SELECT, INSERT, UPDATE and DELETE statements.

The SELECT statement :

1. General form

2. Nested SELECT statements

3. Date Arithemetic Arithmetic Expressions

4. Numeric Functions

5. Date Functions

Character Expressions & Functions
- Concatenate operator & Group Functions

The INSERT statement

The UPDATE statement

The DELETE statement

4.1 SQL Overview

The components of SQL are

a. Data Manipulation Language – Consists of SQL statements for operating on the data (Inserting, Modifying, Deleting and Retrieving Data) in tables which already exist.

b. Data Definition Language – Consists of SQL statements for defining the schema (Creating, Modifying and Dropping tables, indexes, views etc.)

c. Data Control Language – Consists of SQL statements for providing and revoking access permissions to users

Tables used:

Ord_Aug

Ord#	OrdDate	Cust#
101	02-AUG-94	002
102	11-AUG-94	003
103	21-AUG-94	003
104	28-AUG-94	002
105	30-AUG-94	005

Items

Item#	Descr	Price
HW1	Power Supply	4000
HW2	101- Keyboard	2000
HW3	Mouse	800
SW1	MS-DOS 6.0	5000
SW2	MS-Word 6.0	8000

Customers

Cust#	CustName	City
001	Shah	Bombay
002	Srinivasan	Madras
003	Gupta	Delhi
004	Banerjee	Calcutta
005	Apte	Bombay

3.4 Relational Algebra Operators

The eight relational algebra operators are

1. SELECT – To retrieve specific tuples/rows from a relation.

Ord#	OrdDate	Cust#
101	02-08-94	002
104	18-09-94	002

2. PROJECT – To retrieve specific attributes/columns from a relation.

Descr	Price
Power Supply	4000
101-Keyboard	2000
Mouse	800
MS-DOS 6.0	5000
MS-Word 6.0	8000

3. PRODUCT – To obtain all possible combination of tuples from two relations.

Ord#	OrdDate	O.Cust#	C.Cust#	CustName	City
101	02-08-94	002	001	Shah	Bombay
101	02-08-94	002	002	Srinivasan	Madras
101	02-08-94	002	003	Gupta	Delhi
101	02-08-94	002	004	Banerjee	Calcutta
101	02-08-94	002	005	Apte	Bombay
102	11-08-94	003	001	Shah	Bombay
102	11-08-94	003	002	Srinivasan	Madras

4. UNION – To retrieve tuples appearing in either or both the relations participating in the UNION.

Eg: Consider the relation Ord_Jul as follows
(Table: Ord_Jul)

Ord#	OrdDate	Cust#
101	03-07-94	001
102	27-07-94	003
101	02-08-94	002
102	11-08-94	003
103	21-08-94	003
104	28-08-94	002
105	30-08-94	005

Note: The union operation shown above logically implies retrieval of records of Orders placed in July or in August

5. INTERSECT- To retrieve tuples appearing in both the relations participating in the INTERSECT.

Eg:
To retrieve Cust# of Customers who've placed orders in July and in August

Cust#

003

6. DIFFERENCE – To retrieve tuples appearing in the first relation participating in the DIFFERENCE but not the second.

Eg:
To retrieve Cust# of Customers who've placed orders in July but not in August

Cust#

001

7. JOIN – To retrieve combinations of tuples in two relations based on a common field in both the relations.

Eg:

ORD_AUG join CUSTOMERS (here, the common column is Cust#)

Ord#	OrdDate	Cust#	CustNames	City
101	02-08-94	002	Srinivasan	Madras
102	11-08-94	003	Gupta	Delhi
103	21-08-94	003	Gupta	Delhi
104	28-08-94	002	Srinivasan	Madras
105	30-08-94	005	Apte	Bombay

Note: The above join operation logically implies retrieval of details of all orders and the details of the corresponding customers who placed the orders.

Such a join operation where only those rows having corresponding rows in the both the relations are retrieved is called the natural join or inner join. This is the most common join operation.

Consider the example of EMPLOYEE and ACCOUNT relations.

EMPLOYEE

EMP #	EmpName	EmpCity	Acc#
X101	Shekhar	Bombay	120001
X102	Raj	Pune	120002
X103	Sharma	Nagpur	Null
X104	Vani	Bhopal	120003

ACCOUNT

Acc#	OpenDate	BalAmt
120001	30. Aug. 1998	5000
120002	29. Oct. 1998	1200
120003	1. Jan. 1999	3000
120004	4. Mar. 1999	500

A join can be formed between the two relations based on the common column Acc#. The result of the (inner) join is :

Emp#	EmpName	EmpCity	Acc#	OpenDate	BalAmt
X101	Shekhar	Bombay	120001	30. Aug. 1998	5000
X102	Raj	Pune	120002	29. Oct. 1998	1200
X104	Vani	Bhopal	120003	1. Jan 1999	3000

Note that, from each table, only those records which have corresponding records in the other table appear in the result set. This means that result of the inner join shows the details of those employees who hold an account along with the account details.

The other type of join is the outer join which has three variations – the left outer join, the right outer join and the full outer join. These three joins are explained as follows:

The left outer join retrieves all rows from the left-side (of the join operator) table. If there are corresponding or related rows in the right-side table, the correspondence will be shown. Otherwise, columns of the right-side table will take null values.

EMPLOYEE left outer join ACCOUNT gives:

Emp#	EmpName	EmpCity	Acc#	OpenDate	BalAmt
X101	Shekhar	Bombay	120001	30. Aug. 1998	5000
X102	Raj	Pune	120002	29. Oct. 1998	1200
X103	Sharma	Nagpur	NULL	NULL	NULL
X104	Vani	Bhopal	120003	1. Jan 1999	3000

The right outer join retrieves all rows from the right-side (of the join operator) table. If there are corresponding or related rows in the left-side table, the correspondence will be shown. Otherwise, columns of the left-side table will take null values.

EMPLOYEE right outer join ACCOUNT gives:

Emp#	EmpName	EmpCity	Acc#	OpenDate	BalAmt
X101	Shekhar	Bombay	120001	30. Aug. 1998	5000
X102	Raj	Pune	120002	29. Oct. 1998	1200
X104	Vani	Bhopal	120003	1. Jan 1999	3000
NULL	NULL	NULL	120004	4. Mar. 1999	500

(Assume that Acc# 120004 belongs to someone who is not an employee and hence the details of the Account holder are not available here)

The full outer join retrieves all rows from both the tables. If there is a correspondence or relation between rows from the tables of either side, the correspondence will be shown. Otherwise, related columns will take null values.

EMPLOYEE full outer join ACCOUNT gives:

Emp#	EmpName	EmpCity	Acc#	OpenDate	BalAmt
X101	Shekhar	Bombay	120001	30. Aug. 1998	5000
X102	Raj	Pune	120002	29. Oct. 1998	1200
X103	Sharma	Nagpur	NULL	NULL	NULL
X104	Vani	Bhopal	120003	1. Jan 1999	3000
NULL	NULL	NULL	120004	4. Mar. 1999	500

Q: What will the result of a natural join operation between R1 and R2 ?
A:

a1	b1	c1
a2	b2	c2
a3	b3	c3

8. DIVIDE

Consider the following three relations:

R1 divide by R2 per R3 gives:

Thus the result contains those values from R1 whose corresponding R2 values in R3 include all R2 values.

Blog Archive