14 Chapter 1: Oracle Server Technologies and the Relational Paradigm
The relational paradigm is highly efficient in many respects for many types of
data, but it is not appropriate for all applications. As a general rule, a relational
analysis should be the first approach taken when modeling a system. Only if it proves
inappropriate should one resort to nonrelational structures. Applications where the
relational model has proven highly effective include virtually all Online Transaction
Processing (OLTP) systems and Decision Support Systems (DSS). The relational
paradigm can be demanding in its hardware requirements and in the skill needed
to develop applications around it, but if the data fits, it has proved to be the most
versatile model. Problems can arise, for example, from the need to maintain the
indexes that support the links between tables, and from the space needed to
store indexed data twice: once in the indexes themselves and once in the tables
in which the columns reside. Nonetheless, relational design is in most
circumstances the optimal model.
A number of software publishers have produced database management systems
that conform (with varying degrees of accuracy) to the relational paradigm; Oracle
is only one. IBM was perhaps the first company to commit major resources to it,
but their product (which later developed into DB2) was not ported to non-IBM
platforms for many years. Microsoft’s SQL Server is another relational database that
has been limited by the platforms on which it runs. Oracle databases, by contrast,
have always been ported to every major platform from the first release. It may be this
that gave Oracle the edge in the RDBMS marketplace.
A note on terminology: confusion can arise when discussing relational databases
with people used to working with Microsoft products. SQL is a language and SQL
Server is a database, but in the Microsoft world, the term SQL is often used to refer
to either.
Data Normalization
The process of modeling data into relational tables is known as normalization and
can be studied at university level for years. There are commonly said to be three
levels of normalization: the first, second, and third normal forms. There are higher
levels of normalization: fourth and fifth normal forms are well defined, but any
normal data analyst (and certainly any normal human being) will not need to be
concerned with them. It is possible for a SQL application to address un-normalized
data, but this will usually be inefficient as that is not what the language is designed
to do. In most cases, data stored in a relational database and accessed with SQL
should be normalized to the third normal form.
Understand Relational Structures 15
SCENARIO & SOLUTION
Scenario: Your organization is designing a new application. Who should be involved?
Solution: Everyone! The project team must involve business analysts (who model the business processes), systems analysts (who model the data), system designers (who decide how to implement the models), developers (you), database administrators, system administrators, and (most importantly) end users.
Scenario: It is possible that relational structures may not be suitable for a particular application. How can this be determined, and what should be done next? Can Oracle help?
Solution: Attempt to normalize the data into two-dimensional tables, linked with one-to-many relationships. If this really cannot be done, consider other paradigms. Oracle may well be able to help. For instance, maps and other geographical data really don’t work relationally. Neither does text data (such as word processing documents). But the Spatial and Text database options can be used for these purposes. There is also the possibility of using user-defined objects to store nontabular data.
There are often several possible normalized models for an application. It
is important to use the most appropriate—if the systems analyst gets this
wrong, the implications can be serious for performance, storage needs, and
development effort.
As an example of normalization, consider an un-normalized table called BOOKS
that stores details of books, authors, and publishers, using the ISBN number as the
primary key. A primary key is the attribute (or combination of attributes) that
uniquely identifies a record. These are two entries:
ISBN   Title                                         Authors                        Publisher
12345  Oracle 11g OCP SQL Fundamentals 1 Exam Guide  John Watson, Roopesh Ramklass  McGraw-Hill, Spear Street, San Francisco, CA 94105
67890  Oracle 11g New Features Exam Guide            Sam Alapati                    McGraw-Hill, Spear Street, San Francisco, CA 94105
Storing the data in this table gives rise to several anomalies. First, here is the
insertion anomaly: it is impossible to enter details of authors who are not yet
published, because there will be no ISBN number under which to store them.
Second, a book cannot be deleted without losing the details of the publisher: a
deletion anomaly. Third, if a publisher’s address changes, it will be necessary to
update the rows for every book that publisher has published: an update anomaly. Furthermore,
it will be very difficult to identify every book written by one author. The fact that a
book may have several authors means that the “author” field must be multivalued,
and a search will have to search all the values. Related to this is the problem of
having to restructure the table if a book comes along with more authors than
the original design can handle. Also, the storage is very inefficient due to replication
of address details across rows, and the possibility of error as this data is repeatedly
entered is high. Normalization should solve all these issues.
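The anomalies described above can be reproduced in a few lines. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for an Oracle database; the lowercase column names, the unpublished author "Ann Example", and the new publisher address are all hypothetical.

```python
import sqlite3

# In-memory SQLite database; an assumption made to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books (
        isbn      TEXT PRIMARY KEY NOT NULL,
        title     TEXT,
        authors   TEXT,  -- multivalued: comma-separated author names
        publisher TEXT   -- publisher name and address in a single column
    )
""")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        ("12345", "Oracle 11g OCP SQL Fundamentals 1 Exam Guide",
         "John Watson, Roopesh Ramklass",
         "McGraw-Hill, Spear Street, San Francisco, CA 94105"),
        ("67890", "Oracle 11g New Features Exam Guide",
         "Sam Alapati",
         "McGraw-Hill, Spear Street, San Francisco, CA 94105"),
    ],
)

# Insertion anomaly: an unpublished author has no ISBN to be stored under.
try:
    conn.execute(
        "INSERT INTO books (isbn, authors) VALUES (NULL, ?)",
        ("Ann Example",),  # hypothetical unpublished author
    )
    insertion_rejected = False
except sqlite3.IntegrityError:
    insertion_rejected = True

# Retrieval problem: finding one author's books means pattern matching
# against every value packed into the multivalued column.
watson_titles = [
    title for (title,) in conn.execute(
        "SELECT title FROM books WHERE authors LIKE ?",
        ("%John Watson%",),
    )
]

# Update anomaly: one change of address rewrites a row for every book.
cursor = conn.execute(
    "UPDATE books SET publisher = ? WHERE publisher LIKE ?",
    ("McGraw-Hill, 1 Example Road, New York, NY 10001",  # hypothetical address
     "McGraw-Hill,%"),
)
rows_touched = cursor.rowcount
```

With only two books the update already touches two rows; in a real catalog the count grows with every title the publisher has issued.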
The first normal form is to remove the repeating groups, in this case, the multiple
authors: pull them out into a separate table called AUTHORS. The data structures
will now look like the following.
Two rows in the BOOKS table:
ISBN   TITLE                                         PUBLISHER
12345  Oracle 11g OCP SQL Fundamentals 1 Exam Guide  McGraw-Hill, Spear Street, San Francisco, California
67890  Oracle 11g New Features Exam Guide            McGraw-Hill, Spear Street, San Francisco, California
And three rows in the AUTHORS table:
NAME ISBN
John Watson 12345
Roopesh Ramklass 12345
Sam Alapati 67890
The one row in the BOOKS table is now linked to two rows in the AUTHORS
table. This solves the insertion anomaly (there is no reason not to insert as many
unpublished authors as necessary), the retrieval problem of identifying all the books
by one author (one can search the AUTHORS table on just one name) and the
problem of a fixed maximum number of authors for any one book (simply insert as
many or as few AUTHORS as are needed).
This is the first normal form: no repeating groups.
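As a rough illustration of the first normal form, the two tables can be built and queried as follows. This is a sketch only, assuming Python's sqlite3 module in place of Oracle; the lowercase names and the unpublished author "Ann Example" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (
        isbn      TEXT PRIMARY KEY NOT NULL,
        title     TEXT,
        publisher TEXT
    );
    CREATE TABLE authors (
        name TEXT NOT NULL,
        isbn TEXT REFERENCES books (isbn)  -- empty for an unpublished author
    );
""")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("12345", "Oracle 11g OCP SQL Fundamentals 1 Exam Guide",
         "McGraw-Hill, Spear Street, San Francisco, California"),
        ("67890", "Oracle 11g New Features Exam Guide",
         "McGraw-Hill, Spear Street, San Francisco, California"),
    ],
)
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [
        ("John Watson", "12345"),
        ("Roopesh Ramklass", "12345"),
        ("Sam Alapati", "67890"),
        ("Ann Example", None),  # hypothetical unpublished author
    ],
)

# Books by one author: an exact match on a single name, then a join.
watson_titles = [
    title for (title,) in conn.execute(
        "SELECT b.title FROM authors a JOIN books b ON b.isbn = a.isbn "
        "WHERE a.name = ?",
        ("John Watson",),
    )
]

# No fixed maximum number of authors: each author is simply another row.
(author_count,) = conn.execute(
    "SELECT COUNT(*) FROM authors WHERE isbn = ?", ("12345",)
).fetchone()
```

The publisher's address is still repeated on every BOOKS row at this stage; that duplication is what the next normal form addresses.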
The second normal form removes columns from the table that are not dependent
on the primary key. In this example, that is the publisher’s address details: these
are dependent on the publisher, not the ISBN. The BOOKS table and a new
PUBLISHERS table will then look like this:
BOOKS
ISBN TITLE PUBLISHER
12345 Oracle 11g OCP SQL Fundamentals 1 Exam Guide McGraw-Hill
67890 Oracle 11g New Features Exam Guide McGraw-Hill
PUBLISHERS
PUBLISHER STREET CITY STATE
McGraw-Hill Spear Street San Francisco California
All the books published by one publisher will now point to a single record in
PUBLISHERS. This solves the problem of storing the address many times, and also
solves the consequent update anomalies and the data consistency errors caused by
inaccurate multiple entries.
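The effect on the update anomaly can be sketched as follows, assuming Python's sqlite3 module as a stand-in for Oracle. The lowercase names and the new address are hypothetical; the point is only that a change of address now touches one row, however many books the publisher has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publishers (
        publisher TEXT PRIMARY KEY NOT NULL,
        street    TEXT,
        city      TEXT,
        state     TEXT
    );
    CREATE TABLE books (
        isbn      TEXT PRIMARY KEY NOT NULL,
        title     TEXT,
        publisher TEXT NOT NULL REFERENCES publishers (publisher)
    );
    INSERT INTO publishers VALUES
        ('McGraw-Hill', 'Spear Street', 'San Francisco', 'California');
    INSERT INTO books VALUES
        ('12345', 'Oracle 11g OCP SQL Fundamentals 1 Exam Guide', 'McGraw-Hill');
    INSERT INTO books VALUES
        ('67890', 'Oracle 11g New Features Exam Guide', 'McGraw-Hill');
""")

# The address is stored once, so the change is a single-row update.
cursor = conn.execute(
    "UPDATE publishers SET street = ?, city = ?, state = ? WHERE publisher = ?",
    ("Example Road", "New York", "New York", "McGraw-Hill"),  # hypothetical
)
rows_touched = cursor.rowcount

# Every book sees the new address through the join.
cities = [
    city for (city,) in conn.execute(
        "SELECT p.city FROM books b "
        "JOIN publishers p ON p.publisher = b.publisher ORDER BY b.isbn"
    )
]
```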
Third normal form removes all columns that are interdependent. In the
PUBLISHERS table, this means the address columns: the street exists in only one
city, and the city can be in only one state; one column should do, not three. This
could be achieved by adding an address code, pointing to a separate address table:
PUBLISHERS
PUBLISHER ADDRESS CODE
McGraw-Hill 123
ADDRESSES
ADDRESS CODE STREET CITY STATE
123 Spear Street San Francisco California
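One way to picture the resulting structure is to follow the address code from PUBLISHERS to ADDRESSES with a join. The sketch below assumes Python's sqlite3 module rather than Oracle, and the lowercase column names (such as address_code) are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (
        address_code INTEGER PRIMARY KEY,
        street       TEXT,
        city         TEXT,
        state        TEXT
    );
    CREATE TABLE publishers (
        publisher    TEXT PRIMARY KEY NOT NULL,
        address_code INTEGER REFERENCES addresses (address_code)
    );
    INSERT INTO addresses VALUES (123, 'Spear Street', 'San Francisco', 'California');
    INSERT INTO publishers VALUES ('McGraw-Hill', 123);
""")

# The publisher row carries only the code; the address details are
# looked up in ADDRESSES when they are needed.
publisher_address = conn.execute(
    """
    SELECT p.publisher, a.street, a.city, a.state
    FROM publishers p
    JOIN addresses a ON a.address_code = p.address_code
    WHERE p.publisher = ?
    """,
    ("McGraw-Hill",),
).fetchone()
```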
One characteristic of normalized data that should be emphasized now is the use
of primary keys and foreign keys. A primary key is the unique identifier of a row
in a table, either one column or a concatenation of several columns (known as a
composite key). Every table should have a primary key defined. This is a requirement
of the relational paradigm. Note that the Oracle database deviates from this
standard: it is possible to define tables without a primary key—though it is usually
not a good idea, and some other RDBMSs do not permit this.
A foreign key is a column (or a concatenation of several columns) that can be
used to identify a related row in another table. A foreign key in one table will match
a primary key in another table. This is the basis of the many-to-one relationship. A
many-to-one relationship is a connection between two tables, where many rows in
one table refer to a single row in another table. This is sometimes called a parent-
child relationship: one parent can have many children. In the BOOKS example so
far, the keys are as follows:
TABLE       KEYS
BOOKS       Primary key: ISBN; foreign key: Publisher
AUTHORS     Primary key: Name + ISBN; foreign key: ISBN
PUBLISHERS  Primary key: Publisher; foreign key: Address code
ADDRESSES   Primary key: Address code
These keys define relationships such as that one book can have several authors.
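How a database enforces such keys can be sketched as follows, assuming Python's sqlite3 module as a stand-in for Oracle. Two simplifications are assumptions of the sketch: only BOOKS and AUTHORS are modeled, and the ISBN in AUTHORS is made mandatory. The helper is_rejected is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE books (
        isbn  TEXT PRIMARY KEY NOT NULL,
        title TEXT
    );
    CREATE TABLE authors (
        name TEXT NOT NULL,
        isbn TEXT NOT NULL REFERENCES books (isbn),  -- foreign key to the parent
        PRIMARY KEY (name, isbn)                     -- composite primary key
    );
    INSERT INTO books VALUES ('12345', 'Oracle 11g OCP SQL Fundamentals 1 Exam Guide');
    INSERT INTO authors VALUES ('John Watson', '12345');
    INSERT INTO authors VALUES ('Roopesh Ramklass', '12345');
""")


def is_rejected(sql, params):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# A child row must match an existing parent: ISBN 67890 is not in BOOKS.
orphan_rejected = is_rejected(
    "INSERT INTO authors VALUES (?, ?)", ("Sam Alapati", "67890"))

# The composite primary key forbids a second identical (name, isbn) pair.
duplicate_rejected = is_rejected(
    "INSERT INTO authors VALUES (?, ?)", ("John Watson", "12345"))

# One parent, many children: two AUTHORS rows refer to the one BOOKS row.
(children,) = conn.execute(
    "SELECT COUNT(*) FROM authors WHERE isbn = ?", ("12345",)
).fetchone()
```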
There are various standards for documenting normalized data structures,
developed by different organizations as structured formal methods. Generally
speaking, it really doesn’t matter which method one uses as long as everyone reading
the documents understands it. Part of the documentation will always include a
listing of the attributes that make up each entity (also known as the columns that
make up each table) and an entity-relationship diagram representing graphically the
foreign to primary key connections. A widely used standard is as follows:
■ Primary key columns identified with a hash (#)
■ Foreign key columns identified with a backslash (\)
■ Mandatory columns (those that cannot be left empty) with an asterisk (*)
■ Optional columns with a lowercase “o”
The BOOKS tables can now be described as follows:
Table BOOKS
#* ISBN Primary key, required
o Title Optional
\* Publisher Foreign key, link to the PUBLISHERS table
Table AUTHORS
#* Name Together with the ISBN, the primary key
#\o ISBN Part of the primary key, and a foreign key to the BOOKS table.
Optional, because some authors may not yet be published.
Table PUBLISHERS
#* Publisher Primary key
\o Address code Foreign key, link to the ADDRESSES table
Table ADDRESSES
#* Address code Primary key
o Street
o City
o State
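The markers in these listings map fairly directly onto column constraints. The following sketch shows one possible translation for the BOOKS listing; the helper column_clause, the lowercase names, and the choice of TEXT for every column are assumptions of the sketch, and SQLite (through Python's sqlite3 module) is used only to check that the generated statement is accepted. Composite keys such as the one on AUTHORS would need a table-level constraint, which this helper does not attempt.

```python
import sqlite3


def column_clause(markers, name, references=None):
    """Build one column definition from the listing's markers."""
    parts = [name, "TEXT"]  # every column typed as TEXT for simplicity
    if "#" in markers:      # hash: primary key column
        parts.append("PRIMARY KEY")
    if "*" in markers:      # asterisk: mandatory column
        parts.append("NOT NULL")
    if "\\" in markers and references is not None:  # backslash: foreign key
        parts.append(f"REFERENCES {references}")
    return " ".join(parts)


# (markers, column name, referenced table and column) for the BOOKS listing.
books_listing = [
    ("#*", "isbn", None),
    ("o", "title", None),
    ("\\*", "publisher", "publishers (publisher)"),
]

clauses = [column_clause(*column) for column in books_listing]
ddl = "CREATE TABLE books (" + ", ".join(clauses) + ")"

# Check that the generated statement is at least syntactically acceptable.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.close()
```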
The second necessary part of documenting the normalized data model is the
entity-relationship diagram. This represents the connections between the tables
graphically. There are different standards for these; Figure 1-3 shows the entity-
relationship diagram for the BOOKS example using a very simple notation limited
to showing the direction of the one-to-many relationships, using what are often
called crow’s feet to indicate which sides of the relationship are the many and
the one. It can be seen that one BOOK can have multiple AUTHORS, and that one
PUBLISHER can publish many BOOKS. Note that the diagram also states that both
AUTHORS and PUBLISHERS have exactly one ADDRESS. More complex
notations can be used to show whether the link is required or optional, information
which will match that given in the table columns listed previously.
FIGURE 1-3 An entity-relationship diagram (entities shown: ADDRESSES, AUTHORS, BOOKS, PUBLISHERS)
This is a very simple example of normalization, and is not in fact complete. If
one author were to write several books, this would require multiple values in the
ISBN column of the AUTHORS table. That would be a repeating group, which
would have to be removed because repeating groups break the rule for first normal
form. A major exercise with data normalization is ensuring that the structures can
handle all possibilities.
A table in a real-world application may have hundreds of columns and dozens
of foreign keys. The standards for notation vary across organizations—the example
given is very basic. Entity-relationship diagrams for applications with hundreds or
thousands of entities can be challenging to interpret.
EXERCISE 1-2
Perform an Extended Relational Analysis
This is a paper-based exercise, with no specific solution.
Consider the situation where one author can write many books, and one book can
have many authors. This is a many-to-many relationship, which cannot be fit into
the relational model. Sketch out data structures that demonstrate the problem, and
develop another structure that would solve it. Following is a possible solution.
The un-normalized table of books with many authors could look like this:
BOOKS
#* Title
\* Authors
There could be two rows in this table:
Title Authors
11g SQL Fundamentals Exam Guide John Watson, Roopesh Ramklass
10g DBA Exam Guide John Watson, Damir Bersinic
And that of authors could look like this:
AUTHORS
#* Name
\* Books
There could be three rows in this table:
Name Books
John Watson 11g SQL Fundamentals Exam Guide, 10g DBA Exam Guide
Roopesh Ramklass 11g SQL Fundamentals Exam Guide
Damir Bersinic 10g DBA Exam Guide
This many-to-many relationship needs to be resolved into many-to-one
relationships by taking the repeating groups out of the two tables and storing them
in a separate books-per-author table. It will also become necessary to introduce
some codes, such as ISBNs to identify books and social security numbers to identify
authors. This is a possible normalized structure:
BOOKS
#* ISBN
o Title
AUTHORS
#* SSNO
o Name
BOOKAUTHORS
#\* ISBN Part of the primary key and a foreign key to BOOKS
#\* SSNO Part of the primary key and a foreign key to AUTHORS
The rows in these normalized tables would be as follows:
BOOKS
ISBN Title
12345 11g SQL Fundamentals Exam Guide
67890 10g DBA Exam Guide
AUTHORS
SSNO Name
11111 John Watson
22222 Damir Bersinic
33333 Roopesh Ramklass
BOOKAUTHORS
ISBN SSNO
12345 11111
12345 33333
67890 11111
67890 22222
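The resolved structure can be tried out with a short sketch, assuming Python's sqlite3 module in place of Oracle and lowercase table and column names. The pairings in BOOKAUTHORS follow the un-normalized rows earlier in the exercise: John Watson co-wrote both books, Roopesh Ramklass the first, and Damir Bersinic the second.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (
        isbn  TEXT PRIMARY KEY NOT NULL,
        title TEXT
    );
    CREATE TABLE authors (
        ssno TEXT PRIMARY KEY NOT NULL,
        name TEXT
    );
    -- The interposed entity: one row per (book, author) pairing.
    CREATE TABLE bookauthors (
        isbn TEXT NOT NULL REFERENCES books (isbn),
        ssno TEXT NOT NULL REFERENCES authors (ssno),
        PRIMARY KEY (isbn, ssno)
    );
    INSERT INTO books VALUES ('12345', '11g SQL Fundamentals Exam Guide');
    INSERT INTO books VALUES ('67890', '10g DBA Exam Guide');
    INSERT INTO authors VALUES ('11111', 'John Watson');
    INSERT INTO authors VALUES ('22222', 'Damir Bersinic');
    INSERT INTO authors VALUES ('33333', 'Roopesh Ramklass');
    INSERT INTO bookauthors VALUES ('12345', '11111');
    INSERT INTO bookauthors VALUES ('12345', '33333');
    INSERT INTO bookauthors VALUES ('67890', '11111');
    INSERT INTO bookauthors VALUES ('67890', '22222');
""")

# All books by one author: AUTHORS -> BOOKAUTHORS -> BOOKS.
watson_books = [
    title for (title,) in conn.execute(
        """
        SELECT b.title
        FROM authors a
        JOIN bookauthors ba ON ba.ssno = a.ssno
        JOIN books b ON b.isbn = ba.isbn
        WHERE a.name = ?
        ORDER BY b.isbn
        """,
        ("John Watson",),
    )
]

# All authors of one book: the same path walked in the other direction.
authors_of_12345 = [
    name for (name,) in conn.execute(
        """
        SELECT a.name
        FROM books b
        JOIN bookauthors ba ON ba.isbn = b.isbn
        JOIN authors a ON a.ssno = ba.ssno
        WHERE b.isbn = ?
        ORDER BY a.name
        """,
        ("12345",),
    )
]
```

Neither BOOKS nor AUTHORS holds a repeating group any longer; each side of the many-to-many relationship has become a many-to-one relationship with BOOKAUTHORS.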
Figure 1-4 shows the entity-relationship diagram for the original un-normalized
structure, followed by the normalized structure.
As a further exercise, consider the possibility that one publisher could have
offices at several addresses, and one address could have offices for several companies.
Authors will also have addresses, and this connection too needs to be defined. These
enhancements can be added to the example worked through previously.
FIGURE 1-4 Un-normalized and normalized data models
First, an un-normalized many-to-many relationship: BOOKS and AUTHORS.
The many-to-many relationship resolved, by interposing another entity: BOOKS, BOOKAUTHORS, AUTHORS.
Summarize the SQL Language 23
CERTIFICATION OBJECTIVE 1.03
Summarize the SQL Language
SQL is defined, developed, and controlled by international bodies. Oracle Corporation
does not have to conform to the SQL standard but chooses to do so. The language itself
can be thought of as being very simple (there are only 16 commands), but in practice SQL
coding can be phenomenally complicated. That is why a whole book is needed to cover
the bare fundamentals.
SQL Standards
Structured Query Language (SQL) was first invented by an IBM research group in
the ’70s, but in fact Oracle Corporation (then trading as Relational Software, Inc.)
claims to have beaten IBM to market by a few weeks with the first commercial
implementation: Oracle 2, released in 1979. Since then the language has evolved
enormously and is no longer driven by any one organization. SQL is now an
international standard. It is managed by committees from ISO and ANSI. ISO is
the Organisation Internationale de Normalisation, based in Geneva; ANSI is the
American National Standards Institute, based in Washington, DC. The two bodies
cooperate, and their SQL standards are identical.
Earlier releases of the Oracle database used an implementation of SQL that had
some significant deviations from the standard. This was not because Oracle was
being deliberately different: it was usually because Oracle implemented features
that were ahead of the standard, and when the standard caught up, it used different
syntax. An example is the outer join (detailed in Chapter 8), which Oracle
implemented long before standard SQL; when standard SQL introduced an outer
join, Oracle added support for the new join syntax while retaining support for its
own proprietary syntax. Oracle Corporation ensures future compliance by placing
personnel on the various ISO and ANSI committees and is now helping to
drive the SQL standard forward.
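To give a flavor of the two outer join syntaxes, the sketch below runs the standard form against SQLite (through Python's sqlite3 module, an assumption made so the example is runnable) and shows Oracle's older proprietary form as text only, since SQLite does not accept it. The tables, lowercase names, and the unpublished author "Ann Example" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (isbn TEXT PRIMARY KEY NOT NULL, title TEXT);
    CREATE TABLE authors (name TEXT NOT NULL, isbn TEXT REFERENCES books (isbn));
    INSERT INTO books VALUES ('67890', 'Oracle 11g New Features Exam Guide');
    INSERT INTO authors VALUES ('Sam Alapati', '67890');
    INSERT INTO authors VALUES ('Ann Example', NULL);  -- hypothetical, unpublished
""")

# Standard SQL outer join: every author, with or without a matching book.
ansi_outer_join = """
    SELECT a.name, b.title
    FROM authors a LEFT OUTER JOIN books b ON a.isbn = b.isbn
    ORDER BY a.name
"""

# Oracle's proprietary form of the same query, for comparison only.
# It is never executed here because SQLite does not support the (+) operator.
oracle_proprietary = """
    SELECT a.name, b.title
    FROM authors a, books b
    WHERE a.isbn = b.isbn (+)
    ORDER BY a.name
"""

outer_rows = conn.execute(ansi_outer_join).fetchall()

# An ordinary (inner) join drops the author who has no book.
inner_rows = conn.execute(
    "SELECT a.name, b.title FROM authors a JOIN books b ON a.isbn = b.isbn"
).fetchall()
```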
SQL Commands
These are the 16 SQL commands, separated into commonly used groups: