12 Chapter 1: Introduction to the Oracle Server Technologies
avoids these issues by formatting a report’s output as XML tags. Any client can
request an XML Publisher report and (provided it has an XML parser) display the
results. This is the key to distributing reports over wireless protocols to any device,
such as a cellular telephone.
Oracle Discoverer is an end-user tool for report generation. Oracle Reports and
XML Publisher need a programmer to design the report. A well-designed report
can be highly customizable by the end user through use of parameters supplied
at request time, but a programmer is still needed to design the report definition.
Oracle Discoverer empowers end users to develop reports themselves. Once Oracle
Discoverer, which runs on an Oracle Application Server middle tier, has been
appropriately configured, no more programmer input is needed: the end users do all
the development. Discoverer can add immense value for end users, while freeing up
programming staff for real development work.
The Oracle Applications
The number of Oracle applications products has increased substantially in recent
years due to a large number of corporate acquisitions, but two applications remain
predominant. The Oracle E-Business Suite is a comprehensive suite of applications
based around an accounting engine and Oracle Collaboration Suite is a set of office
automation tools.
The Oracle E-Business Suite, based around a core of financial applications,
includes facilities for accounting, human resources, manufacturing, customer
relationship management, customer services, and much more. All the components
share a common data model. The current release has a user interface written
with Oracle Developer Forms and Java, depending on which tool is most suitable
for the various modules and the expected users, running on Oracle Application
Server. There is a large amount of PL/SQL in the database to enable the business
functions. Future releases will merge the functionality of other products acquired
recently (such as the Siebel and Peoplesoft applications) into a common Java-
based interface.
The Oracle Collaboration Suite includes (among other things) servers for
e-mail, diary management, voicemail and fax, web conferencing, and (perhaps
most impressive) file serving. There is complete integration between the various
components. The applications run on Oracle Application Servers, and can be
accessed through a web interface from browsers or made available on mobile
wireless devices, such as cellular phones.
Explain Relational Structures 13
EXERCISE 1-1
Investigate DBMSs in Your Environment
This is a paper-based exercise, with no specific solution.
Identify the applications, application servers, and databases used in your environment.
Then, concentrating on the databases, try to get a feeling for how big and busy they
are. Consider the number of users, the volatility of the data, and the data volumes.
Finally, consider how critical they are to the organization: how much downtime or
data loss can be tolerated for each application and database? Is it possible to put a
financial figure on this?
The result of this study should give an idea of how critical the DBA’s role is.
CERTIFICATION OBJECTIVE 1.02
Explain Relational Structures
Critical to an understanding of SQL is an understanding of the relational paradigm,
and the ability to normalize data into relational structures. Normalization is the work
of systems analysts, as they model business data into a form suitable for storing in
relational tables. It is a science that can be studied for years, and there are many
schools of thought that have developed their own methods and notations.
Rows and Tables
Using the relational paradigm, data is stored in two-dimensional tables. A table
consists of a number of rows, each consisting of a set of columns. Within a table, all
the rows have the same column structure, though it is possible that in some rows
some columns may have nothing in them. An example of a table would be a list of
one’s employees, each employee being represented by one row. The columns might
be employee number, name, and a code for the department in which he/she works.
Any employees not currently assigned to a department would have that column
blank. Another table could represent the departments: one row per department,
with columns for the department’s code and the department’s name.
14 Chapter 1: Introduction to the Oracle Server Technologies
A note on terminology: what Oracle refers to as a table may also be called a
relation or an entity. Rows are sometimes called records or tuples, and columns may be
called attributes or fields. The number of rows in the table is the cardinality of the tuples.
Relational tables conform to certain rules that constrain and define the data.
At the column level, each column must be of a certain data type, such as numeric,
date-time, or character. The “character” data type is the most general, in that it
can accept anything. At the row level, usually each row must have some uniquely
identifying characteristic: this could be the value of one column, such as the
employee number and department number in the examples just given, that cannot
be repeated in different rows. There may also be rules that define links between the
tables, such as a rule that every employee must be assigned a department code that
can be matched to a row in the departments table. Following are examples of the
tabulated data definitions.
Departments table:
Column Name Description Data Type Length
DEPTNO Department number Numeric 2
DNAME Department name Character 14
Employees table:
Column Name Description Data Type Length
EMPNO Employee number Numeric 4
ENAME Employee name Character 10
DEPTNO Department number Numeric 2
The tables could contain the rows shown next.
Departments:
DEPTNO DNAME
10 ACCOUNTING
20 RESEARCH
30 SALES
40 OPERATIONS
Explain Relational Structures 15
Employees:
EMPNO ENAME DEPTNO
7369 SMITH 20
7499 ALLEN 30
7521 WARD 30
7566 JONES 20
7654 MARTIN 30
7698 BLAKE 30
7782 CLARK 10
7788 SCOTT 20
Looking at the tables, the two-dimensional structure is clear. Each row is of fixed
length, each column is of fixed length (padded with spaces when necessary), and the
rows are delimited with a new line. The rows have been stored in code order, but this
would be a matter of chance, not design; relational tables do not impose any particular
ordering on their rows. Department number 10 has one employee, and department
number 40 has none. Changes to data are usually very efficient with the relational
model. New employees can be appended to the employees table, or they can be moved
from one department to another simply by changing the DEPTNO value in their row.
Consider an alternative structure, where the data is stored according to the
hierarchical paradigm. The hierarchical model was developed before the relational
model, for technology reasons. In the early days of computing, storage devices lacked
the capability for maintaining the many separate files that were needed for the
many relational tables. Note that this problem is avoided in the Oracle database by
abstracting the physical storage (files) from the logical storage (tables); there is no
direct connection between tables and files, and certainly not a one-to-one mapping.
In effect, many tables can be stored in a very few files.
A hierarchical structure stores all related data in one unit. For example, the record
for a department would include all that department’s employees. The hierarchical
paradigm can be very fast and very space efficient. One file access may be all that
is needed to retrieve all the data needed to satisfy a query. The employees and
departments listed previously could be stored hierarchically as follows:
10,ACCOUNTING,7782,CLARK
20,RESEARCH,7369,SMITH,7566,JONES,7788,SCOTT
30,SALES,7499,ALLEN,7521,WARD,7654,MARTIN,7698,BLAKE
40,OPERATIONS
16 Chapter 1: Introduction to the Oracle Server Technologies
In this example layout, the rows and columns are of variable length. Columns
are delimited with a comma, rows with a new line. Data retrieval is typically
very efficient if the query can navigate the hierarchy: if one knows an employee’s
department, the employee can be found quickly. If one doesn’t, the retrieval may
be slow. Changes to data can be a problem if the change necessitates movement.
For example, to move employee 7566, JONES, from RESEARCH to SALES would
involve considerable effort on the part of the database because the move has to
be implemented as a removal from one line and an insertion into another. Note
that in this example, while it is possible to have a department with no employees
(the OPERATIONS department), it is absolutely impossible to have an employee
without a department: there is nowhere to put her.
The relational paradigm is highly efficient in many respects for many types of
data, but it is not appropriate for all applications. As a general rule, a relational
analysis should be the first approach taken when modeling a system. Only if it proves
inappropriate should one resort to non-relational structures. Applications where the
relational model has proven highly effective include virtually all online transaction
processing (OLTP) systems and decision support systems (DSSs). The relational
paradigm can be demanding in its hardware requirements and in the skill needed
to develop applications around it, but if the data fits, it has proved to be the most
versatile model. The problems arise from the need to maintain indexes that give the
versatility of access of maintain the links between tables, and the space requirements
of maintaining multiple copies of the indexed data in the indexes themselves and
in the tables in which the columns reside. Nonetheless, relational design is in most
circumstances the optimal model.
A number of software publishers have produced database management systems
that conform (with varying degrees of accuracy) to the relational paradigm; Oracle
is only one. IBM was perhaps the first company to commit major resources to it, but
its product (which later developed into DB2) was not ported to non-IBM platforms
for many years. Microsoft’s SQL Server is another relational database that has been
limited by the platforms on which it runs. Oracle databases, by contrast, have always
been ported to every major platform from the first release. It may be this that gave
Oracle the edge in the RDBMS market place.
A note on terminology: confusion can arise when discussing relational databases
with people used to working with Microsoft products. SQL is a language and SQL
Server is a database—but in the Microsoft world, the term “SQL” is often used to
refer to either.
Explain Relational Structures 17
Data Normalization
The process of modeling data into relational tables is known as normalization.
There are commonly said to be three levels of normalization: the first, second,
and third normal forms. There are higher levels of normalization: fourth and fifth
normal forms are well defined, but any normal data analyst (and certainly any
normal human being) will not need to be concerned with them. It is possible for a
SQL application to address un-normalized data, but this will usually be dreadfully
inefficient because that is not what the language is designed to do. In most cases,
data stored in a relational database and accessed with SQL should be normalized to
the third normal form.
As an example of normalization, consider a table called BOOKS storing details of
books, authors, and publishers, using the ISBN number as the primary key. A primary
key is the one attribute that can uniquely identify a record. These are two typical
entries:
ISBN Title Authors Publisher
12345 Oracle 11g SQL Fundamentals John Watson, McGraw-Hill, Spear
1 Exam Guide Roopesh Ramklass Street, San Francisco,
CA
67890 Oracle 11g New Features Sam Alapati McGraw-Hill, Spear
Exam Guide Street, San Francisco,
CA
Storing the data in this table gives rise to several anomalies. First, here is the
insertion anomaly: it is impossible to enter details of authors who are not yet
published, because there will be no ISBN number under which to store them.
Second, a book cannot be deleted without losing the details of the publisher:
a deletion anomaly. Third, if a publisher’s address changes, it will be necessary
to update the rows for every book he/she has published: an update anomaly.
Furthermore, it will be very difficult to identify every book written by one author.
The fact that a book may have several authors means that the “author” field must
be multivalued, and a search will have to search all the values. Related to this is
the problem of having to restructure the table of a book comes along with more
authors tan the original design can handle. Also, the storage is very inefficient due
to replication of address details across rows, and the possibility of error as this data is
repeatedly entered is high. Normalization should solve all these issues.
18 Chapter 1: Introduction to the Oracle Server Technologies
The first normal form is to remove the repeating groups. In this case, the multiple
authors: pull them out into a separate table called AUTHORS. The data structures
will now look like this:
BOOKS
ISBN Title Publisher
12345 Oracle 11g SQL Fundamentals McGraw-Hill, Spear Street,
1 Exam Guide San Francisco, CA
67890 Oracle 11g New Features McGraw-Hill, Spear Street,
Exam Guide San Francisco, CA
AUTHORS
Name ISBN
John Watson 12345
Roopesh Ramklass 12345
Sam Alapati 67890
One row in the BOOKS table is now linked to two rows in the AUTHORS
table. This solves the insertion anomaly (there is no reason not to insert as many
unpublished authors as necessary), the retrieval problem of identifying all the books
by one author (one can search the AUTHORS table on her name), and the problem
of a fixed maximum number of authors for any one book (simply insert as many or as
few AUTHORS as are needed).
This is the first normal form: no repeating groups.
The second normal form removes columns from the table that are not dependent
on the primary key. In this example, that is the publisher’s address details: these
depend on the publisher, not the ISBN. The BOOKS table and a new PUBLISHERS
table will then look like this:
BOOKS
ISBN Title Publisher
12345 Oracle 11g OCP SQL Fundamentals 1 Exam Guide McGraw-Hill
67890 Oracle 11g New Features Exam Guide McGraw-Hill
Explain Relational Structures 19
PUBLISHERS
Publisher Street City State
McGraw-Hill Spear Street San Francisco California
All the books published by one publisher will now point to a single record in
PUBLISHERS. This solves the problem of storing the address many times, and
the consequent update anomalies and also the data consistency errors caused by
inaccurate multiple entries.
Third normal form removes all columns that are interdependent. In the
PUBLISHERS table, this means the address columns: the street exists in only one
city, and the city can be in only one state; one column should do, not three. This
could be achieved by adding an address code, pointing to a separate address table:
PUBLISHERS
Publisher Address Code
McGraw-Hill 123
ADDRESSES
Address Code Street City State
123 Spear Street San Francisco California
One characteristic of normalized data that should be emphasized now is the use
of primary keys and foreign keys. A primary key is the unique identifier of a row
in a table, either one column or a concatenation of several columns (known as a
composite key). Every table should have a primary key defined.
Note that the Oracle database deviates from this standard: it is possible to define
tables without a primary key—though this is usually not a good idea, and some other
RDBMSs do not permit it.
A foreign key is a column (or a concatenation of several columns) that can be
used to identify a related row in another table. A foreign key in one table will match
a primary key in another table. This is the basis of the many-to-one relationship. A
many-to-one relationship is a connection between two tables, where many rows in
one table refer to a single row in another table. This is sometimes called a parent-child
20 Chapter 1: Introduction to the Oracle Server Technologies
relationship: one parent can have many children. In the books example so far, the keys
are as follows:
Table Keys
BOOKS Primary key: ISBN
Foreign key: Publisher
AUTHORS Primary key: Name + ISBN
Foreign key: ISBN
PUBLISHERS Primary key: Publisher
Foreign key: Address code
ADDRESSES Primary key: Address code
These keys define relationships such as that one book can have several authors.
There are various standards for documenting normalized data structures, developed
by different organizations as structured formal methods. Generally speaking, it really
doesn’t matter which method one uses as long everyone reading the documents
understands it. Part of the documentation will always include a listing of the attributes
that make up each entity (also known as the columns that make up each table) and
an entity-relationship diagram representing graphically the foreign-to-primary key
connections. A widely used standard is that primary keys columns should be identified
with a hash (#); foreign key columns with a backslash (\); mandatory columns (that
cannot be left empty) with an asterisk (*); optional columns with a lowercase o. The
books tables can now be described as follows:
Table BOOKS
#* ISBN Primary key, required
o Title Optional
\* Publisher Foreign key, link to the PUBLISHERS table
Table AUTHORS
#* Name Together with the ISBN, the primary key
#\o ISBN Part of the primary key, and a foreign key to the
BOOKS table Optional, because some authors
may not yet be published
Explain Relational Structures 21
Table PUBLISHERS
#* Publisher Primary key
\o Address code foreign key, link to the ADDRESSES table
Table ADDRESSES
#* Address code Primary key
o Street
o City
o State
The second necessary part of documenting the normalized data model is the
entity-relationship diagram (ERD). This represents the connections between the tables
graphically. There are different standards for these; Figure 1-1 shows the entity-
relationship diagram for the books example using a very simple notation limited
to showing the direction of the one-to-many relationships. It can be seen that one
BOOK can have multiple AUTHORS, one PUBLISHER can publish many books,
and so on. More complex notations can be used to show whether the link is required
or optional, information which will match that given in the table columns listings
previously.
This is a very simple example of normalization and is not in fact complete. If one
author were to write several books, this would require multiple values in the ISBN
column of the AUTHORS table. That would be a repeating group, which would
have to be removed because repeating groups break the rule for first normal form. A
major exercise with data normalization is ensuring that the structures can handle all
possibilities. Tables in a real-world application may have hundreds of columns and
dozens of foreign keys.
Errors in relational analysis can be disastrous for an application. It is very
difficult (and expensive) to correct any errors later. By contrast, errors
made during the programming stage of development can usually be fixed
comparatively quickly and cheaply.
FIGURE 1-1 An entity-relationship diagram, showing basic one-to-many relationships
AUTHORS BOOKS PUBLISHERS ADDRESSES