OCA Oracle Database 11g Administration I Exam Guide P2

12 Chapter 1: Introduction to the Oracle Server Technologies avoids these issues by formatting a report’s output as XML tags. Any client can request an XML Publisher report and (provided it has an XML parser) display the results. This is the key to distributing reports over wireless protocols to any device, such as a cellular telephone. Oracle Discoverer is an end-user tool for report generation. Oracle Reports and XML Publisher need a programmer to design the report. A well-designed report can be highly customizable by the end user through use of parameters supplied at request time, but a programmer is still needed to design the report definition. Oracle Discoverer empowers end users to develop reports themselves. Once Oracle Discoverer, which runs on an Oracle Application Server middle tier, has been appropriately configured, no more programmer input is needed: the end users do all the development. Discoverer can add immense value for end users, while freeing up programming staff for real development work. The Oracle Applications The number of Oracle applications products has increased substantially in recent years due to a large number of corporate acquisitions, but two applications remain predominant. The Oracle E-Business Suite is a comprehensive suite of applications based around an accounting engine and Oracle Collaboration Suite is a set of office automation tools. The Oracle E-Business Suite, based around a core of financial applications, includes facilities for accounting, human resources, manufacturing, customer relationship management, customer services, and much more. All the components share a common data model. The current release has a user interface written with Oracle Developer Forms and Java, depending on which tool is most suitable for the various modules and the expected users, running on Oracle Application Server. There is a large amount of PL/SQL in the database to enable the business functions. Future releases will merge the functionality of other products acquired recently (such as the Siebel and Peoplesoft applications) into a common Java- based interface. The Oracle Collaboration Suite includes (among other things) servers for e-mail, diary management, voicemail and fax, web conferencing, and (perhaps most impressive) file serving. There is complete integration between the various components. The applications run on Oracle Application Servers, and can be accessed through a web interface from browsers or made available on mobile wireless devices, such as cellular phones. Explain Relational Structures 13 EXERCISE 1-1 Investigate DBMSs in Your Environment This is a paper-based exercise, with no specific solution. Identify the applications, application servers, and databases used in your environment. Then, concentrating on the databases, try to get a feeling for how big and busy they are. Consider the number of users, the volatility of the data, and the data volumes. Finally, consider how critical they are to the organization: how much downtime or data loss can be tolerated for each application and database? Is it possible to put a financial figure on this? The result of this study should give an idea of how critical the DBA’s role is. CERTIFICATION OBJECTIVE 1.02 Explain Relational Structures Critical to an understanding of SQL is an understanding of the relational paradigm, and the ability to normalize data into relational structures. Normalization is the work of systems analysts, as they model business data into a form suitable for storing in relational tables. It is a science that can be studied for years, and there are many schools of thought that have developed their own methods and notations. Rows and Tables Using the relational paradigm, data is stored in two-dimensional tables. A table consists of a number of rows, each consisting of a set of columns. Within a table, all the rows have the same column structure, though it is possible that in some rows some columns may have nothing in them. An example of a table would be a list of one’s employees, each employee being represented by one row. The columns might be employee number, name, and a code for the department in which he/she works. Any employees not currently assigned to a department would have that column blank. Another table could represent the departments: one row per department, with columns for the department’s code and the department’s name. 14 Chapter 1: Introduction to the Oracle Server Technologies A note on terminology: what Oracle refers to as a table may also be called a relation or an entity. Rows are sometimes called records or tuples, and columns may be called attributes or fields. The number of rows in the table is the cardinality of the tuples. Relational tables conform to certain rules that constrain and define the data. At the column level, each column must be of a certain data type, such as numeric, date-time, or character. The “character” data type is the most general, in that it can accept anything. At the row level, usually each row must have some uniquely identifying characteristic: this could be the value of one column, such as the employee number and department number in the examples just given, that cannot be repeated in different rows. There may also be rules that define links between the tables, such as a rule that every employee must be assigned a department code that can be matched to a row in the departments table. Following are examples of the tabulated data definitions. Departments table: Column Name Description Data Type Length DEPTNO Department number Numeric 2 DNAME Department name Character 14 Employees table: Column Name Description Data Type Length EMPNO Employee number Numeric 4 ENAME Employee name Character 10 DEPTNO Department number Numeric 2 The tables could contain the rows shown next. Departments: DEPTNO DNAME 10 ACCOUNTING 20 RESEARCH 30 SALES 40 OPERATIONS Explain Relational Structures 15 Employees: EMPNO ENAME DEPTNO 7369 SMITH 20 7499 ALLEN 30 7521 WARD 30 7566 JONES 20 7654 MARTIN 30 7698 BLAKE 30 7782 CLARK 10 7788 SCOTT 20 Looking at the tables, the two-dimensional structure is clear. Each row is of fixed length, each column is of fixed length (padded with spaces when necessary), and the rows are delimited with a new line. The rows have been stored in code order, but this would be a matter of chance, not design; relational tables do not impose any particular ordering on their rows. Department number 10 has one employee, and department number 40 has none. Changes to data are usually very efficient with the relational model. New employees can be appended to the employees table, or they can be moved from one department to another simply by changing the DEPTNO value in their row. Consider an alternative structure, where the data is stored according to the hierarchical paradigm. The hierarchical model was developed before the relational model, for technology reasons. In the early days of computing, storage devices lacked the capability for maintaining the many separate files that were needed for the many relational tables. Note that this problem is avoided in the Oracle database by abstracting the physical storage (files) from the logical storage (tables); there is no direct connection between tables and files, and certainly not a one-to-one mapping. In effect, many tables can be stored in a very few files. A hierarchical structure stores all related data in one unit. For example, the record for a department would include all that department’s employees. The hierarchical paradigm can be very fast and very space efficient. One file access may be all that is needed to retrieve all the data needed to satisfy a query. The employees and departments listed previously could be stored hierarchically as follows: 10,ACCOUNTING,7782,CLARK 20,RESEARCH,7369,SMITH,7566,JONES,7788,SCOTT 30,SALES,7499,ALLEN,7521,WARD,7654,MARTIN,7698,BLAKE 40,OPERATIONS 16 Chapter 1: Introduction to the Oracle Server Technologies In this example layout, the rows and columns are of variable length. Columns are delimited with a comma, rows with a new line. Data retrieval is typically very efficient if the query can navigate the hierarchy: if one knows an employee’s department, the employee can be found quickly. If one doesn’t, the retrieval may be slow. Changes to data can be a problem if the change necessitates movement. For example, to move employee 7566, JONES, from RESEARCH to SALES would involve considerable effort on the part of the database because the move has to be implemented as a removal from one line and an insertion into another. Note that in this example, while it is possible to have a department with no employees (the OPERATIONS department), it is absolutely impossible to have an employee without a department: there is nowhere to put her. The relational paradigm is highly efficient in many respects for many types of data, but it is not appropriate for all applications. As a general rule, a relational analysis should be the first approach taken when modeling a system. Only if it proves inappropriate should one resort to non-relational structures. Applications where the relational model has proven highly effective include virtually all online transaction processing (OLTP) systems and decision support systems (DSSs). The relational paradigm can be demanding in its hardware requirements and in the skill needed to develop applications around it, but if the data fits, it has proved to be the most versatile model. The problems arise from the need to maintain indexes that give the versatility of access of maintain the links between tables, and the space requirements of maintaining multiple copies of the indexed data in the indexes themselves and in the tables in which the columns reside. Nonetheless, relational design is in most circumstances the optimal model. A number of software publishers have produced database management systems that conform (with varying degrees of accuracy) to the relational paradigm; Oracle is only one. IBM was perhaps the first company to commit major resources to it, but its product (which later developed into DB2) was not ported to non-IBM platforms for many years. Microsoft’s SQL Server is another relational database that has been limited by the platforms on which it runs. Oracle databases, by contrast, have always been ported to every major platform from the first release. It may be this that gave Oracle the edge in the RDBMS market place. A note on terminology: confusion can arise when discussing relational databases with people used to working with Microsoft products. SQL is a language and SQL Server is a database—but in the Microsoft world, the term “SQL” is often used to refer to either. Explain Relational Structures 17 Data Normalization The process of modeling data into relational tables is known as normalization. There are commonly said to be three levels of normalization: the first, second, and third normal forms. There are higher levels of normalization: fourth and fifth normal forms are well defined, but any normal data analyst (and certainly any normal human being) will not need to be concerned with them. It is possible for a SQL application to address un-normalized data, but this will usually be dreadfully inefficient because that is not what the language is designed to do. In most cases, data stored in a relational database and accessed with SQL should be normalized to the third normal form. As an example of normalization, consider a table called BOOKS storing details of books, authors, and publishers, using the ISBN number as the primary key. A primary key is the one attribute that can uniquely identify a record. These are two typical entries: ISBN Title Authors Publisher 12345 Oracle 11g SQL Fundamentals John Watson, McGraw-Hill, Spear 1 Exam Guide Roopesh Ramklass Street, San Francisco, CA 67890 Oracle 11g New Features Sam Alapati McGraw-Hill, Spear Exam Guide Street, San Francisco, CA Storing the data in this table gives rise to several anomalies. First, here is the insertion anomaly: it is impossible to enter details of authors who are not yet published, because there will be no ISBN number under which to store them. Second, a book cannot be deleted without losing the details of the publisher: a deletion anomaly. Third, if a publisher’s address changes, it will be necessary to update the rows for every book he/she has published: an update anomaly. Furthermore, it will be very difficult to identify every book written by one author. The fact that a book may have several authors means that the “author” field must be multivalued, and a search will have to search all the values. Related to this is the problem of having to restructure the table of a book comes along with more authors tan the original design can handle. Also, the storage is very inefficient due to replication of address details across rows, and the possibility of error as this data is repeatedly entered is high. Normalization should solve all these issues. 18 Chapter 1: Introduction to the Oracle Server Technologies The first normal form is to remove the repeating groups. In this case, the multiple authors: pull them out into a separate table called AUTHORS. The data structures will now look like this: BOOKS ISBN Title Publisher 12345 Oracle 11g SQL Fundamentals McGraw-Hill, Spear Street, 1 Exam Guide San Francisco, CA 67890 Oracle 11g New Features McGraw-Hill, Spear Street, Exam Guide San Francisco, CA AUTHORS Name ISBN John Watson 12345 Roopesh Ramklass 12345 Sam Alapati 67890 One row in the BOOKS table is now linked to two rows in the AUTHORS table. This solves the insertion anomaly (there is no reason not to insert as many unpublished authors as necessary), the retrieval problem of identifying all the books by one author (one can search the AUTHORS table on her name), and the problem of a fixed maximum number of authors for any one book (simply insert as many or as few AUTHORS as are needed). This is the first normal form: no repeating groups. The second normal form removes columns from the table that are not dependent on the primary key. In this example, that is the publisher’s address details: these depend on the publisher, not the ISBN. The BOOKS table and a new PUBLISHERS table will then look like this: BOOKS ISBN Title Publisher 12345 Oracle 11g OCP SQL Fundamentals 1 Exam Guide McGraw-Hill 67890 Oracle 11g New Features Exam Guide McGraw-Hill Explain Relational Structures 19 PUBLISHERS Publisher Street City State McGraw-Hill Spear Street San Francisco California All the books published by one publisher will now point to a single record in PUBLISHERS. This solves the problem of storing the address many times, and the consequent update anomalies and also the data consistency errors caused by inaccurate multiple entries. Third normal form removes all columns that are interdependent. In the PUBLISHERS table, this means the address columns: the street exists in only one city, and the city can be in only one state; one column should do, not three. This could be achieved by adding an address code, pointing to a separate address table: PUBLISHERS Publisher Address Code McGraw-Hill 123 ADDRESSES Address Code Street City State 123 Spear Street San Francisco California One characteristic of normalized data that should be emphasized now is the use of primary keys and foreign keys. A primary key is the unique identifier of a row in a table, either one column or a concatenation of several columns (known as a composite key). Every table should have a primary key defined. Note that the Oracle database deviates from this standard: it is possible to define tables without a primary key—though this is usually not a good idea, and some other RDBMSs do not permit it. A foreign key is a column (or a concatenation of several columns) that can be used to identify a related row in another table. A foreign key in one table will match a primary key in another table. This is the basis of the many-to-one relationship. A many-to-one relationship is a connection between two tables, where many rows in one table refer to a single row in another table. This is sometimes called a parent-child 20 Chapter 1: Introduction to the Oracle Server Technologies relationship: one parent can have many children. In the books example so far, the keys are as follows: Table Keys BOOKS Primary key: ISBN Foreign key: Publisher AUTHORS Primary key: Name + ISBN Foreign key: ISBN PUBLISHERS Primary key: Publisher Foreign key: Address code ADDRESSES Primary key: Address code These keys define relationships such as that one book can have several authors. There are various standards for documenting normalized data structures, developed by different organizations as structured formal methods. Generally speaking, it really doesn’t matter which method one uses as long everyone reading the documents understands it. Part of the documentation will always include a listing of the attributes that make up each entity (also known as the columns that make up each table) and an entity-relationship diagram representing graphically the foreign-to-primary key connections. A widely used standard is that primary keys columns should be identified with a hash (#); foreign key columns with a backslash (\); mandatory columns (that cannot be left empty) with an asterisk (*); optional columns with a lowercase o. The books tables can now be described as follows: Table BOOKS #* ISBN Primary key, required o Title Optional \* Publisher Foreign key, link to the PUBLISHERS table Table AUTHORS #* Name Together with the ISBN, the primary key #\o ISBN Part of the primary key, and a foreign key to the BOOKS table Optional, because some authors may not yet be published Explain Relational Structures 21 Table PUBLISHERS #* Publisher Primary key \o Address code foreign key, link to the ADDRESSES table Table ADDRESSES #* Address code Primary key o Street o City o State The second necessary part of documenting the normalized data model is the entity-relationship diagram (ERD). This represents the connections between the tables graphically. There are different standards for these; Figure 1-1 shows the entity- relationship diagram for the books example using a very simple notation limited to showing the direction of the one-to-many relationships. It can be seen that one BOOK can have multiple AUTHORS, one PUBLISHER can publish many books, and so on. More complex notations can be used to show whether the link is required or optional, information which will match that given in the table columns listings previously. This is a very simple example of normalization and is not in fact complete. If one author were to write several books, this would require multiple values in the ISBN column of the AUTHORS table. That would be a repeating group, which would have to be removed because repeating groups break the rule for first normal form. A major exercise with data normalization is ensuring that the structures can handle all possibilities. Tables in a real-world application may have hundreds of columns and dozens of foreign keys. Errors in relational analysis can be disastrous for an application. It is very difficult (and expensive) to correct any errors later. By contrast, errors made during the programming stage of development can usually be fixed comparatively quickly and cheaply. FIGURE 1-1 An entity-relationship diagram, showing basic one-to-many relationships AUTHORS BOOKS PUBLISHERS ADDRESSES

Tải về miễn phí