Database Tuning using indexes
-
Indexing in Oracle
In Oracle, several kinds of (clustered) indexing are supported. This page briefly explains the SQL DDL statements that can be used to setup (clustered) indexes.
To create a B+ unclustered index, use the following statement:
CREATE INDEX index name ON table name ( table attribute(s) );
Every indexing structure has to be given a name. The attributes are attributes in the given table. After executing this statement, Oracle has created a B+ tree which allows for a quick search on the given attributes. Both equality and small inequality (range) searches become faster by doing this. Insertion of new values in the table becomes more expensive, however.
As is also discussed in the book, range searches are performed more efficiently on tables that are stored themselves into an indexing structure (the so-called clustering indexes). Oracle supports this using the following chain of commands:
CREATE CLUSTER cluster name ( attribute type, ... );
CREATE INDEX index name ON cluster name;
CREATE TABLE table name ( usual parameters) CLUSTER cluster name ( attribute name );
The reason for this complexity is that Oracle allows several tables to be stored in the same cluster. We will however not use this advanced feature. Searches are very fast on indexing clusters. Every table can only be in one cluster, however.
When a table is created, usually this table is not put into a structured index. If a primary key is defined, one ordinary index is created on the attributes in the primary key. It is therefore never necessary to create an ordinary index on the primary key. If one wants a structured index, one has to explicitely define this using the commands given above.
By default, all the commands given above define a B+ index tree. Also hash indexes are supported by Oracle, but only for clustered indexes. To create a hashed, clustered index, follow these steps:
CREATE CLUSTER cluster name ( attribute type, ... ) HASHKEYS number;
CREATE TABLE table name ( usual parameters) CLUSTER cluster name ( attribute name );
A table which is stored in a hash structure, allows for fast equality searches only. It is more efficient than B+ trees for such searches. The larger the `number' is, the more efficient the search is. The memory usage of the index increases however quickly for large numbers. It is not required to create an index separately. This is done automatically for hash clusters.
Assignment:
Consider the following huge database sampletable50000.sql. Load this one of these databases in Oracle, and now consider the following query:
SELECT E.esalary, AVG(E.eage)
FROM Employee2 E
WHERE E.esalary > 1000
GROUP BY E.esalary
Execute this query and notice the time how long it takes to complete this query. Create a suitable index for the database such that this query can be executed faster (hint: use the command set timing on to record the execution time of your query).
-
Your task is to tune databases. The database system offers the following types of indexes for tables:
- Heap file
- Clustered B+ Tree
- Unclustered B+ Tree
- Clustered Hash Index
Select an appropriate type of data storage/organization for the following cases and give a reason for your choice!
- A table with about 5 rows and 7 columns, containing integer numbers each, needs to be stored. The size of the table will hardly grow in the future.
- A telephone book needs to be stored on a CD-ROM Database. It contains the addresses of all people in the Netherlands. For queries the first and last ame of the person or only the last name of the person needs to be entered and the system should give a quick response. Selections by address and telephone number are not allowed.
- In a demographic database for the ministry of work, the job affiliation of all people of South-Holland should be recorded. Often the ministry will enter queries about jobs of people in a certain age range or income range. A clustered B+ tree index has already been created previously on the SOFI number of the people registered in the database. Suggest additional indexes for frequent queries!
- The biology department of the University of Leiden wants to collect information about observations of endangered animal and plant species in Leiden and its surrounding lakes, polders, and dunes. For each observation GPS coordinates, name of the observer, and a date is recorded. A rough estimation says that there will be no more than 10 observations per species encountered. The total number of observations can be very high, however. Suggest an index that supports queries that find all observations for a specific species (identified by latin name)!
-
Which of the index types Clustered Hash Index, Clustered B+Tree Index, Unclustered B+ Tree,Heap File, Sorted file are suitable in the following scenarios. Justify your answer briefly!
- A university database contains a small table of faculty buildings with their address. The number of buildings is about 10 and cannot be assumed that the number of buildings will increase rapidly the next decades. Choose an appropriate file organization and/or index if there are often queries on the building names.
- A huge database is build for trekking equipment. There is already a clustered index on the scan-code number of the products. How can queries that combine the product category with price intervals be supported by means of an index?
- A large mineral database is build to collect information about the mining places for different sorts of rare gems and jewels. Though there are various types of gems and jewels (all indicated by a name) each one of them can be found only in a few (1-maximally 10) places. Often the name of a gem is the search key for a query.
- A list of all papers of a conference is stored on a DVD. For data-protection reasons the only way to query the documents is by means of the document-id that is given in the hardcopy of the proceedings handed out to the members of the conference. The number of papers on the DVD is huge.
