Tuesday, October 28, 2008

All About Indexes in Oracle

All About Indexes in Oracle
By Gavin JT Powell (BSc, OCP)
October 8th, 2002
What is an Index?
An index is used to increase read access performance. A book, having an index, allows rapid access to a particular subject area within that book. Indexing a database table provides rapid location of specific rows within that table, where indexes are used to optimize the speed of access to rows. When indexes are not used or are not matched by SQL statements submitted to that database then a full table scan is executed. A full table scan will read all the data in a table to find a specific row or set of rows, this is extremely inefficient when there are many rows in the table.
 It is often more efficient to full table scan small tables. The optimizer will often assess full table scan on small tables as being more efficient than reading both index and data space, particularly where a range scan rather than an exact match would be used against the index.
An index of columns on a table contains a one-to-one ratio of rows between index and indexed table, excluding binary key groupings, more on this later. An index is effectively a separate table to that of the data table. Tables and indexes are often referred to as data and index spaces. An index contains the indexed columns plus a ROWID value for each of those column combination rows. When an index is searched through the indexed columns rather than all the data in the row of a table is scanned. The index space ROWID is then used to access the table row directly in the data space. An index row is generally much smaller than a table row, thus more index rows are stored in the same physical space, a block. As a result less of the database is accessed when using indexes as opposed to tables to search for data. This is the reason why indexes enhance performance.
The Basic “How to” of Indexing
There are a number of important factors with respect to efficient and effective creation and use of indexing.
 The number of indexes per table.
 The number of table columns to be indexed.
 What datatypes are sensible for use in columns to be indexed?
 Types of indexes from numerous forms of indexes available.
 How does SQL behave with indexes?
 What should be indexed?
 What should not be indexed?
Number of Indexes per Table
Whenever a table is inserted into, updated or deleted from, all indexes plus the table must be updated. Thus if one places ten indexes onto a single table then every change to that table requires an effective change to a single table and ten indexes. The result is that performance will be substantially degraded since one insert requires eleven inserts to insert the new row into both data and index spaces. Be frugal with indexing and be conscious of the potential ill as well as the good effects produced by indexing. The general rule is that the more dynamic a table is the fewer indexes it should have.
 A dynamic table is a table changes constantly, such as a transactions table. Catalog tables on the other hand store information such as customer details; customers change a lot less often than invoices. Customer details are thus static in nature and over-indexing may be advantageous to performance.
Number of Columns to Index
Composite indexes are indexes made up of multiple columns. Minimize on the number of columns in a composite key. Create indexes with single columns. Composite indexes are often a requirement of traditional relational database table structures.
 With the advent of object-oriented application programming languages such as Java, sequence identifiers tend to be used to identify every row in every table uniquely. The result is single column indexes for every table. The only exceptions are generally many-to-many join resolution entities.
It may sometimes be better to exclude some of the lower-level or less relevant columns from the index since at that level there may not be much data, if there are not many rows to index it can be more efficient to read a group of rows from the data space. For instance, a composite index comprised of five columns could be reduced to the first three columns based on a limited number of rows traversed as a result of ignoring the last two columns. Look at your data carefully when constructing indexes. The more columns you add to a composite index the slower the search will be since there is a more complex requirement for that search and the indexes get physically larger. The benefit of indexes is that an index occupies less physical space than the data. If the index gets so large that it is as large as the data then it will become less efficient to read both the index and data spaces rather than just the data space.
 Most database experts recommend a maximum of three columns for composite keys.
Datatypes of Index Columns
Integers make the most efficient indexes. Try to always create indexes on columns with fixed length values. Avoid using VARCHAR2 and any object data types. Use integers if possible or fixed length, short strings. Also try to avaoid indexing on dates and floating-point values. If using dates be sure to use the internal representation or just the date, not the date and the time. Use integer generating sequences wherever possible to create consistently sequential values.
Types of Indexes
There are different types of indexes available in different databases. These different indexes are applicable under specific circumstances, generally for specific search patterns, for instance exact matches or range matches.
The simplest form of indexing is no index at all, a heap structure. A heap structure is effectively a collection of data units, rows, which is completely unordered. The most commonly used indexed structure is a Btree (Binary Tree). A Btree index is best used for exact matches and range searches. Other methods of indexing exist.
1. Hashing algorithms produce a pre-calculated best guess on general row location and are best used for exact matches.
2. ISAM or Indexed Sequential Access Method indexes are not used in Oracle.
3. Bitmaps contain maps of zero’s and 1’s and can be highly efficient access methods for read-only data.
4. There are other types of indexing which involve clustering of data with indexes.
In general every index type other than a Btree involves overflow. When an index is required to overflow it means that the index itself cannot be changed when rows are added, changed or removed. The result is inefficiency because a search to find overflowing data involves a search through originally indexed rows plus overflowing rows. Overflow index space is normally not ordered. A Btree index can be altered by changes to data. The only exception to a Btree index coping with data changes in Oracle is deletion of rows. When rows are deleted from a table, physical space previously used by the index for the deleted row is never reclaimed unless the index is rebuilt. Rebuilding of Btree indexes is far less common than that for other types of indexes since non-Btree indexes simply overflow when row changes are applied to them.
Oracle uses has the following types of indexing available.
 Btree index. A Btree is a binary tree. General all-round index and common in OLTP systems. An Oracle Btree index has three layers, the first two are branch node layers and the third, the lowest, contains leaf nodes. The branch nodes contain pointers to the lower level branch or leaf node. Leaf nodes contain index column values plus a ROWID pointer to the table row. The branch and leaf nodes are optimally arranged in the tree such that each branch will contain an equal number of branch or leaf nodes.
 Bitmap index. Bitmap containing binary representations for each row. A zero implies that a row does not have a specified value and a 1 denotes that row having that value. Bitmaps are very susceptible to overflow in OLTP systems and should only be used for read-only data such as in Data Warehouses.
 Function-Based index. Contains the result of an expression pre-calculated on each row in a table.
 Index Organized Tables. Clusters index and data spaces together physically for a single table and orders the merged physical space in the order of the index, usually the primary key. An index organized table is a table as well as an index, the two are merged.
 Clusters. Partial merge of index and data spaces, ordered by an index, not necessarily the primary key. A cluster is similar to an index organized table except that it can be built on a join (more than a single table). Clusters can be ordered using binary tree structures or hashing algorithms. A cluster could also be viewed as a table as well as an index since clustering partially merges index and data spaces.
 Bitmap Join index. Creates a single bitmap for one table in a join.
 Domain index. Specific to certain application types using contextual or spatial data, amongst others.
Indexing Attributes
Various types of indexes can have specific attributes or behaviors applied to them. These behaviors are listed below, some are Oracle specific and some are not.
 Ascending or Descending. Indexes can be order in either way.
 Uniqueness. Indexes can be unique or non-unique. Primary keys must be unique since a primary key uniquely identifies a row in a table referentially. Other columns such as names sometimes have unique constraints or indexes, or both, added to them.
 Composites. A composite index is an index made up of more than one column in a table.
 Compression. Applies to Btree indexes where duplicated prefix values are removed. Compression speeds up data retrieval but can slow down table changes.
 Reverse keys. Bytes for all columns in the index are reversed, retaining the order of the columns. Reverse keys can help performance in clustered server environments (Oracle8i Parallel Server / RAC Oracle9i) by ensuring that changes to similar key values will be better physically spread. Reverse key indexing can apply to rows inserted into OLTP tables using sequence integer generators, where each number is very close to the previous number. When searching for and updating rows with sequence identifiers, where rows are searched for
 Null values. Null values are generally not included in indexes.
 Sorting (NOSORT). This option is Oracle specific and does not sort an index. This assumes that data space is physically ordered in the desired manner.
What SQL does with Indexes
In general a SQL statement will attempt to match the structure of itself to an index, the where clause ordering will attempt to match available indexes and use them if possible. If no index is matched then a full table scan will be executed. A table scan is extremely inefficient for anything but the smallest of tables. Obviously if a table is read sequentially, in physical order then an index is not required. A table does not always need an index.
What to Index
Use indexes where frequent queries are performed with where and order by clause matching the ordering of columns in those indexes. Use indexing generally on larger tables or multi-table, complex joins. Indexes are best created in the situations listed below.
 Columns used in joins.
 Columns used in where clauses.
 Columns used in order by clauses.
 In most relational databases the order-by clause is generally executed on the subset retrieved by the where clause, not the entire data space. This is not always unfortunately the case for Oracle.
 Traditionally the order-by clause should never include the columns contained in the where cause. The only case where the order-by clause will include columns contained in the where clause is the case of the where clause not matching any index in the database or a requirement for the order by clause to override the sort order of the where, typically in highly complex, multi-table joins.
 The group-by clause can be enhanced by indexing when the range of values being grouped is small in relation to the number of rows in the table selected.
What not to Index
Indexes will degrade performance of inserts, updates and deletes, sometimes substantially.
 Tables with a small number of rows.
 Static tables.
 Columns with a wide range of values.
 Tables changed frequently and with a low amount of data retrieval.
 Columns not used in data access query select statements.
Tuning Oracle SQL Code and Using Indexes
What is SQL Tuning?
Tune SQL based on the nature of your application, OLTP or read-only Data Warehouse. OLTP applications have high volumes of concurrent transactions and are better served with exact match SQL where many transactions compete for small amounts of data. Read-only Data Warehouses require rapid access to large amounts of information at once and thus many records are accessed at once, either by many or a small number of sessions.
The EXPLAIN PLAN command can be used to compare different versions of SQL statements, and tune your application SQL code as required. When tuning OLTP applications utilize sharing of SQL code in PL/SQL procedures and do not use triggers unless absolutely necessary. Triggers can cause problems such as self-mutating transactions where a table can expect a lock on a row already locked by the same transaction. This is because triggers do not allow transaction termination commands such as COMMIT and ROLLBACK. In short, do not use triggers unless absolutely necessary.
The best approach to tuning of SQL statements is to seek out those statements consuming the greatest amount of resources (CPU, memory and I/O). The more often a SQL statement is executed the more finely it should be tuned. Additionally SQL statements executed many times more often than other SQL statements can cause issues with locking. SQL code using bind variables will execute much faster than those not. Constant re-parsing of similar SQL code can over stress CPU time resources.
Tuning is not necessarily a never-ending process but can be iterative. It is always best to take small steps and then assess improvements. Small changes are always more manageable and more easier to implement. Use the Oracle performance views plus tools such as TKPROF, tracing, Oracle Enterprise Manager, Spotlight, automated scripts and other tuning tools or packages which aid in monitoring and Oracle performance tuning. Detection of bottlenecks and SQL statements causing problem is as important as resolving those issues. In general tuning falls into three categories as listed below, in order of importance and performance impact.
1. Data model tuning.
2. SQL statement tuning.
3. Physical hardware and Oracle database configuration.
 Physical hardware and Oracle database configuration installation will, other than bottleneck resolution, generally only affect performance by between 10% and 20%. Most performance issues occur from poorly developed SQL code, with little attention to SQL tuning during development, probably causing around 80% of general system performance problems. Poor data model design can cause even more serious performance problems than SQL code but it is rare because data models are usually built more carefully than SQL code. It is a common problem that SQL code tuning is often left to DBA personnel. DBA people are often trained as Unix Administrators, SQL tuning is conceptually a programming skill; programming skills of Unix Administrators are generally very low-level, if present at all, and very different in skills requirements to that of SQL code.
How to Tune SQL
Indexing
When building and restructuring of indexing never be afraid of removing unused indexes. The DBA should always be aware of where indexes are used and how.
 Oracle9i can automatically monitor index usage using the ALTER INDEX index name [NO]MONITORING USAGE; command with subsequent selection of the USED column from the V$OBJECT_USAGE column.
Taking an already constructed application makes alterations of any kind much more complex. Pay most attention to indexes most often utilized. Some small static tables may not require indexes at all. Small static lookup type tables can be cached but will probably be force table-scanned by the optimizer anyway; table-scans may be adversely affected by the addition of unused superfluous indexes. Sometimes table-scans are faster than anything else. Consider the use of clustering, hashing, bitmaps and even index organized tables, only in Data Warehouses. Many installations use bitmaps in OLTP databases, this often a big mistake! If you have bitmap indexes in your OLTP database and are having performance problems, get rid of them!
Oracle recommends the profligate use of function-based indexes, assuming of course there will not be too many of them. Do not allow too many programmers to create their indexes, especially not function-based indexes, because you could end-up with thousands of indexes. Application developers tend to be unaware of what other developers are doing and create indexes specific to a particular requirement where indexes may be used in only one place. Some DBA control and approval process must be maintained on the creation of new indexes. Remember, every table change requires a simultaneous update to all indexes created based on that table.
SQL Statement Reorganisation
SQL statement reorganization encompasses factors as listed below, amongst others.
 WHERE clause filtering and joining orders matching indexes.
 Use of hints is not necessarily a good idea. The optimizer is probably smarter than you are.
 Simplistic SQL statements and minimizing on table numbers in joins.
 Use bind variables to minimize on re-parsing. Buying lots of expensive RAM and sizing your shared pool and database buffer cache to very large values may make performance worse. Firstly, buffer cache reads are not as fast as you might think. Secondly, a large SQL parsing shared pool, when not using bind variables in SQL code, will simply fill up and take longer for every subsequent SQL statement to search.
 Oracle9i has adopted various SQL ANSI standards. The ANSI join syntax standard could cause SQL code performance problems. The most effective tuning approach to tuning Oracle SQL code is to remove rows from joins using where clause filtering prior to joining multiple tables, obviously the larger tables, requiring the fewest rows should be filtered first. ANSI join syntax applies joins prior to where clause filtering; this could cause major performance problems.
Nested subquery SQL statements can be effective under certain circumstances. However, nesting of SQL statements increases the level of coding complexity and if sometimes looping cursors can be utilized in PL/SQL procedures, assuming the required SQL is not completely ad-hoc.
 Avoid ad-hoc SQL if possible. Any functionality, not necessarily business logic, is always better provided at the application level. Business logic, in the form of referential integrity, is usually best catered for in Oracle using primary and foreign key constraints and explicitly created indexes.
Nested subquery SQL statements can become over complicated and impossible for even the most brilliant coder to tune to peak efficiency. The reason for this complexity could lie in an over-Normalized underlying data model. In general use of subqueries is a very effective approach to SQL code performance tuning. However, the need to utilize intensive, multi-layered subquery SQL code is often a symptom of a poor data model due to requirements for highly complex SQL statement joins.
Some Oracle Tricks
Use [NOT] EXISTS Instead of [NOT] IN
In the example below the second SQL statement utilizes an index in the subquery because of the use of EXISTS in the second query as opposed to IN. IN will build a set first and EXISTS will not. IN will not utilize indexes whereas EXISTS will.
SELECT course_code, name FROM student
WHERE course_code NOT IN
(SELECT course_code FROM maths_dept);
SELECT course_code, name FROM student
WHERE NOT EXISTS
(SELECT course_code FROM maths_dept
WHERE maths_dept.course_code = student.course_code);
In the example below the nesting of the two queries could be reversed depending on which table has more rows. Also if the index is not used or not available, reversal of the subquery is required if tableB has significantly more rows than tableA.
DELETE FROM tableA WHERE NOT EXISTS
(SELECT columnB FROM tableB WHERE tableB.columnB = tableA.columnA);
Use of value lists with the IN clause could indicate a missing entity. Also that missing entity is probably static in nature and can potentially be cached, although caching causing increased data buffer size requirements is not necessarily a sensible solution.
SELECT country FROM countries WHERE continent IN ('africa','europe','north america');
Equijoins and Column Value Transformations
AND and = predicates are the most efficient. Avoid transforming of column values in any form, anywhere in a SQL statement, for instance as shown below.
SELECT * FROM WHERE TO_NUMBER(BOX_NUMBER) = 94066;
And the example below is really bad! Typically indexes should not be placed on descriptive fields such as names. A function-based index would be perfect in this case but would probably be unnecessary if the data model and the data values were better organized.
SELECT * FROM table1, table2
WHERE UPPER(SUBSTR(table1.name,1,1)) = UPPER(SUBSTR(table2.name,1,1));
Transforming literal values is not such a problem but application of a function to a column within a table in a where clause will use a table-scan regardless of the presence of indexes. When using a function-based index the index is the result of the function which the optimizer will recognize and utilize for subsequent SQL statements.
Datatypes
Try not to use mixed datatypes by setting columns to appropriate types in the first place. If mixing of datatypes is essential do not assume implicit type conversion because it will not always work, and implicit type conversions can cause indexes not to be used. Function-based indexes can be used to get around type conversion problems but this is not the most appropriate use of function-based indexes. If types must be mixed, try to place type conversion onto explicit values and not columns. For instance, as shown below.
WHERE zip = TO_NUMBER('94066') as opposed to WHERE TO_CHAR(zip) = '94066'
The DECODE Function
The DECODE function will ignore indexes completely. DECODE is very useful in certain circumstances where nested looping cursors can become extremely complex. DECODE is intended for specific requirements and is not intended to be used prolifically, especially not with respect to type conversions. Most SQL statements containing DECODE function usage can be altered to use explicit literal selection criteria perhaps using separate SELECT statements combined with UNION clauses. Also Oracle9i contains a CASE statement which is much more versatile than DECODE and may be much more efficient.
Join Orders
Always use indexes where possible, this applies to all tables accessed in a join, both those in the driving and nested subqueries. Use indexes between parent and child nested subqueries in order to utilize indexes across a join. A common error is that of accessing a single row from the driving table using an index and then to access all rows from a nested subquery table where an index can be used in the nested subquery table based on the row retrieved by the driving table.
Put where clause filtering before joins, especially for large tables where only a few rows are required.
Try to use indexes fetching the minimum number of rows. The order in which tables are accessed in a query is very important. Generally a SQL statement is parsed from top-to-bottom and from left-to-right. The further into the join or SQL statement, then the fewer rows should be accessed. Even consider constructing a SQL statement based on the largest table being the driving table even if that largest table is not the logical driver of the SQL statement. When a join is executed, each join will overlay the result of the previous part of the join, effectively each section (based on each table) is executed sequentially. In the example below table1 has the most rows and table3 has the fewest rows.
SELECT * FROM table1, table2, table3
WHERE table1.index = table2.index AND table1.index = table3.index;
Hints
Use them? Perhaps. When circumstances force their use. Generally the optimizer will succeed where you will not. Hints allow, amongst many other things, forcing of index usage rather than full table-scans. The optimizer will generally find full scans faster with small tables and index usage faster with large tables. Therefore if row numbers and the ratios of row between tables are known then using hints will probably make performance worse. One specific situation where hints could help are generic applications where rows in specific tables can change drastically depending on the installation. However, once again the optimizer may still be more capable than any programmer or DBA.
Use INSERT, UPDATE and DELETE ... RETURNING
When values are produced and contained in insert or update statements, such as new sequence numbers or expression results, and those values are required in the same transaction by following SQL statements, the values can be returned into variables and used later without recalculation of expression being required. This tactic would be used in PL/SQL and anonymous procedures. Examples are shown below.
INSERT INTO table1 VALUES (test_id.nextval, 'Jim Smith', '100.12', 5*10)
RETURNING col1, col4 * 2 INTO val1, val2;
UPDATE table1 SET name = 'Joe Soap'
WHERE col1 = :val1 AND col2 = :val2;
DELETE FROM table1 RETURN value INTO :array;
Triggers
Simplification or complete disabling and removal of triggers is advisable. Triggers are very slow and can cause many problems, both performance related and can even cause serious locking and database errors. Triggers were originally intended for messaging and are not intended for use as rules triggered as a result of a particular event. Other databases have full-fledged rule systems aiding in the construction of Rule-Based Expert systems. Oracle triggers are more like database events than event triggered rules causing other potentially recursive events to occur. Never use triggers to validate referential integrity. Try not to use triggers at all. If you do use triggers and have performance problems, their removal and recoding into stored procedures, constraints or application code could solve a lot of your problems.
Data Model Restructuring
Data restructuring involves partitioning, normalization and even denormalization. Oracle recommends avoiding the use of primary and foreign keys for validation of referential integrity, and suggests validating referential integrity in application code. Application code is more prone to error since it changes much faster. Avoiding constraint-based referential is not necessarily the best solution.
Referential integrity can be centrally controlled and altered in a single place in the database. Placing referential integrity in application code is less efficient due to increased network traffic and requires more code to be maintained in potentially many applications. All foreign keys must have indexes explicitly created and these indexes will often be used in general application SQL calls, other than just for validation of referential integrity. Oracle does not create internal indexes when creating foreign reference keys. In fact Oracle recommends that indexes should be explicitly created for all primary, foreign and unique hey constraints. Foreign keys not indexed using the CREATE INDEX statement can cause table locks on the table containing the foreign key. It is highly likely that foreign keys will often be used by the optimizer in SQL statement filtering where clauses, if the data model and the application are consistent with each other structurally, which should be the case.
Views
Do not use views as a basis for SQL statements taking a portion of the rows defined by that view. Views were originally intended for security and access privileges. No matter what where clause is applied to a view the entire view will always be executed first. On the same basis also avoid things such as SELECT * FROM …, GROUP BY clauses and aggregations such as DISTINCT. DISTINCT will always select all rows first. Do not create new entities using joined views, it is better to create those intersection view joins as entities themselves; this applies particularly in the case of many-to-many relationships. Also Data Warehouses can benefit from materialized views which are views actually containing data, refreshed by the operator at a chosen juncture.
Maintenance of Current Statistics and Cost Based Optimization
Maintain current statistics as often as possible, this can be automated. Cost-based optimization, using statistics, is much more efficient than rule-based optimization.
Regeneration and Coalescing of Indexes
Indexes subjected to constant DML update activity can become skewed and thus become less efficient over a period of time. Use of Oracle Btree indexes implies that when a value is searched for within the tree, a series of comparisons are made in order to depth-first traverse down through the tree until the appropriate value is found.
Oracle Btree indexes are usually only three levels, requiring three hits on the index to find a resulting ROWID pointer to a table row. Index searches, even into very large tables, especially unique index hits, not index range scans, can be incredibly fast.
In some circumstances constant updating of a binary tree can cause the tree to become more heavily loaded in some parts or skewed. Thus some parts of the tree require more intensive searching which can be largely fruitless. Indexes should sometimes be rebuilt where the binary tree is regenerated from scratch, this can be done online in Oracle9i, as shown below.
ALTER INDEX index name REBUILD ONLINE;
Coalescing of indexes is a more physical form of maintenance where physical space chunks which are fragmented. Index fragmentation is usually a result of massive deletions from table, at once or over time. Oracle Btree indexes do not reclaim physical space as a result of row deletions. This can cause serious performance problems as a result of fruitless searches and very large index files where much of the index space is irrelevant. The command shown below will do a lot less than rebuilding but it can help. If PCTINCREASE is not set to zero for the index then extents could vary in size greatly and not be reusable. In that the only option is rebuilding.
ALTER INDEX index name REBUILD COALESCE NOLOGGING;

No comments: