Unlock the power of efficient data retrieval with database indexing. A crucial yet often overlooked aspect of database management, indexing can dramatically improve your system's performance. Let's delve into its intricacies and understand how it can revolutionize your data handling.
Bitmap index
A special kind of index that stores the bulk of its data as bit arrays (bitmaps) and answers most queries by performing bitwise logical operations on these bitmaps.
- Designed for cases where the values of a variable repeat very frequently (e.g., sex field in a customer database).
Non-Clustered
The data is present in arbitrary order, but the logical ordering is specified by the index. The data rows may be spread throughout the table regardless of the value of the indexed column or expression.
- In a non-clustered index, The physical order of the rows is not the same as the index order.
Covering index
This index is only used to locate data records in the table and not to return data.
- It can dramatically speed up data retrieval but may itself be large due to the additional keys, which slow down data insertion and update. To reduce such index size, some systems allow including non-key fields in the index.
Index concurrency control
A.k.a. index locking
Clustered
alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order
- Only one clustered index can be created on a given database table
- The ordering of the physical data rows in accordance with the index blocks that point to them
Column order
The order that the index definition defines the columns in is important.
- It is possible to retrieve a set of row identifiers using only the first indexed column.
- However, it is not possible or efficient (on most databases) to retrieve the set of rows identifiers using the second or greater indexed column.
Data structure for query optimization in databases
Index: a copy of selected columns of data, from a table, that is designed to enable very efficient search
- Key: direct link to the original row of data from which it was copied to allow the complete row to be retrieved efficiently
- Partial indices: index entries are created only for those records that satisfy some conditional expression or user-defined functions
Standardization
No standard defines how to create indexes, because the ISO SQL Standard does not cover physical aspects.
Policing the database constraints
Indexes are used to police database constraints, such as UNIQUE, EXCLUSION, PRIMARY KEY, AND FOREIGN KEY
- An index may be declared as Unique, which creates an implicit constraint on the underlying table.
- Database systems usually implicitly create an index on a set of columns declared Primary KEY, and some are capable of using an already-existing index to police this constraint.
- Many database systems require that both referencing and referenced sets of columns in a Foreign KEY constraint are indexed.
Dense index
In databases, a dense index is a file with pairs of keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a record. In clustered indices with duplicate keys, the dense index points to the first record with that key.
Sparse index
In databases, a sparse index is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to a particular key in the sorted data file, and points to the lowest-bladed search key in each block.
Applications and limitations
Indexes are useful for many applications but come with some limitations.
- With a wildcard at the beginning of the search-term, the database software is unable to use the underlying index data structure (in other words, the WHERE-clause is not sargable).
- The solution to this problem can be solved through the addition of another index created on reverse(email_address) and a SQL query.
Cluster
When multiple databases and multiple tables are joined, it is called a cluster. The records for the tables sharing the value of a cluster key shall be stored together in the same or nearby data blocks.
- The cluster configuration defines the data layout in the tables that are parts of the cluster.
Secondary index
Used to index fields that are neither ordering fields nor key fields (there is no assurance that the file is organized on key field or primary key field)
- One index entry for every tuple in the data file (dense index) contains the value of the indexed attribute and pointer to the block or record
Index implementations
Indices can be implemented using a variety of data structures. Popular indices include balanced trees, B+ trees and hashes.
- In Microsoft SQL Server, the leaf node of the clustered index corresponds to the actual data, not simply a pointer to data that resides elsewhere.
Support for fast lookup
Since databases may contain many objects, and since lookup is a common operation, it is often desirable to improve performance.
- Many index designs exhibit logarithmic (O(log(N))) lookup performance and in some applications it is possible to achieve flat (O)(1)) performance.