SQL-Index


Indexes in SQL Server are database objects that help speed up the retrieval of data. They are essential for improving the performance of queries, especially when working with large datasets. There are two types of indexes: clustered and non-clustered. Let's explore both types of indexes in detail and provide real-world examples to clarify their differences.

Types of Index

  1. Clustered
  2. Non-clustered

1. Clustered Index

A clustered index determines the physical order of data in the table. In simple terms, the rows in the table are stored on the disk in the same order as the clustered index. There can be only one clustered index on a table because the data rows themselves can be arranged in only one physical order.

How it works

  • The data and the index are stored together, meaning the index contains the actual data.
  • When a query is made on the column with the clustered index, the database looks directly at the data rows.
  • Because the data is physically ordered, searching for a range of values is faster.

Primary Key - Typically, the primary key of a table is defined as a clustered index.

Example

In a table of employees, a EmployeeID could be the primary key and thus automatically create a clustered index on EmployeeID. This speeds up queries like SELECT * FROM Employees WHERE EmployeeID = 101.

Real-World Example

Imagine you have a library database. In this database, each book has an ISBN number (International Standard Book Number), and you frequently search books using their ISBN. If you create a clustered index on the ISBN column, the books in the database will be physically ordered by ISBN.

  • When you perform a search for a specific ISBN, the database knows exactly where to look because the books are stored in that order.
  • A clustered index on ISBN would allow quick retrieval for queries that search for a specific ISBN, or for a range of ISBNs.
SQL Index

In this case, the physical arrangement of the books (table rows) follows the order of the ISBN numbers.

2. Non-Clustered Index

A non-clustered index, on the other hand, does not alter the physical order of the data rows. Instead, the index is a separate structure that contains a list of references to the data rows, sorted by the indexed column. It works by creating a pointer to the actual data.

Advantages
  • Can create indexes on columns that aren't part of the primary key.
  • Allows multiple non-clustered indexes for different queries.
  • Speeds up lookups on non-primary columns.

How it works

  • The index is stored separately from the data. It holds pointers to the actual rows in the table.
  • Multiple non-clustered indexes can exist on a table, and they can be used for different columns.
  • The non-clustered index is quicker for searches that use columns other than the primary key or clustered index column.

Real-World Example

Consider the same library database, but this time we want to create an index on the Author's Name column, which will speed up searches for books by a particular author. Here, the Author’s Name would be a non-clustered index.

  • The database doesn’t rearrange the books physically by author, but instead creates a separate structure that has author names and pointers to the actual rows.
  • If you search for books by a specific author, the database uses the non-clustered index to quickly find the rows without scanning the entire table.
SQL Index

In this case, the books are not stored by author, but the index helps the database quickly find the rows for Author X or Author Y.

How do SQL Indexes Work

Consider the following Employees table and insert 1000000 records. To create table and insert record use below script.

Create Table Index_Employee
(
	Id int primary key identity,
	[Name] nvarchar(50),
	Email nvarchar(50),
	Department nvarchar(50)
)
Go

SET NOCOUNT ON
Declare @counter int = 1

While(@counter <= 1000000)
Begin
	Declare @Name nvarchar(50) = 'Rohatash ' + RTRIM(@counter)
	Declare @Email nvarchar(50) = 'abc' + RTRIM(@counter) + '@tutorialstrend.com'
	Declare @Dept nvarchar(10) = 'Dept ' + RTRIM(@counter)

	Insert into Index_Employee values (@Name, @Email, @Dept)

	Set @counter = @counter +1

	If(@Counter%100000 = 0)
		Print RTRIM(@Counter) + ' rows inserted'
End

After Query execution, Table looks like that:

SQL Index

How SQL Server finds a row by ID

For example, let's say we want to find Employee row with EmployeeId = 1120

SQL Index
  1. So the database engine starts at the root node and it picks the index node on the right, because the database engine knows it is this node that contains employee IDs from 801 to 1200.
  2. From there, it picks the leaf node that is present on the extreme right, because employee data rows from 1001 to 1200 are present in this leaf node.
  3. The data rows in the leaf node are sorted by Employee ID, so it's easy for the database engine to find the employee row with Id = 1120.

Notice in just 3 operations, SQL Server is able to find the data we are looking for.

Clustered Index Seek

In SQL Server Management Studio click on Include Actual Execution Plan icon and then execute the following query

Select * from Index_Employee where Id = 932000

The actual execution plan is below.

SQL Index

Notice, the operation is Clustered Index Seek, meaning the database engine is using the clustered index on the employee Id column to find the employee row with Id = 932000

  1. Number of rows read = 1
  2. Actual number of rows for all executions = 1

Number of rows read, is the number of rows SQL server has to read to produce the query result. In our case Employee Id is unique, so we expect 1 row and that is represented by Actual number of rows for all executions.

With the help of the index, SQL server is able to directly read that 1 specific employee row we want. Hence, both, Number of rows read and Actual number of rows for all executions is 1.

So the point is, if there are thousands or even millions of records, SQL server can easily and quickly find the data we are looking for, provided there is an index that can help the query find data.

Search Without Index Column - Name

In this example, there is a clustered index on EmployeeId column, so when we search by employee id, SQL Server can easily and quickly find the data we are looking for. What if we serach by Employee name? At the moment, there is no index on the Name column, so there is no easy way for sql server to find the data we are looking for. SQL server has to read every record in the table which is extremely inefficient from performace standpoint.

Execute the following query with Include Actual Execution Plan turned ON

 Select * from Index_Employee where name = 'Rohatash 932000'

The actual execution plan is below.

SQL Index

Notice, the operation is Clustered Index Scan. Since there is no proper index to help this query, the database engine has no other choice than to read every record in the table. This is exactly the reason why Number of rows read is 1 million, i.e every row in the table

  1. Number of rows read = 1000000
  2. Actual number of rows for all executions = 1

How many rows are we expecting in the result? Well, only one row because there is only one employee whose Name = 'ABC 932000'. So, to produce this 1 row as the result, SQL server has to read all the 1 million rows from the table because there is no index to help this query. This is called Index Scan and in general, Index Scans are bad for performance.

This is when we create a non-clustered index on the Name column.

Non-Clustered Index Seek

We’ll use the same table as above to show how the non-clustered index works.

The table already has the clustered index on the column id. Now, we add the non-clustered index on the column Name.

CREATE NONCLUSTERED INDEX Employees_Name
ON [dbo].Index_Employee ([Name])

Execute the following query again with Include Actual Execution Plan turned ON.

Select * from Index_Employee where name = 'Rohatash 932000'

The following is the execution plan.

SQL Index

Non-clustered and clusterd index in action

When we execute this query (Select * from Employee where Name='David')

SQL Index

SQL Server uses the non-clustered index on the Name column to quickly find this employee entry in the Index. In the non-clustered index along with the employee Name, we also have the cluster key (in our case employee ID). The database engine knows there is clustered index on employee id, so this clustered index is then used to find the respective employee record.

Notice the Estimated Subtree Cost with and without non-clustered index on the Name column.

SQL Index
Next