Indexes in SQL Server are database objects that help speed up the retrieval of data. They are essential for improving the performance of queries, especially when working with large datasets. There are two types of indexes: clustered and non-clustered. Let's explore both types of indexes in detail and provide real-world examples to clarify their differences.
Types of Index
A clustered index determines the physical order of data in the table. In simple terms, the rows in the table are stored on the disk in the same order as the clustered index. There can be only one clustered index on a table because the data rows themselves can be arranged in only one physical order.
How it works
Primary Key - Typically, the primary key of a table is defined as a clustered index.
Example
In a table of employees, a EmployeeID could be the primary key and thus automatically create a clustered index on EmployeeID. This speeds up queries like SELECT * FROM Employees WHERE EmployeeID = 101.
Real-World Example
Imagine you have a library database. In this database, each book has an ISBN number (International Standard Book Number), and you frequently search books using their ISBN. If you create a clustered index on the ISBN column, the books in the database will be physically ordered by ISBN.
In this case, the physical arrangement of the books (table rows) follows the order of the ISBN numbers.
A non-clustered index, on the other hand, does not alter the physical order of the data rows. Instead, the index is a separate structure that contains a list of references to the data rows, sorted by the indexed column. It works by creating a pointer to the actual data.
AdvantagesHow it works
Real-World Example
Consider the same library database, but this time we want to create an index on the Author's Name column, which will speed up searches for books by a particular author. Here, the Author’s Name would be a non-clustered index.
In this case, the books are not stored by author, but the index helps the database quickly find the rows for Author X or Author Y.
Consider the following Employees table and insert 1000000 records. To create table and insert record use below script.
Create Table Index_Employee
(
Id int primary key identity,
[Name] nvarchar(50),
Email nvarchar(50),
Department nvarchar(50)
)
Go
SET NOCOUNT ON
Declare @counter int = 1
While(@counter <= 1000000)
Begin
Declare @Name nvarchar(50) = 'Rohatash ' + RTRIM(@counter)
Declare @Email nvarchar(50) = 'abc' + RTRIM(@counter) + '@tutorialstrend.com'
Declare @Dept nvarchar(10) = 'Dept ' + RTRIM(@counter)
Insert into Index_Employee values (@Name, @Email, @Dept)
Set @counter = @counter +1
If(@Counter%100000 = 0)
Print RTRIM(@Counter) + ' rows inserted'
End
After Query execution, Table looks like that:
How SQL Server finds a row by ID
For example, let's say we want to find Employee row with EmployeeId = 1120
Notice in just 3 operations, SQL Server is able to find the data we are looking for.
In SQL Server Management Studio click on Include Actual Execution Plan icon
and then execute the following query
Select * from Index_Employee where Id = 932000
The actual execution plan is below.
Notice, the operation is Clustered Index Seek, meaning the database engine is using the clustered index on the employee Id column to find the employee row with Id = 932000
Number of rows read, is the number of rows SQL server has to read to produce the query result. In our case Employee Id is unique, so we expect 1 row and that is represented by Actual number of rows for all executions.
With the help of the index, SQL server is able to directly read that 1 specific employee row we want. Hence, both, Number of rows read and Actual number of rows for all executions is 1.
So the point is, if there are thousands or even millions of records, SQL server can easily and quickly find the data we are looking for, provided there is an index that can help the query find data.
Search Without Index Column - Name
In this example, there is a clustered index on EmployeeId column, so when we
search by employee id, SQL Server can easily and quickly find the data we are
looking for. What if we serach by Employee name? At the moment, there is no
index on the Name column, so there is no easy way for sql server to find the
data we are looking for. SQL server has to read every record in the table which
is extremely inefficient from performace standpoint.
Execute the following query with Include Actual Execution Plan turned ON
Select * from Index_Employee where name = 'Rohatash 932000'
The actual execution plan is below.
Notice, the operation is Clustered Index Scan. Since there is no proper index to help this query, the database engine has no other choice than to read every record in the table. This is exactly the reason why Number of rows read is 1 million, i.e every row in the table
How many rows are we expecting in the result? Well, only one row because there is only one employee whose Name = 'ABC 932000'. So, to produce this 1 row as the result, SQL server has to read all the 1 million rows from the table because there is no index to help this query. This is called Index Scan and in general, Index Scans are bad for performance.
This is when we create a non-clustered index on the Name column.
We’ll use the same table as above to show how the non-clustered index works.
The table already has the clustered index on the column id. Now, we add the non-clustered index on the column Name.
CREATE NONCLUSTERED INDEX Employees_Name
ON [dbo].Index_Employee ([Name])
Execute the following query again with Include Actual Execution Plan turned ON.
Select * from Index_Employee where name = 'Rohatash 932000'
The following is the execution plan.
When we execute this query (Select * from Employee where Name='David')
SQL Server uses the non-clustered index on the Name column to quickly find this employee entry in the Index. In the non-clustered index along with the employee Name, we also have the cluster key (in our case employee ID). The database engine knows there is clustered index on employee id, so this clustered index is then used to find the respective employee record.
Notice the Estimated Subtree Cost with and without non-clustered index on the Name column.