INDEX
Clustered Index Structure:
* It is another database objects which can be used
Non Clustered Index Structure
* It is another database objects which can be used
* To reduce searching process
* To enforce uniqueness
* By
default SS search for the rows by following the process called table scan.
* If
the table consists of huge data then table scan provides less performance.
* Index
is created in tree-like structure which consists of root,node and leaf level.
* At
leaf level, index pages are present by default.
* We
can place max 250 indexes per table.
*
Indexes are automatically placed if we place
* Primary key (clustered)
* Unique (unique
non clustered index)
* We
can place indexes as follows
create
[unique][clustered/nonclustered] index <indexName> on
<tname>/<viewName>(col1,col2,....)
[include(.....)]
Various Types
of indexes available in SQL Server
- Clustered
- NonClustered
- Unique
- Columnstore
- Index with included columns
- Index on computed columns
- Filtered
- Spatial
- XML
- Full-text
- Covering Index
Cluster index:
A clustered index sorts and stores the data rows of the table or view in
order based on the clustered index key.
The clustered index is implemented as a B-tree index structure that
supports fast retrieval of the rows, based on their clustered index key
values.
EXP:
CREATE CLUSTERED INDEX IX_TestTable_TestCol1
ON dbo.TestTable (TestCol1);
Non-Cluster index:
A NonClustered index can be defined on a table or view with a clustered
index or on a heap. Each index row in the NonClustered index contains the NonClustered key value and a row
locator. This locator points to the data row in the clustered index or heap
having the key value. The rows in the index are stored in the order of the
index key values, but the data rows are not guaranteed to be in any particular
order unless a clustered index is created on the table.
EXP:
CREATE NONCLUSTERED INDEX IX_ProductVendor_VendorID
ON
Purchasing.ProductVendor (BusinessEntityID);
Unique Index:
A unique index ensures that the index key contains no duplicate
values and therefore every row in the table or view is in some way unique.
Uniqueness can be a property of both clustered and NonClustered indexes.
EXP:
CREATE UNIQUE INDEX uidx_pid
ON Persons (PersonID);
Columnstore index:
A columnstore index is a technology for
storing, retrieving, and managing data by using a columnar data format, called
a columnstore.
A columnstore is data that's logically
organized as a table with rows and columns, and physically stored in a
column-wise data format.
A columnstore index can provide a very high
level of data compression, typically by 10 times, to
significantly reduce your data warehouse storage cost.
Reasons why columnstore
indexes are so fast:
·
Columns store values from the same domain and commonly have
similar values, which result in high compression rates. I/O bottlenecks in your
system are minimized or eliminated, and memory footprint is reduced
significantly.
·
High compression rates improve query performance by using a
smaller in-memory footprint. In turn, query performance can improve because SQL
Server can perform more query and data operations in memory.
·
Batch execution improves query performance, typically by two to
four times, by processing multiple rows together.
·
Queries often select only a few columns from a table, which
reduces total I/O from the physical media.
EXP:
CREATE CLUSTERED COLUMNSTORE INDEX cci_Simple ON
SimpleTable;
GO
Index with Included columns:
A NonClustered
index that is extended to include nonkey columns in addition to the key
columns.
Exp:
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City,
StateProvinceID);
GO
Filtered index:
is a new feature in SQL SERVER. Filtered Index is used to
index a portion of rows in a table that means it applies a filter on
INDEX which improves query performance, reduces index maintenance costs,
and reduce index storage costs compared with full-table indexes.
When we see an Index created with some where clause then that is
a FILTERED.
Exp:
CREATE NONCLUSTERED INDEX FIBillOfMaterialsWithEndDate
ON Production.BillOfMaterials (ComponentID, StartDate)
WHERE EndDate IS NOT NULL ;
GO
Spatial index:
A spatial
index is a type of extended index that allows you to index a
spatial column. A spatial column is a table column that contains data of a spatial
data type, such as geometry or geography.
EXP:
If we want to get the Employees whose Title is “Marketing
Manager”, for that let’s create an INDEX on EmployeeID whose Title is
“Marketing Manager” and then write the SQL Statement to retrieve Employees who
are “Marketing Manager”.
CREATE NONCLUSTERED INDEX NCI_Department
ON HumanResources.Employee(EmployeeID)
WHERE Title= 'Marketing Manager'
XML index:
XML indexes can be created on xml data type
columns. They index all tags, values and paths over the XML instances in the
column and benefit query performance
EXP:
CREATE PRIMARY XML INDEX PXML_ProductModel_CatalogDescription
ON
Production.ProductModel (CatalogDescription);
Full text index:
A special type of token-based functional index that is built and maintained by the Microsoft Full-Text
Engine for SQL Server. It provides efficient support for sophisticated word
searches in character string data.
EXP:
CREATE UNIQUE INDEX ui_ukJobCand ON HumanResources.JobCandidate(JobCandidateID);
CREATE FULLTEXT CATALOG ft AS DEFAULT;
CREATE FULLTEXT INDEX ON HumanResources.JobCandidate(Resume)
KEY INDEX ui_ukJobCand
WITH
STOPLIST = SYSTEM;
GO
Covering Index:
A covering
index is one which can satisfy all requested columns in a query without
performing a further lookup into the clustered index.------------------------------------------------------------------------------------------------------------------------------------------.
* It physically sorts the rows in the table.
* A
table can have only ONE clustered index.
* Both
data and index pages are merged and stored at third level (Leaf level).
* We
can place on columns which are used to search a range of rows,
create
clustered index pid_indx on prods(pid)
select
* from prods
select
* from prods where pid=2
insert
prods values(3,'Pencils',500)
Note: A table without clustered index is called HEAP
where the rows and pages of the table are
not present in any order.
* It cannot sort the rows physically.
* We
can place max 249 nonclustered indexes on table.
* Both
data and index pages are stored seperately.
* It
locates rows either from heap (Table scan) or from clustered index.
*
Always we have to place first clustered index then nonclustered.
* If
the table is heap the index page consists of
IndexKeyColvalues rowreference
* If
the table consists of clustered index then index page consists of
IndexKeyColValues Clusteredindexkeycolvalues
*
Nonclustered indexes are rebuilded when
* Clustered index is
created/droped/modified
Ex:
Create nonclustered index on pname column of prods table.
create index indx1 on prods(pname)
select * from prods where
pname='Books' -- check execution plan
* To
disp indexes present on a table
sp_helpindex <tname>
* To
drop index
drop index prods.pid_indx
* To
disp space used by the index
sp_spaceused prods
Using Included Columns in Non-Clustered Index
* We can maintain regularly used columns in non-clustered index so
that no need that the SQL server should take data from heap or clustered index.
* If
the no of rows are more it provides better performance.
Ex:
--step1
USE
AdventureWorks
GO
CREATE
NONCLUSTERED INDEX IX_Address_PostalCode
ON
Person.Address (PostalCode)
INCLUDE
(AddressLine1, AddressLine2, City, StateProvinceID)
GO
--step2
SELECT
AddressLine1, AddressLine2, City, StateProvinceID, PostalCode
FROM
Person.Address
WHERE PostalCode
BETWEEN '98000'
AND
'99999';
GO
*
Percentage of space used in leaf level index pages.
* By
default it is 100%.
* To
reduce page splits when the data is manipulated in the base table we can set
proper FillFactor.
* It allows
online index processing
* While the index rebuilding process
is going on users can work with the table.
Question: Where do you find the
default Index fill factor and how to change it?
·
The easiest way to find and change the default
fill factor value is from Management Studio, right-click the SQL Server and
choose properties. In the Server Properties, choose Database Settings, you
should see the default fill factor value in the top section. You can change to
a desired value there and click OK to save the changes.
·
The other option of viewing and
changing this value is using sp_configure.
Page Split
-Due to regular changes in the table if the index pages are full to
allocate memory for the index key columns SS takes remaining rows into new
page. This process is called Page split.
-Page split increases size of index and the index pages order
changes.
Fragmentation
Fragmentation
can be defined as any condition that cause more than optimal amount of disk I/O
to be performed in accessing a table or cause the longer disk I/O. Optimal
performance of SELECT queries occurs when the data pages of tables
are contiguous as possible and pages are fully packed as
possible.Fragmentation breaks this rule and reduce the performance of the
queries. Fragmentation can happen in two level. One is file system level
fragmentation which is called as Logical/Physical Disk Fragmentation and Index
level fragmentation. Each of them are described in the below sections.
Logical/Physical Disk
Fragmentation
Logical
fragmentation is the fragmentation of database file in the file system itself
like any other files.This occurs when the file system is not able
allocate contiguous space for database file.As a result, disk head has to
move back and forth to read from the database files. SQL server
is completely unaware about this kind of fragmentation and it is not
possible to measure the logical disk fragmentation using any script. Logical
disk fragmentation can happen due to various reason like
·
Placing database file in the same disk where
other files (like OS files and other application files) are kept.
Frequent
growth of the database file in smaller chunks.
To remove
logical fragmentation we can use the windows fragmentation tool but note that
we need to stop the SQL server while running the de-fragmentation
tools.Otherwise the de-fragmentation tool will skip database file as it is
used by the SQL server.
The best ways to to avoid the logical
fragmentation are:
·
Keep the database files in
a separate disk isolated from other application files and log files.
·
While creating new database, estimate the size of
database file and allocate enough space to avoid the frequent growth of data
files.
·
Specify the database growth option to allocate
larger chunks rather than small chunks frequently.
Index Level Fragmentation
Index level
fragmentation comes in two flavors: Internal Fragmentation and External
Fragmentation. If Index level fragmentation is high, it may prevents optimizer
from using the available indexes in optimal way.
Internal Fragmentation
Internal
fragmentation is measured in average page fullness of the index
(Page density). A page that is 100% full has no internal fragmentation.In
other words, internal fragmentation occur when there is empty space in the
index page and this can happen due to insert/update/delete DML operation.Every
index page can hold a certain number of records based on the size of the index,
but that does not guaranteed that the page always hold maximum number
records. Internal fragmentation is normally reported as a percentage of
fullness in bytes, not in records. An index page that has internal
fragmentation 90% may be full in terms of record. The remaining 10% bytes of
the pages may not be enough to hold one more record. In a 8KB pages,
maximum of 8060 bytes can be used by data.Rest of space are used by
page header and row offset array.Let us assume that we have index with fixed
size of 100 bytes and the index has 800 entries. So we can can store 8060/100=
80 records per page by leaving 60 bytes empty as it is not enough to hold one
more records and this index requires 10 pages to store the entire index
structure.If you calculate the average fullness of this index, in ideal
scenario it will come as 99.26%.
Let us assume
that we are deleting the half of the entries randomly of this table which
reduce the total number of entries in this index to 400.Now the pages will look
like as given in Fig 2 with total of 40600 bytes free space across 10
pages. If you calculate the the average fullness as Total data size*100/Total
page size = 4000*100/80600= 49.62% . It clearly says that, half of the spaces
are empty and the index has internal fragmentation.
How Internal Fragmentation will affect the performance of the
SQL server?
1. Internal Fragmentation will increase the I/O. When you run
queries that scan part or complete table/index, if you have internal
fragmentation on that table/index, it causes additional page reads. In our
example, the entire data can be stored in 5 pages. When the query needs to do
index scan it has to read 10 pages instead of 5 pages. Which means 50% more
I/O.
2. Internal Fragmentation reduce the efficiency of buffer cache.When indexes
has internal fragmentation, it need more space to fit in the buffer.In our case
this single index will use 5 additional pages to fit into the buffer which
should have used to store other index pages. This will reduce the cache hit
ratio. In turn it will increase the physical I/O. It also increase
the logical reads.
3. This also increase the size of the database file. It need more disk
space to store the additional pages and reduce the performance of Backup and
Restore.
External
Fragmentation
External Fragmentation
happens when the logical order of the pages does not match the physical order
of the pages. External fragmentation refers to the lack of
correlation between the logical sequence of an index and its physical sequence.
It is measured as the percentage of out-of-order pages in the leaf pages of an
index. An out-of-order page is a page for which the next physical page
allocated to the index is not the page pointed to by the next-page pointer in
the current leaf page.Let us see the Fig 3 below. It is representation of index
with three pages.Data is stored in sequential page. In other terms logical
order and physical order are same and it store the index keys from 1 to 16 a.
All pages
are completely full except the Page 3
While inserting the value 4
into the table it has to place in the Page 1 between value 3 and 5
but unfortunately Page 1 does not have any free space to occupy
one more record. The only option is perform a page split by dividing
the Page 1 evenly by leaving half of the data in Page 1 and
moving half of the data to new page (Page 4). From Fig 4 we can understand that
the logical order of the Page 4 is not matching with the physical
order. External Fragmentation can happen due to various reasons:
1. While allocating pages for new table, SQL server allocate pages from
mixed extend till it reaches the 8 pages. There is possibility of having the
first 8 pages from 8 different extents.
2. When all records are deleted from a page, the page will be
de-allocated from the index (The de-allocation of pages will not
happen immediately) which create gap and increase the fragmentation.
3. Once object reached 8 pages size, SQL server will start allocating
uniform extent to the objects.While allocation uniform extent to an index, next
sequential extent to the current extent might be already allocated to
other objects/indexes.
How External
Fragmentation will affect the performance of the SQL server?
While reading individual rows, external fragmentation will not affect the performance as it directly go to the page and fetch the data.Unordered scans also will not affected by the external fragmentation as it use the IAM pages to find which extents need to be fetched. In the case of ordered index scan, external fragmentation might become a degrading factor for performance. The degradation of the performance is because the disk drive's heads have to jump around on the physical disk, rather than performing just contiguous read operations.Also note that external fragmentation will not affect the performance once the pages are loaded into the buffer pool.
avg_fragmentation_in_percent
can
have higher value due to various reasons:
· SQL server storage engine allocates pages from mixed extent to a
table or index till the page count reaches eight.Once the page count reaches to
eight SQL server storage engine starts assigning full uniform extents to the
index. So there is a possibility of having higher fragmentation for small table
and rebuilding indexes might increase the fragmentation.For example, let us
assume that an index has 7 pages and these pages are allocated from two mixed
extent, while rebuilding the index there is possibility of allocating pages
from more than 2 extents and maximum of seven extents which in turn increase
the fragmentation.
· Even the pages are allocated from uniform extent, there is
possibility of fragmentation. When the size of index grow, it need more pages
in the non leaf level also.If last page allocated to leaf level is 250 and
to accommodate more row in the leaf lever index structure might need a
page in index level 1, then SQL server storage engine allocate page 251 to the
index level 1 which create fragment in the leaf level.
· Other common reason is the page split due to the DML operations.
This I have explained well in my previous post.Rebuild/Reorganize index
may not be effective to fix fragmentation happened due to the fist two reason,
but it can reduce the fragmentation caused by the page split or delete
operation.
In
our environment we follow the index maintenance as given below:
§ 20 to
40 percentage of fragmentation is handled with reorganizing the index.
§ All
index which has more 40 percentage fragmentation will considered for rebuild
§ Index
which has less than 1000 pages will be ignored by the index maintenance logic.
§ Index
which has more than 50K pages and fragmentation between 10 and 20 will also be
considered for Reorganize.
Detecting
fragmentation, which are as follows:
· avg_fragmentation_in_percent: This
is a percentage value that represents external fragmentation. For a clustered
table and leaf level of index pages, this is Logical fragmentation, while for
heap, this is Extent fragmentation. The lower this value, the better it is. If
this value is higher than 10%, some corrective action should be taken.
· avg_page_space_used_in_percent: This
is an average percentage use of pages that represents to internal
fragmentation. Higher the value, the better it is. If this value is lower than
75%, some corrective action should be taken.
Reducing
fragmentation:
· Reducing
Fragmentation in a Heap: To reduce the fragmentation of a
heap, create a clustered index on the table. Creating the clustered index,
rearrange the records in an order, and then place the pages contiguously on
disk.
·
Reducing
Fragmentation in an Index: There are three choices for
reducing fragmentation, and we can choose one according to the percentage of
fragmentation:
·
If avg_fragmentation_in_percent > 5% and < 30%, then
use ALTER INDEXREORGANIZE: This statement is replacement for DBCC INDEXDEFRAG
to reorder the leaf level pages of the index in a logical order. As this is an
online operation, the index is available while the statement is running.
· If avg_fragmentation_in_percent
> 30%, then use ALTER INDEX REBUILD: This is replacement
for DBCC DBREINDEX to rebuild the index online or offline. In such case, we can
also use the drop and re-create index method.
·
(Update:
Please note this option is strongly NOT recommended)Drop and re-create the
clustered index: Re-creating a clustered index redistributes the data and
results in full data pages. The level of fullness can be configured by using
the FILLFACTOR option in CREATE INDEX.
To find fragmentation level we can use
dbcc showcontig
or
We can use sys.dm_db_index_physical_stats
DMF as follows
SELECT a.index_id, name,
avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats
(DB_ID('AdventureWorks'),
OBJECT_ID('Production.Product'), NULL,
NULL, NULL)
AS a JOIN sys.indexes AS b
ON a.object_id = b.object_id
AND a.index_id =b.index_id;
* To control fragmentation we can either
reorganize the index or rebuild the index.
1. Reorganizing Index
* It is the process of arranging the index pages according to the
order of index key column values.
* If the fragmentation level is more than 5 to 8% and less than 28to
30% then we can reorganize the indexes.
* It cannot reduce the index size as well as statistics are not
updated.
syn:
ALTER INDEX
<indexName>/<All> on <tname> REORGANIZE
* It is the process of deleting and
creating fresh index.
* It reduces the size of index and
updates statistics
* If the fragmentation level is more
than 30% then we can rebuild indexes.
syn:
ALTER INDEX
<indexName>/<ALL> on <tname> REBUILD
Note:
If we have mentioned ONLINE INDEX PROCESSING option then rebuilding
takes space in TEMPDB.
To check consistency of a database we can use DBCC CHECKDB('dbName')
it disp if any corrupted pages are present, usage space in tempdb.
Option
|
DBCC
DBREINDEX (SQL 2000)
ALTER INDEX REBUILD
(SQL
2005)
|
DBCC
INDEXDEFRAG (SQL 2000)
ALTER INDEX REORGANIZE (SQL 2005) |
Rebuild All Indexes
|
Yes
|
Need to run for each index.
In SQL 2005 using the ALTER INDEX you can specify ALL indexes.
|
Online Operation
|
No, users will be locked out until
complete. In SQL Server 2005 Enterprise Edition you can build indexes online.
|
Yes, users can still use the table
|
Transaction Log Impact
|
Depends on the recovery model of
the database
|
Fully logged operation regardless
of the database recovery model
|
Transaction Log Impact
|
If set to the full recovery model
can consume a lot of space for operation to complete.
|
If index is very fragmented this
could potentially take up more transaction log space.
|
Can run in parallel (uses multiple
threads)
|
Yes
|
No
|
Clustered Index :-
1.
There can be only one Clustered index for a table
2.
Usually made on the primary key
3. The
logical order of the index matches the physical stored
order
of the rows on disk
4. Cluster index is faster than non cluster index because the cluster index has to refer back to the table, if the selected column is not present in index.
6. Cluster index determines the storage order of the rows in the table, and hence does not require additional disk space.
4. Cluster index is faster than non cluster index because the cluster index has to refer back to the table, if the selected column is not present in index.
6. Cluster index determines the storage order of the rows in the table, and hence does not require additional disk space.
Non-Clustered Index
1.
There can be only 249 Non-Clustered index for a table
2.
usually made on the any key
3. the
logical order of the index does not match the physical
4. non cluster index stored separate in the table. additional storage space is required.
4. non cluster index stored separate in the table. additional storage space is required.
stored
order of the rows on disk
No comments:
Post a Comment