SoFunction
Updated on 2025-04-13

Advantages and Disadvantages of Indexes (Page 2 of 2)


How the system accesses data in tables

Generally, a database system can access the data in a table in one of two ways: a table scan or an index search. In a table scan, the system places a pointer on the first data page of the table and then scans every data page the table occupies, page by page from front to back, until all records have been examined; each record that satisfies the query conditions is selected, and finally all selected records are returned. In an index search, the system uses an index: a tree structure that stores key values together with pointers to the data pages containing the records for those keys. The system walks the index tree, following keys and pointers, to locate the records that satisfy the query conditions, and then returns them.

When SQL Server accesses data, it first determines whether the table has an index. If there is no index, SQL Server must use a table scan. If an index exists, the query processor uses distribution statistics to generate an optimized execution plan for the query statement and decides whether a table scan or the index will access the data more efficiently.
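As a minimal sketch of the two access paths (the Orders table and index names here are hypothetical, invented for illustration):

```sql
-- Hypothetical table used throughout these examples.
CREATE TABLE Orders (
    OrderID    int      NOT NULL,
    CustomerID int      NOT NULL,
    OrderDate  datetime NOT NULL
)

CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID)

-- With the index present, the optimizer can satisfy this predicate
-- by walking IX_Orders_CustomerID (an index search):
SELECT OrderID, OrderDate
FROM Orders
WHERE CustomerID = 42

-- No index covers OrderDate, so this query falls back to a table scan:
SELECT OrderID
FROM Orders
WHERE OrderDate >= '1999-01-01'
```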

Index options

When creating an index, you can specify options that tune its performance. These options include FILLFACTOR, PAD_INDEX, and SORTED_DATA_REORG.

The FILLFACTOR option can improve the performance of INSERT and UPDATE statements. When an index page becomes full, SQL Server must spend time splitting the page to make room for new rows. The FILLFACTOR option reserves a specified percentage of free space on each leaf-level index page, reducing the number of page splits. When creating an index on a table that already contains data, use FILLFACTOR to specify how full each leaf-level index page should be. The default value is 0, which is treated as 100 (leaf pages completely full); even then, the internal (non-leaf) index pages always retain enough space for one or two additional rows. Do not use this option when creating an index on an empty table, where it has no practical effect. Also, the fill factor is not maintained dynamically after creation; it applies only when the index is built or rebuilt, so it is meaningful only for tables that already contain data.

The PAD_INDEX option applies the FILLFACTOR percentage to the internal (non-leaf) index pages as well, so that they are filled to the same degree as the leaf-level pages. Specifying PAD_INDEX without FILLFACTOR is meaningless, because PAD_INDEX takes its value from the FILLFACTOR option.
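The two options can be combined in one CREATE INDEX statement; a sketch, assuming a hypothetical Orders table:

```sql
-- Leave 20% free space on each leaf-level page, and apply the same
-- fill degree to the internal (non-leaf) index pages via PAD_INDEX.
CREATE INDEX IX_Orders_OrderDate
ON Orders (OrderDate)
WITH PAD_INDEX, FILLFACTOR = 80
```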

When creating a clustered index, the SORTED_DATA_REORG option skips the sort step, reducing the time required to build the index. When creating or rebuilding a clustered index on a table that has become fragmented, SORTED_DATA_REORG can be used to compact the data pages. The option is also useful when a fill factor needs to be reapplied to an index. When using SORTED_DATA_REORG, consider these factors: SQL Server verifies that each key value is higher than the previous one, and if it is not, the index cannot be created; SQL Server needs about 1.2 times the table's space to physically reorganize the data; the option speeds up index creation by eliminating the sort step; the data is physically copied from the table; when a row is deleted, the space it occupied becomes reusable; all nonclustered indexes are rebuilt; and if leaf-level pages should be filled to a certain percentage, FILLFACTOR and SORTED_DATA_REORG can be used together.
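A sketch of the syntax, using a hypothetical Orders table; note that SORTED_DATA_REORG dates from older SQL Server releases (it was dropped in later versions in favor of DBCC DBREINDEX), so check the syntax against your server's documentation:

```sql
-- Rebuild the clustered index without re-sorting the (already sorted)
-- data, compacting pages and reapplying the 80% fill factor.
CREATE CLUSTERED INDEX IX_Orders_OrderID
ON Orders (OrderID)
WITH SORTED_DATA_REORG, FILLFACTOR = 80
```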

Maintenance of indexes

After an index is created, frequent inserts, deletes, and updates gradually fragment its index pages, so the index must be maintained to preserve system performance.

The DBCC SHOWCONTIG statement displays fragmentation information for a table's data and indexes. When it runs, SQL Server scans all the leaf-level pages of the index to determine whether the table or the specified index is seriously fragmented, and whether the data and index pages are nearly full. Execute DBCC SHOWCONTIG on tables that have undergone heavy modification, have had large amounts of data added, or have become slow to query. Keep these factors in mind: DBCC SHOWCONTIG requires the table ID and index ID, both of which can be obtained from the system table sysindexes; and you should decide how often to run the statement, based on the table's level of activity, which may be daily, weekly, or monthly.
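The lookup and the check can be sketched as follows (the Orders table is hypothetical, and the IDs passed to DBCC SHOWCONTIG must be replaced with the values the first query returns):

```sql
-- Find the table ID and index IDs in the SQL Server 7.0-era
-- system table sysindexes:
SELECT id, indid, name
FROM sysindexes
WHERE id = OBJECT_ID('Orders')

-- Report fragmentation for that table and index
-- (substitute the id and indid values returned above):
DBCC SHOWCONTIG (123456789, 1)
```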

The DBCC DBREINDEX statement rebuilds one or more indexes on a table. Execute it when you want to rebuild an index, including indexes that enforce PRIMARY KEY or UNIQUE constraints. Rebuilding also reorganizes the storage of the leaf-level index pages, removes fragmentation, and recomputes index statistics. Keep these factors in mind: the system refills each leaf-level page to the specified fill factor; DBCC DBREINDEX can rebuild the indexes behind PRIMARY KEY or UNIQUE constraints without dropping the constraints; while the SORTED_DATA_REORG option can build clustered indexes faster, it cannot be used when the key values are not already sorted, whereas DBCC DBREINDEX can; and DBCC DBREINDEX does not work on system tables. The Database Maintenance Plan Wizard can also be used to automate index rebuilds.
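A sketch of the call, again against a hypothetical Orders table:

```sql
-- Rebuild every index on Orders, refilling leaf pages to 80%.
-- An empty index name ('') means all indexes on the table.
DBCC DBREINDEX ('Orders', '', 80)

-- Rebuild just one named index, keeping its original fill factor (0):
DBCC DBREINDEX ('Orders', 'IX_Orders_CustomerID', 0)
```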

Statistics are samples of column data stored by SQL Server. They are usually kept for indexed columns, but can also be created for non-indexed columns. SQL Server maintains distribution statistics for each index key and uses them to determine which index, if any, is useful for a query; the quality of query optimization depends on the accuracy of these statistics, and the query optimizer uses the samples to decide between a table scan and an index. As the data in a table changes, SQL Server periodically and automatically updates the statistics; index statistics are refreshed when the key values in the index have changed significantly. The frequency of updates depends on the amount of data in the index and how much of it changes: for a table of 10,000 rows, modifying 1,000 rows would likely trigger an update, while modifying only 50 rows would leave the current statistics in place. Besides these automatic updates, users can refresh statistics manually by executing the UPDATE STATISTICS statement or the sp_updatestats system stored procedure; UPDATE STATISTICS can refresh all indexes on a table or a single specified index.
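The manual refresh options can be sketched as follows (table and index names are hypothetical):

```sql
-- Refresh statistics for every index on one table:
UPDATE STATISTICS Orders

-- Refresh statistics for a single named index:
UPDATE STATISTICS Orders IX_Orders_CustomerID

-- Refresh statistics for all tables in the current database:
EXEC sp_updatestats
```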

The SHOWPLAN and STATISTICS IO statements help analyze index and query performance, and with them queries and indexes can be tuned more effectively. SHOWPLAN displays each step the query optimizer takes when joining tables and indicates which indexes are used to access the data; use it to view the query plan for a specified query. Keep these factors in mind: SET SHOWPLAN_ALL returns more detailed output than SET SHOWPLAN_TEXT, but the application must be able to process that output; and SHOWPLAN settings apply only to the current session, so after reconnecting to SQL Server the statement must be executed again. STATISTICS IO reports the logical and physical I/O performed to return the result set of a specified query; this information helps determine whether the query should be rewritten or the indexes redesigned.
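A typical session combining the two, run from a query tool against a hypothetical Orders table:

```sql
-- Show the estimated plan instead of executing the query:
SET SHOWPLAN_TEXT ON
GO
SELECT OrderID FROM Orders WHERE CustomerID = 42
GO
SET SHOWPLAN_TEXT OFF
GO

-- Execute the query and report its logical and physical I/O:
SET STATISTICS IO ON
GO
SELECT OrderID FROM Orders WHERE CustomerID = 42
GO
SET STATISTICS IO OFF
GO
```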

Like the SHOWPLAN statement, optimizer hints can be used to tune query performance. A hint may yield a minor improvement, but if the indexing strategy changes, the hint becomes useless; and because the query optimizer is generally more efficient and flexible than a hand-written hint, hints should be used only sparingly. When using optimizer hints, consider these rules: an index can be specified by name or by index_id, where an index_id of 0 forces a table scan and an index_id of 1 selects the clustered index; and because a hint overrides the query optimizer, it must be revised whenever the data or environment changes.
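The index hint syntax can be sketched as follows (hypothetical table and index names, using the old-style hint form from the SQL Server 7.0 era):

```sql
-- Force a table scan (index_id 0):
SELECT OrderID FROM Orders (INDEX(0)) WHERE CustomerID = 42

-- Force a particular nonclustered index by name:
SELECT OrderID FROM Orders (INDEX(IX_Orders_CustomerID))
WHERE CustomerID = 42
```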

Index Tuning Wizard

The Index Tuning Wizard is a tool that analyzes a workload of database queries and recommends a set of indexes that optimizes the performance of the workload as a whole. The following must be specified:

The query statements: this is the workload to be optimized.
The database containing the tables, in which indexes can be created to improve query performance.
The tables to be used in the analysis.
The constraints to be honored during analysis, such as the maximum disk space the indexes may use.
The workload can come from two sources: a trace captured by SQL Server Profiler, or a file containing SQL statements. The Index Tuning Wizard always works against a defined workload; if that workload does not reflect normal operations, the indexes it recommends will not be the best-performing indexes for the actual workload. The wizard invokes the query optimizer to evaluate the cost of each query statement in the workload under every candidate combination of indexes, and then recommends the indexes that improve the performance of the workload as a whole. If no workload exists for the wizard to analyze, one can be created immediately with SQL Server Profiler: once a trace has captured a representative sample of normal database activity, the wizard can analyze it and recommend an index configuration that improves database performance.

After the Index Tuning Wizard has analyzed the workload, it presents a series of reports. It can then create the recommended indexes immediately, schedule their creation as a job, or generate a file containing the SQL statements that create them.

The Index Tuning Wizard lets you select and create an ideal combination of indexes and statistics for a SQL Server database without requiring expert knowledge of the database structure, the workload, or SQL Server internals. In short, the wizard can do the following:

Use the query optimizer to analyze the queries in the workload and recommend the best mix of indexes for a database with a heavy workload.
Analyze the effects of the proposed changes, including index usage, the distribution of queries among tables, and the performance of the queries in the workload.
Recommend ways to tune the database for a small set of problem queries.
Allow the recommendations to be customized through advanced options such as disk-space constraints, the maximum number of query statements, and the maximum number of columns per index.
SQL Server Profiler

SQL Server Profiler captures a continuous record of server activity in real time. You can select the items and events to monitor, including Transact-SQL statements and batches, object usage, locks, security events, and errors. Profiler can filter these events and display only those of interest. Recorded trace events can be replayed against the same server or another server, re-executing the recorded commands. By focusing on specific events in this way, monitoring and debugging problems in SQL Server becomes much easier.

Query Analyzer

Query Analyzer is a multi-purpose tool. In it, Transact-SQL statements can be entered and executed interactively, with the statements and their result sets visible in the same window; multiple statements, or selected statements from a script file, can be executed together; and the execution plan of a query can be analyzed graphically, showing the data-retrieval methods the query processor has chosen. Based on the plan, the execution of the query can be tuned, and index suggestions that would improve performance can be proposed. Such a suggestion applies to a single query statement and can improve the performance of only that query.

The system creates a distribution page for each index; statistics are the distribution information for the key values of one or more of a table's indexes, stored on that page. When a query statement executes, the system can use this distribution information to decide which of the table's indexes to use, improving query speed and performance. The query processor relies on these distribution statistics to generate the execution plan, and the degree of optimization of the plan depends on their accuracy: if the statistics closely match the physical state of the index, the query processor can generate a highly optimized execution plan; if they differ significantly from what is actually stored in the index, the plan will be poorly optimized.

The query processor extracts the distribution of index key values from the statistics. Besides manual refreshes with UPDATE STATISTICS, the query processor can collect and update this distribution information automatically. This ensures that the query processor works with current statistics, keeps execution plans highly optimized, and reduces maintenance work. There are limits, however, to tuning through execution plans: an execution plan can improve the performance of only a single query statement, and may have only a partial, or even negative, effect on the performance of the system as a whole. To improve query performance across the entire system, a tool such as the Index Tuning Wizard should be used.

Conclusion

In earlier versions of SQL Server, a query could use at most one index per table. SQL Server 7.0 enhances index operations: index intersection and index join algorithms allow a query statement to use multiple indexes, with a shared row identifier used to join two indexes on the same table. If a table has a clustered index, and therefore a clustering key, the leaf nodes of all nonclustered indexes on that table use the clustering key as the row locator instead of a physical record identifier. If the table has no clustered index, the nonclustered indexes continue to point to the data pages using physical record identifiers. In both cases the row locator is very stable: when a leaf page of the clustered index splits, the nonclustered indexes need no modification because their row locators remain valid, and if there is no clustered index, page splits do not occur at all. In earlier versions, nonclustered indexes used physical record identifiers, such as page number and row number, as row locators; if a clustered-index data page split, many rows moved to a new data page and received new physical record identifiers, and every nonclustered index then had to be updated with those new identifiers, which cost considerable time and resources.

The Index Tuning Wizard is a good tool for experienced users and new users alike. Experienced users can use the wizard to create a baseline index configuration and then tune and customize it; new users can use it to quickly create optimized indexes.