
Some experience-based tips for database query performance

1. When optimizing a query, try to avoid full table scans. First of all, consider creating indexes on the columns involved in the where and order by clauses.

2. Try to avoid null checks on fields in the where clause; otherwise the engine will give up the index and perform a full table scan, for example:

select id from t where num is null

You can give num a default value of 0, make sure the column never contains null, and then query like this:

select id from t where num=0

3. Try to avoid using the != or <> operator in the where clause; otherwise the engine will give up the index and perform a full table scan.
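
For example, on a hypothetical indexed column num, a condition like the first query below forces a scan; if the logic allows, it can often be rewritten as two range conditions combined with union all (as item 4 below also suggests), so that each branch can seek on the index:

select id from t where num<>10

select id from t where num<10
union all
select id from t where num>10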

4. Try to avoid using or to join conditions in the where clause; otherwise the engine will give up the index and perform a full table scan, for example:

select id from t where num=10 or num=20

You can query this way:

select id from t where num=10

union all

select id from t where num=20

5. in and not in should also be used with caution; otherwise they lead to a full table scan, for example:

select id from t where num in(1,2,3)

For continuous ranges of values, use between instead of in whenever possible:

select id from t where num between 1 and 3

6. The following query will also result in a full table scan:

select id from t where name like '%abc%'

To improve efficiency, you can consider full-text search.
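
As a rough sketch, assuming a full-text index has already been created on the name column, a CONTAINS predicate could replace the leading-wildcard like. Note that full-text search matches on word prefixes rather than arbitrary substrings, so it is not an exact equivalent:

select id from t where contains(name, '"abc*"')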

7. Using parameters in the where clause can also cause a full table scan. SQL resolves local variables only at run time, but the optimizer cannot defer the choice of access plan until run time; it must choose the plan at compile time. At compile time the variable's value is still unknown, so it cannot be used as an input for index selection. The following statement will perform a full table scan:

select id from t where num=@num

You can instead force the query to use an index:

select id from t with(index(index_name)) where num=@num

8. Try to avoid performing expression operations on fields in the where clause; they cause the engine to give up the index and perform a full table scan. For example:

select id from t where num/2=100

Should be changed to:

select id from t where num=100*2

9. Try to avoid applying functions to fields in the where clause; they cause the engine to give up the index and perform a full table scan. For example:

select id from t where substring(name,1,3)='abc' -- ids whose name starts with 'abc'

select id from t where datediff(day,createdate,'2005-11-30')=0 -- ids created on '2005-11-30'

Should be changed to:

select id from t where name like 'abc%'

select id from t where createdate>='2005-11-30' and createdate<'2005-12-1'

10. Do not perform functions, arithmetic operations, or other expressions on the left side of "=" in the where clause; otherwise the system may not be able to use the index correctly.
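
For example, with a hypothetical price column in an orders table:

select id from orders where price*2>100 -- expression on the left, the index cannot be used

select id from orders where price>100/2 -- bare column on the left, the index can be used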

11. When using an indexed field as a condition, if the index is a composite index, the first (leading) field of the index must appear in the condition for the system to use it; otherwise the index will not be used. The field order in the condition should also match the index order as much as possible.
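
A brief illustration, assuming a composite index on (col1, col2) of a hypothetical table t2:

create index ix_t2_col1_col2 on t2(col1, col2)

select id from t2 where col1=1 and col2=2 -- leading column present, the index can be used

select id from t2 where col2=2 -- leading column missing, an index seek is not possible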

12. Don't write meaningless queries, such as generating an empty table structure:

select col1,col2 into #t from t where 1=0

This kind of code does not return any result set but still consumes system resources. It should be changed to:

create table #t(...)

13. Many times, using exists instead of in is a good choice:

select num from a where num in(select num from b)

Replace with the following statement:

select num from a where exists(select 1 from b where num=a.num)

14. Not all indexes are effective for every query. SQL optimizes queries based on the data in the table; when an indexed column contains a large number of duplicate values, the query may not use the index at all. For example, if a table has a sex field that is roughly half male and half female, an index on sex will do nothing for query efficiency.
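
To illustrate with the sex column mentioned above (hypothetical table and index names):

create index ix_t_sex on t(sex)

select id from t where sex='M' -- roughly half of the rows match, so the optimizer will usually scan the table anyway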

15. More indexes are not always better. An index can improve the efficiency of the corresponding select, but it also reduces the efficiency of insert and update, because an insert or update may need to rebuild the index. How to build indexes therefore needs careful, case-by-case consideration. It is best not to have more than 6 indexes on a table; if there are more, consider whether the indexes on infrequently used columns are really necessary.

16. Avoid updating clustered index columns whenever possible, because the order of the clustered index columns is the physical storage order of the table's records; once a value changes, the order of the entire table's records may have to be adjusted, which consumes considerable resources. If an application needs to update clustered index columns frequently, reconsider whether that index should be clustered at all.
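
A minimal sketch of the idea, using hypothetical table and column names: cluster on a stable key and put frequently updated columns in a nonclustered index instead:

create clustered index ix_orders_id on orders(order_id) -- stable key, rarely or never updated

create nonclustered index ix_orders_status on orders(status) -- volatile column kept out of the clustering key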

17. Use numeric fields wherever possible. If a field contains only numeric information, try not to design it as a character type; doing so reduces query and join performance and increases storage overhead, because the engine compares every character of a string one by one when processing queries and joins, whereas a numeric type needs only a single comparison.
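
For instance, if a hypothetical order_no only ever holds digits, declare it as a numeric type:

create table orders2(order_no int not null) -- compared as a single numeric value
-- rather than: order_no varchar(20) not null, which is compared character by character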

18. Use varchar/nvarchar instead of char/nchar wherever possible. First, variable-length fields take less storage space, which saves space; second, for queries, searching within a smaller field is obviously more efficient.
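
For the same reason, with a hypothetical name column whose values vary in length:

create table person(name varchar(100) not null) -- stores only the characters actually present
-- rather than: name char(100), which always occupies the full 100 characters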

19. Never write select * from t anywhere; use a specific column list instead of "*", and do not return columns you don't actually need.

20. Use table variables instead of temporary tables where possible. If a table variable contains a large amount of data, be aware that its indexes are very limited (only the primary key index).
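
A short sketch of a table variable with hypothetical columns; note that the only index it gets is the one implied by the inline primary key:

declare @t table(id int primary key, name varchar(50))

insert into @t(id, name)
select id, name from t where num=10

select id, name from @t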

21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

22. Temporary tables are not unusable; using them appropriately can make certain routines more efficient, for example when you need to repeatedly reference a data set from a large table or a commonly used table. For one-off operations, however, an export table is usually better.

23. When creating a new temporary table, if a large amount of data is inserted at once, you can use select into instead of create table to avoid generating a large amount of log and improve speed; if the data volume is small, then to reduce pressure on the system tables, create the table first and then insert.
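
A sketch of both cases, with hypothetical column lists:

-- large volume: let select into create and fill the table (less logging, faster)
select id, name into #big from t where num>0

-- small volume: create the table first, then insert, to reduce pressure on system tables
create table #small(id int, name varchar(50))
insert into #small(id, name) select id, name from t where num=10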

24. If temporary tables are used, all of them must be explicitly deleted at the end of the stored procedure: first truncate table, then drop table. This avoids holding locks on the system tables for a long time.
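
For example, at the end of the stored procedure:

truncate table #t
drop table #t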

25. Try to avoid using cursors, because they are inefficient. If a cursor operates on more than 10,000 rows, consider rewriting it.

26. Before using cursor-based methods or temporary table methods, you should first look for set-based solutions to solve the problem. Set-based methods are usually more efficient.

27. Like temporary tables, cursors are not unusable. Using a FAST_FORWARD cursor on a small data set is often better than other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that compute "totals" directly in the result set are usually faster than doing the same work with a cursor. If development time permits, try both the cursor-based and the set-based approach and see which performs better.
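
A minimal FAST_FORWARD cursor sketch over a hypothetical small result set:

declare @id int
declare c cursor fast_forward for
    select id from t where num=10
open c
fetch next from c into @id
while @@fetch_status = 0
begin
    -- row-by-row work goes here
    fetch next from c into @id
end
close c
deallocate c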

28. Set SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end, so that the server does not need to send a DONE_IN_PROC message to the client after each statement in the stored procedure or trigger.
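
For example, in a hypothetical procedure:

create procedure usp_example
as
begin
    set nocount on
    select id from t where num=10
    set nocount off
end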

29. Try to avoid large transaction operations, to improve the system's concurrency.

30. Try to avoid returning large volumes of data to the client; if the data volume is too large, consider whether the underlying requirement is reasonable.