In SQL Server, as data volume grows, database performance can degrade: queries slow down and response times stretch. The following are commonly used optimization strategies for handling large amounts of data, each illustrated with a concrete case.
1. Index optimization
- Create indexes: Indexes can significantly improve query speed, especially for columns used in `WHERE`, `JOIN`, and `ORDER BY` clauses. Create appropriate indexes for frequently queried fields, especially filter columns.
- Choose the appropriate index type: Use clustered and nonclustered indexes according to the workload. Clustered indexes suit sorting and range queries, while nonclustered indexes suit lookups on single columns or column combinations.
- Avoid too many indexes: Although indexes speed up reads, each extra index adds cost to insert, update, and delete operations, so balance the number of indexes against their benefit.
In SQL Server, index optimization is an important means of improving query performance. Consider a concrete business scenario: a sales order system whose `Orders` table must be indexed to serve several different query patterns.
Business scenarios
- Query requirement 1: Query order information by `CustomerID` and `OrderDate`.
- Query requirement 2: Query all related orders by `ProductID`.
- Query requirement 3: Query the details of a specific order (by `OrderID`).
Based on these requirements, we will create indexes on the `Orders` table and show how to choose the appropriate index type.
1. Create the `Orders` table
```sql
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,      -- Primary key; automatically creates a clustered index
    CustomerID INT,               -- Customer ID
    OrderDate DATETIME,           -- Order date
    ProductID INT,                -- Product ID
    TotalAmount DECIMAL(18, 2),   -- Total order amount
    Status VARCHAR(20)            -- Order status
);
```
2. Create an index
2.1. Create a Clustered Index
Clustered indexes are usually created on the primary key or a unique constraint. Because a clustered index stores rows in index order, the clustered index on `OrderID` speeds up lookups by `OrderID`.
```sql
-- OrderID is the primary key, so a clustered index is created by default;
-- no additional clustered index is needed here.
```
2.2. Create a non-clustered index
For queries that filter on the combination of `CustomerID` and `OrderDate`, we can create a composite nonclustered index, which speeds up queries on those two columns.
```sql
CREATE NONCLUSTERED INDEX idx_Customer_OrderDate
ON Orders (CustomerID, OrderDate);
```
- Usage scenario: This index speeds up queries by `CustomerID` and `OrderDate`, especially when the order table is large.
2.3. Create a single-column nonclustered index
For query requirement 2, where we need to find all orders by `ProductID`, we can create a single-column nonclustered index on `ProductID` to improve query efficiency.
```sql
CREATE NONCLUSTERED INDEX idx_ProductID
ON Orders (ProductID);
```
- Usage scenario: This index significantly improves performance when querying all orders for a given product.
3. Delete redundant indexes
If a query frequently accesses several columns and we have created a separate single-column index on each of them, performance can suffer: every extra nonclustered index slows down insert and update operations. To avoid this, check for and drop redundant indexes regularly.
Suppose we find that `ProductID` and `CustomerID` often appear together in query conditions. We can consider dropping the `idx_ProductID` index and creating a composite index instead, as shown below.
```sql
-- Drop the redundant single-column index
DROP INDEX idx_ProductID ON Orders;
```
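A minimal sketch of the replacement index, assuming queries filter on `ProductID` first (the index name `idx_Product_Customer` is illustrative):

```sql
-- One composite index serves queries on (ProductID) and (ProductID, CustomerID)
CREATE NONCLUSTERED INDEX idx_Product_Customer
ON Orders (ProductID, CustomerID);
```

Note that the leading column matters: this index supports seeks on `ProductID` alone, but not on `CustomerID` alone.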
4. Query optimization
Now suppose we have the following queries; each one shows how the indexes created above are used.
4.1. Query by `CustomerID` and `OrderDate`
```sql
-- Uses the idx_Customer_OrderDate index
SELECT OrderID, ProductID, TotalAmount
FROM Orders
WHERE CustomerID = 1001
  AND OrderDate BETWEEN '2024-01-01' AND '2024-12-31';
```
4.2. Query by `ProductID`
```sql
-- Uses the idx_ProductID index
SELECT OrderID, CustomerID, TotalAmount
FROM Orders
WHERE ProductID = 500;
```
4.3. Query specific order details
```sql
-- Query by OrderID, using the default clustered index
SELECT CustomerID, ProductID, TotalAmount, Status
FROM Orders
WHERE OrderID = 123456;
```
5. Things to note
- Index maintenance cost: Although indexes can significantly improve query performance, every `INSERT`, `UPDATE`, and `DELETE` must also maintain them, which adds overhead. Keep the number of indexes in check and tune them to actual query needs.
- Covering indexes: Where possible, create covering indexes, i.e. indexes that contain all the columns a query needs, so the query avoids key lookups back to the base table and runs faster. A sketch follows this list.
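A minimal covering-index sketch for the customer/date query above (the index name is illustrative): the key columns match the filter, and `INCLUDE` carries the selected columns so the query never has to touch the base table.

```sql
-- Covering index: key on the filter columns, INCLUDE the selected columns
CREATE NONCLUSTERED INDEX idx_Customer_OrderDate_Covering
ON Orders (CustomerID, OrderDate)
INCLUDE (ProductID, TotalAmount);
```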
Let's summarize
By creating appropriate indexes on the `Orders` table, we can significantly improve query performance. When optimizing indexes, weigh the query requirements, the index type (clustered vs. nonclustered), and the number of indexes against their maintenance cost.
2. Query optimization
- Optimize SQL queries: Make every query as efficient as possible. Avoid `SELECT *` and select only the columns you need; avoid redundant computation and minimize subqueries.
- Use execution plans: Use the execution plan tool in SQL Server Management Studio (SSMS) to inspect a query's plan, then analyze and remove its bottlenecks.
- Avoid complex nested queries: Complex subqueries can cause performance issues; consider rewriting them as joins (`JOIN`).
Query optimization improves performance by carefully designing SQL statements and their supporting indexes. Staying with the order system's `Orders` table, we will walk through several common query optimization techniques.
Business scenarios
Suppose we have a sales order system whose `Orders` table includes the following fields:
- `OrderID`: Order ID, primary key.
- `CustomerID`: Customer ID.
- `OrderDate`: Order date.
- `ProductID`: Product ID.
- `TotalAmount`: Total order amount.
- `Status`: Order status (e.g. paid, unpaid).
We have the following query requirements:
- Query all orders of a customer over a certain period of time.
- Check the sales of a product in all orders.
- Query the details of an order.
- Query order information for multiple customers.
1. Query optimization: Query orders by CustomerID and OrderDate
Query requirements:
Query all orders of a customer over a certain period of time.
Query statement:
```sql
SELECT OrderID, ProductID, TotalAmount, Status
FROM Orders
WHERE CustomerID = 1001
  AND OrderDate BETWEEN '2024-01-01' AND '2024-12-31';
```
Optimization suggestions:
- Index optimization: Create a composite index on `CustomerID` and `OrderDate`, because this is a common query pattern; the composite index speeds up queries that filter on both fields.
```sql
CREATE NONCLUSTERED INDEX idx_Customer_OrderDate
ON Orders (CustomerID, OrderDate);
```
Execution plan optimization:
- Inspect the execution plan (in SSMS, or with `SET STATISTICS IO ON`) and confirm that the query actually uses the index. Note that SQL Server has no `EXPLAIN` statement; use SSMS's execution plan display or `SET SHOWPLAN_XML ON` instead. See the sketch below.
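A quick way to check, reusing the query from above:

```sql
-- Report logical reads for the following statements in this session
SET STATISTICS IO ON;

SELECT OrderID, ProductID, TotalAmount
FROM Orders
WHERE CustomerID = 1001
  AND OrderDate BETWEEN '2024-01-01' AND '2024-12-31';

SET STATISTICS IO OFF;
```

A low logical-read count together with an index seek on `idx_Customer_OrderDate` in the plan confirms the index is being used.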
2. Query optimization: Query all relevant orders by ProductID
Query requirements:
Query all orders for a product.
Query statement:
```sql
SELECT OrderID, CustomerID, TotalAmount, Status
FROM Orders
WHERE ProductID = 500;
```
Optimization suggestions:
- Index optimization: Create an index on `ProductID`, because this field is frequently used as a filter.
```sql
CREATE NONCLUSTERED INDEX idx_ProductID
ON Orders (ProductID);
```
Execution plan optimization:
- Confirm that the query uses the `idx_ProductID` index and avoids a full table scan.
3. Query optimization: Query the detailed information of a certain order
Query requirements:
Query the details of an order.
Query statement:
```sql
SELECT CustomerID, ProductID, TotalAmount, Status
FROM Orders
WHERE OrderID = 123456;
```
Optimization suggestions:
- Index optimization: Because `OrderID` is the primary key, SQL Server automatically creates a clustered index on it; queries that filter on `OrderID` use that clustered index directly.
```sql
-- The clustered index was created automatically with the primary key;
-- no additional index is required.
```
Execution plan optimization:
- Confirm that the query performs a single-row seek via the `OrderID` primary key index.
4. Query optimization: Query order information of multiple customers
Query requirements:
Query order information for multiple customers.
Query statement:
```sql
SELECT OrderID, CustomerID, ProductID, TotalAmount, Status
FROM Orders
WHERE CustomerID IN (1001, 1002, 1003);
```
Optimization suggestions:
- Index optimization: Create an index on `CustomerID` to quickly filter out the target customers' orders.
```sql
CREATE NONCLUSTERED INDEX idx_CustomerID
ON Orders (CustomerID);
```
Execution plan optimization:
- Confirm that the `IN` clause uses the `idx_CustomerID` index.
5. Query optimization: Avoid `SELECT *`
Query requirements:
Query all fields (not recommended; usually only used for debugging or inspecting the table structure).
Query statement:
```sql
SELECT * FROM Orders;
```
Optimization suggestions:
- Explicitly select the required columns: Avoid `SELECT *`; list only the fields the query needs so unnecessary columns are never read.
```sql
SELECT OrderID, CustomerID, TotalAmount FROM Orders;
```
6. Query optimization: Use `JOIN` for multi-table queries
Query requirements:
Query a customer's orders together with related product information. Suppose there is a `Products` table containing `ProductID` and `ProductName`.
Query statement:
The column names were stripped during extraction; reconstructed from the scenario above:

```sql
SELECT o.OrderID, o.TotalAmount, p.ProductName
FROM Orders o
JOIN Products p ON o.ProductID = p.ProductID
WHERE o.CustomerID = 1001
  AND o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31';
```
Optimization suggestions:
- Index optimization: Create a composite index on the `Orders` table's `CustomerID`, `OrderDate`, and `ProductID` columns, and an index on the `Products` table's `ProductID`, to speed up the `JOIN`.
```sql
CREATE NONCLUSTERED INDEX idx_Orders_Customer_OrderDate_Product
ON Orders (CustomerID, OrderDate, ProductID);

CREATE NONCLUSTERED INDEX idx_Products_ProductID
ON Products (ProductID);
```
Execution plan optimization:
- Confirm that the execution plan uses the `JOIN`-related indexes and avoids full table scans.
7. Query optimization: paginated query
Query requirements:
Query customer orders within a certain time period and implement the paging function.
Query statement:
```sql
SELECT OrderID, CustomerID, TotalAmount, Status
FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY OrderDate
OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY;
```
Optimization suggestions:
- Index optimization: Make sure `OrderDate` has an appropriate index so the sort can be satisfied cheaply.
- Use `OFFSET` and `FETCH` to page through results instead of loading everything at once; a follow-up page is sketched after the index below.
```sql
CREATE NONCLUSTERED INDEX idx_OrderDate
ON Orders (OrderDate);
```
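For subsequent pages, only the `OFFSET` changes; for example, page 2 (rows 21-40) under the same ordering:

```sql
SELECT OrderID, CustomerID, TotalAmount, Status
FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY OrderDate
OFFSET 20 ROWS FETCH NEXT 20 ROWS ONLY;  -- skip page 1, fetch page 2
```

Keep in mind that large offsets still scan and discard the skipped rows, so very deep pages get slower.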
8. Avoid too many subqueries
Query requirements:
Check the total amount of orders a customer has over a certain period of time.
Query statement:
```sql
SELECT CustomerID,
       (SELECT SUM(TotalAmount)
        FROM Orders
        WHERE CustomerID = 1001
          AND OrderDate BETWEEN '2024-01-01' AND '2024-12-31') AS TotalSpent
FROM Customers
WHERE CustomerID = 1001;
```
Optimization suggestions:
- Avoid subqueries: Try not to embed subqueries in the `SELECT` list; rewrite them with `JOIN` or `GROUP BY` for better efficiency.
The column names were stripped during extraction; reconstructed from context:

```sql
SELECT o.CustomerID, SUM(o.TotalAmount) AS TotalSpent
FROM Orders o
WHERE o.CustomerID = 1001
  AND o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY o.CustomerID;
```
Let's summarize
By optimizing SQL query statements, using indexes rationally, and reducing unnecessary operations, we can significantly improve query performance. Specific practices include:
- Create appropriate indexes (single-column and composite).
- Optimize query statements: avoid `SELECT *` and excessive subqueries.
- Use suitable paging techniques and `JOIN`s to optimize multi-table queries.
- Analyze query execution plans to ensure queries run efficiently.
These optimizations can help SQL Server maintain efficient query performance when facing large amounts of data.
3. Data partitioning and sharding
- Table partition: For very large tables, you can consider using table partitioning. Table partitioning can divide data into multiple physical files according to certain conditions (such as time, ID range, etc.), so that only relevant partitions are accessed during querying, reducing the overhead of full table scanning.
- Sharding: Spread data into multiple independent tables or databases, usually based on some rules (such as regions, dates, etc.). Each table contains a subset of data, which can improve query efficiency.
Data partitioning and sharding are key means of optimizing database performance, especially with large data volumes. Both reduce query and write pressure and improve data access efficiency. The following cases, based on concrete business scenarios, show how to use partitioning and sharding to optimize SQL Server performance.
Business scenarios
Suppose we have an order system whose `Orders` table records all order information. As order volume grows, querying and maintaining the single table becomes increasingly expensive, so we turn to partitioning and sharding to optimize the database's performance.
1. Data partitioning (partitioning)
Data partitioning is a logical partition on a single table. It allows a large table to be divided into multiple physical segments (partitions) according to a certain rule (such as time ranges, numerical intervals, etc.). Each partition can be managed independently, and queries can be performed within a specific partition, thereby improving query performance.
Business Requirements
- Partition the `Orders` table by order date (`OrderDate`) so queries can quickly locate orders within a specific time period.
Steps:
- Create a partition function and a partition scheme.
- Apply the partition scheme to the `Orders` table.
Create a Partition Function
```sql
-- Create a partition function: partition by year
CREATE PARTITION FUNCTION OrderDatePartitionFunc (DATE)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');
```
The partition function divides rows into intervals based on the order date (`OrderDate`), one interval per year.
Create a Partition Scheme
```sql
-- Create a partition scheme: map the partition function to physical filegroups
CREATE PARTITION SCHEME OrderDatePartitionScheme
AS PARTITION OrderDatePartitionFunc
TO ([PRIMARY], [FG_2023], [FG_2024], [FG_2025]);
```
This scheme assigns each partition to a physical filegroup (e.g. `PRIMARY`, `FG_2023`, and so on).
Create a partition table
```sql
-- Create the partitioned table using the partition scheme.
-- Note: on a partitioned table, a unique key (including the primary key)
-- must contain the partitioning column, so OrderDate is part of the key.
CREATE TABLE Orders (
    OrderID INT NOT NULL,
    CustomerID INT,
    OrderDate DATE NOT NULL,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20),
    CONSTRAINT PK_Orders PRIMARY KEY (OrderID, OrderDate)
) ON OrderDatePartitionScheme (OrderDate);
```
The `Orders` table is partitioned on the `OrderDate` column, and rows are distributed to the corresponding filegroups by date.
Query optimization
```sql
-- Query orders from 2024: only the matching partition is accessed,
-- which improves query efficiency (partition elimination)
SELECT OrderID, CustomerID, ProductID, TotalAmount
FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-12-31';
```
Through partitioning, the query will only scan the data of the relevant partitions, thereby improving the query speed.
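To see where rows actually land, you can ask SQL Server which partition each row maps to via the `$PARTITION` function (a quick sanity check, using the partition function defined above):

```sql
-- Count rows per partition to confirm the data distribution
SELECT $PARTITION.OrderDatePartitionFunc(OrderDate) AS PartitionNumber,
       COUNT(*) AS RowsInPartition
FROM Orders
GROUP BY $PARTITION.OrderDatePartitionFunc(OrderDate);
```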
2. Data sharding (splitting tables)
Sharding splits data horizontally across multiple physical tables, each storing a portion of the rows. Common strategies include sharding by range and by hash value. Sharding can significantly improve query performance, but the application must manage the multiple tables and the relationships among them.
Business Requirements
- Shard the `Orders` table by `CustomerID`, assigning rows to different tables based on the customer ID.
- Customer IDs are uniformly distributed, so we can use a hash-based sharding strategy.
Steps:
- Create multiple shard tables.
- Implement the shard-routing logic at the application layer.
Create the shard tables
Suppose we decide to split the `Orders` table into 4 shards by the hash of `CustomerID`. The 4 shard tables can be created as follows:
```sql
-- Create the Orders_1 table
CREATE TABLE Orders_1 (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);

-- Create the Orders_2 table
CREATE TABLE Orders_2 (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);

-- Create the Orders_3 table
CREATE TABLE Orders_3 (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);

-- Create the Orders_4 table
CREATE TABLE Orders_4 (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);
```
Shard-routing logic
At the application layer, we need routing logic that uses the hash value to decide which table to insert into or query.
```sql
-- Example: choose the shard table by hashing CustomerID
DECLARE @CustomerID INT = 1001;
DECLARE @TableSuffix INT;

-- Use a simple modulo hash to pick the target table
SET @TableSuffix = @CustomerID % 4;

-- Insert the order into the matching shard
IF @TableSuffix = 0
BEGIN
    INSERT INTO Orders_1 (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
    VALUES (123456, @CustomerID, '2024-01-01', 101, 150.00, 'Paid');
END
ELSE IF @TableSuffix = 1
BEGIN
    INSERT INTO Orders_2 (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
    VALUES (123456, @CustomerID, '2024-01-01', 101, 150.00, 'Paid');
END
ELSE IF @TableSuffix = 2
BEGIN
    INSERT INTO Orders_3 (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
    VALUES (123456, @CustomerID, '2024-01-01', 101, 150.00, 'Paid');
END
ELSE
BEGIN
    INSERT INTO Orders_4 (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
    VALUES (123456, @CustomerID, '2024-01-01', 101, 150.00, 'Paid');
END
```
Query logic
To query a customer's orders, the application layer likewise decides which shard to read:
```sql
-- Query a customer's orders
DECLARE @CustomerID INT = 1001;
DECLARE @TableSuffix INT;

SET @TableSuffix = @CustomerID % 4;

-- Read from the matching shard
IF @TableSuffix = 0
BEGIN
    SELECT * FROM Orders_1 WHERE CustomerID = @CustomerID;
END
ELSE IF @TableSuffix = 1
BEGIN
    SELECT * FROM Orders_2 WHERE CustomerID = @CustomerID;
END
ELSE IF @TableSuffix = 2
BEGIN
    SELECT * FROM Orders_3 WHERE CustomerID = @CustomerID;
END
ELSE
BEGIN
    SELECT * FROM Orders_4 WHERE CustomerID = @CustomerID;
END
```
3. Choosing between partitioning and sharding
- Partitioning: Physically divides a table while keeping it logically unified. For example, partitioning by time (such as order date) is effective for time-range queries.
- Sharding: Suits extremely large data volumes; it splits data across multiple tables to reduce the query pressure on any single table, usually by hash or by range.
Let's summarize
- Partitioning divides a large table so that queries touch only the relevant partitions, improving performance.
- Sharding splits data horizontally into multiple physical tables and is typically used when data volumes are extremely large.
- Implementing either in SQL Server requires weighing table design, index design, and query strategy together to balance access efficiency and maintainability.
4. Data archiving
- Archive old data: For data that is no longer queried, it can be archived into an independent historical table or database, thereby reducing the burden on the primary database. Only recent data is retained in the main table to optimize query performance.
- Compress old data: Archive data can be stored through compression technology, saving storage space.
Data archiving refers to removing historical data that is no longer frequently accessed from the primary database and storing it in an archive system or table to improve the performance of the primary database. Data archives are usually used for old data, history, etc. that are no longer active but need to be retained.
Business scenarios
Suppose we have an order system whose `Orders` table records all order information. Over time the volume of order data grows sharply, while orders past a certain age are queried far less often. To improve performance, we decided to move orders older than one year out of the main table and into an archive table.
Steps:
- Create the main table (`Orders`) and the archive table (`ArchivedOrders`).
- Periodically move orders older than one year from `Orders` to `ArchivedOrders`.
- Ensure that queries against archived data do not affect the performance of the main table.
1. Create the main table and archive table
```sql
-- Create the main order table Orders
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);

-- Create the archive table ArchivedOrders
CREATE TABLE ArchivedOrders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);
```
2. Archive operation (move orders older than one year into the archive table)
To move expired orders into the archive table periodically, you can run the following from a scheduled task such as a SQL Server Agent job.
```sql
-- Move order data older than 1 year from Orders to ArchivedOrders
INSERT INTO ArchivedOrders (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
SELECT OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status
FROM Orders
WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());

-- Delete the archived rows from Orders
DELETE FROM Orders
WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());
```
This code inserts all `Orders` rows whose `OrderDate` is more than one year old into `ArchivedOrders`, then deletes those rows from `Orders`.
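One caution worth sketching: a failure between the `INSERT` and the `DELETE` would leave the data half-moved, and re-evaluating `GETDATE()` twice can shift the cutoff. Wrapping both statements in one transaction with a captured cutoff (both are additions of mine, not in the original) keeps the move atomic and consistent:

```sql
DECLARE @Cutoff DATETIME = DATEADD(YEAR, -1, GETDATE());  -- fix the cutoff once

BEGIN TRANSACTION;

INSERT INTO ArchivedOrders (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
SELECT OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status
FROM Orders
WHERE OrderDate < @Cutoff;

DELETE FROM Orders
WHERE OrderDate < @Cutoff;

COMMIT TRANSACTION;
```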
3. Timed archive tasks (using SQL Server Agent)
We can use SQL Server Agent to create a scheduled job that performs the archive operation regularly, for example once a day, archiving orders more than a year old:
```sql
-- Create a SQL Server Agent job that performs the archive operation
USE msdb;
GO

EXEC sp_add_job @job_name = N'ArchiveOldOrders';
GO

EXEC sp_add_jobstep
    @job_name = N'ArchiveOldOrders',
    @step_name = N'ArchiveOrdersStep',
    @subsystem = N'TSQL',
    @command = N'
        INSERT INTO ArchivedOrders (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
        SELECT OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status
        FROM Orders
        WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());

        DELETE FROM Orders
        WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());
    ',
    @database_name = N'VGDB';
GO

-- Schedule the job, e.g. run once a day
EXEC sp_add_schedule
    @schedule_name = N'ArchiveOrdersDaily',
    @enabled = 1,
    @freq_type = 4,        -- daily
    @freq_interval = 1,    -- every 1 day
    @active_start_time = 0;
GO

EXEC sp_attach_schedule
    @job_name = N'ArchiveOldOrders',
    @schedule_name = N'ArchiveOrdersDaily';
GO

-- Start the job
EXEC sp_start_job @job_name = N'ArchiveOldOrders';
GO
```
4. Query archived data
Archived data can still be queried without affecting the main table's performance. To look up a customer's historical orders, query the archive table:
```sql
-- Query a customer's historical orders
SELECT OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status
FROM ArchivedOrders
WHERE CustomerID = 1001
ORDER BY OrderDate DESC;
```
5. Optimization and precautions
- Archive strategy: Choose a time window that fits the business (for example 3 months, 6 months, or 1 year), and adjust the `WHERE` condition to change the archive rule.
- Performance optimization: Regular archiving keeps the main table small, improving query performance and reclaiming storage.
- Backup and restore of archived data: Archived data also needs regular backups and must be restorable when needed; make sure the archive table is covered by an adequate backup policy.
6. Another option to archive and clean data: soft deletion
In some cases, the data is not completely deleted from the database after archive, but is marked as "archived" or "deleted". The advantage of this method is that data can be recovered at any time without loss.
```sql
-- Add an Archived flag to the Orders table
ALTER TABLE Orders ADD Archived BIT DEFAULT 0;

-- Mark old data as archived
UPDATE Orders
SET Archived = 1
WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());

-- Query active (unarchived) data
SELECT * FROM Orders WHERE Archived = 0;

-- Query archived data
SELECT * FROM Orders WHERE Archived = 1;
```
In this way archived orders stay in the main table, and the `Archived` flag distinguishes archived from active orders.
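Since the archived rows remain in the table, queries on active data still have to skip past them. A filtered index (a standard SQL Server feature; the index name here is illustrative) can keep those queries fast:

```sql
-- Index only the active rows, so queries filtering on Archived = 0 stay small and fast
CREATE NONCLUSTERED INDEX idx_Orders_Active
ON Orders (OrderDate)
WHERE Archived = 0;
```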
Let's summarize
Data archiving operations are an effective strategy for managing large-scale databases. By regularly migrating historical data from the primary database table to the archive table, the query performance of the database can be significantly improved while ensuring that historical data is retained for future queries and audits.
5. Storage and hardware optimization
- Disk I/O Optimization: The performance of the database is limited by disk I/O, especially when processing large amounts of data. Using SSD storage provides faster I/O performance than traditional hard drives (HDDs).
- Increase memory: Increasing the memory of SQL Server can make the database buffer pool larger, thereby reducing disk I/O and improving query performance.
- Configuring with RAID: Use RAID 10 or other RAID configuration to ensure efficient and reliable data reading and writing.
Storage and hardware optimization are key parts of improving database performance, especially in environments where large-scale data processing is performed. Through reasonable hardware resource allocation, storage structure optimization and database configuration, performance can be significantly improved. Below we will explain how to optimize SQL Server at the storage and hardware levels for an e-commerce platform's order system.
Business scenarios:
Suppose you have an e-commerce platform where order data is stored in SQL Server and the number of orders is increasing, resulting in a degradation in query performance. In this scenario, we can perform storage and hardware optimization through the following methods.
Optimization strategy:
- Disk I/O optimization:
  - Use SSDs instead of traditional hard drives (HDDs) to improve read/write speed.
  - Store data files, log files, and temporary files on different physical disks.
- Table and index storage:
  - Use appropriate storage formats and file organization, such as partitioned tables and table compression.
  - Place frequently accessed tables and indexes on high-performance disks.
- Hardware resource configuration:
  - Increase memory so more data can be cached, reducing disk access.
  - Use multi-core CPUs to improve concurrent query processing.
- Data compression:
  - Enable SQL Server data compression to reduce disk space usage and improve I/O performance.
1. Create tables and optimize storage
First, we create the order table with a clustered index on its `OrderID` column.
```sql
-- Create the Orders table with optimized storage
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY CLUSTERED,  -- clustered index
    CustomerID INT,
    OrderDate DATETIME,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
) ON [PRIMARY]
WITH (DATA_COMPRESSION = PAGE);  -- enable page compression to save space

-- Nonclustered index to optimize queries, also compressed
CREATE NONCLUSTERED INDEX idx_OrderDate
ON Orders (OrderDate)
WITH (DATA_COMPRESSION = PAGE);
```
With `DATA_COMPRESSION = PAGE` we enable SQL Server's data compression to save storage space and improve disk I/O. `PAGE` compression compresses more aggressively than `ROW` compression and suits large tables.
2. Partition table optimization
As order volume keeps growing, we can partition the order table by the `OrderDate` column, reducing the scan range of queries and improving efficiency.
```sql
-- Create the partition function
CREATE PARTITION FUNCTION pf_OrderDate (DATETIME)
AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

-- Create the partition scheme
CREATE PARTITION SCHEME ps_OrderDate
AS PARTITION pf_OrderDate
TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]);

-- Create the partitioned table; the partitioning column must be
-- part of the primary key on a partitioned table
CREATE TABLE Orders (
    OrderID INT NOT NULL,
    CustomerID INT,
    OrderDate DATETIME NOT NULL,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20),
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID, OrderDate)
) ON ps_OrderDate (OrderDate);  -- partitioned by the OrderDate column
```
Here the data is split into partitions by the year of `OrderDate` (order data for 2022, 2023, and 2024). Queries over a specific time range perform better because SQL Server scans only the relevant partitions instead of the whole table.
3. Hardware optimization configuration
3.1. Make sure to use SSD disks
SSD disks read and write faster than traditional hard disks, so storing the database's main data files, log files, and temporary files on different disks (preferably SSDs) can improve performance.
```sql
-- Store the SQL Server data file (.mdf) on an SSD
-- Store the log file (.ldf) on an SSD
-- Store the temporary database files (.ndf) on an SSD
```
3.2. Configuring SQL Server Memory
Configure SQL Server's maximum server memory so that more data can be cached in memory, reducing disk I/O. Here is how to set it:
```sql
-- View the current memory settings
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)';

-- Set the maximum memory to 16 GB
EXEC sp_configure 'max server memory (MB)', 16384;
RECONFIGURE;
```
With proper memory configuration, SQL Server can cache more data in memory, reducing access to disk and improving query response speed.
3.3. Configure SQL Server parallel processing
If the server has a multi-core CPU, you can set up SQL Server to allow more parallel query operations, thereby improving the processing power of multi-threaded queries.
```sql
-- View the current parallelism configuration
EXEC sp_configure 'max degree of parallelism';

-- Allow at most 4 CPUs to process a query in parallel
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```
4. Disk I/O Optimization: Store data files, log files and temporary files separately
Disk I/O is one of the bottlenecks in database performance. To improve the performance of the database, it is best to store data files, log files and temporary files on different physical disks.
```sql
-- Data file (.mdf): disk A
-- Log file (.ldf): disk B
-- Temporary database files (.ndf): disk C
```
5. Data backup and recovery optimization
Ensure regular backup of data and use incremental backups, differential backups, etc. to reduce the disk burden during backup.
```sql
-- Full backup
BACKUP DATABASE VGDB TO DISK = 'D:\Backups\VGDB_full.bak';

-- Differential backup
BACKUP DATABASE VGDB TO DISK = 'D:\Backups\VGDB_diff.bak' WITH DIFFERENTIAL;

-- Transaction log backup
BACKUP LOG VGDB TO DISK = 'D:\Backups\VGDB_log.trn';
```
This approach allows you to quickly recover data in the event of a system crash while reducing the impact on hard disk I/O performance during backup.
6. Monitoring and Maintenance
Regularly monitor SQL Server performance and make corresponding adjustments based on hardware and storage requirements. Monitor I/O performance, query execution plans, index usage, etc. through SQL Server's dynamic management view (DMV).
```sql
-- Check disk I/O statistics
SELECT * FROM sys.dm_io_virtual_file_stats(NULL, NULL);

-- Check cached query execution statistics
SELECT * FROM sys.dm_exec_query_stats;

-- Check current index usage
SELECT * FROM sys.dm_db_index_usage_stats;
```
Let's summarize
The performance of SQL Server databases can be significantly improved through storage and hardware optimization. Key optimization measures include using SSD disks, storing data files, log files and temporary files separately, enabling data compression, using partitioned tables to improve query efficiency, and adjusting memory and parallel processing configurations. Regular maintenance and monitoring can also help you identify performance bottlenecks and make corresponding adjustments.
6. Database parameters and configuration optimization
- Adjust the maximum number of concurrent connections: Make sure SQL Server's connection limit is configured appropriately so performance does not degrade when connections pile up.
- Set appropriate memory limits: Configure enough memory for SQL Server (`max server memory`) to avoid memory pressure or excessive paging to disk.
- Automatically update statistics: Ensure SQL Server keeps query statistics current (`AUTO_UPDATE_STATISTICS`) so the query optimizer can choose optimal execution plans.
Tuning database parameters and configuration is an important step toward peak performance. Under high concurrency and heavy load, sensible configuration significantly improves throughput and reduces response time and latency. The following case, again based on an e-commerce order system, shows how to improve performance by optimizing the database's parameters and configuration.
Business scenarios:
Assuming that the order volume of e-commerce platforms is very large and the system processes millions of orders every day, the performance and response speed of the database are the key to the normal operation of the system. To ensure database performance, it is critical to perform parameter and configuration optimization in SQL Server.
Optimization strategy:
- Adjust memory configuration: Reduce disk I/O by configuring SQL Server to use more memory to cache data.
- Set maximum parallelism: Adjust the parallel query processing capability of SQL Server based on the number of CPU cores.
- Optimize disk and storage configuration: Ensure that log files, data files and temporary files are stored separately.
- Enable automatic database optimization: Ensure that the database can automatically defragment, update statistics and other tasks.
- Adjust transaction log and recovery mode: Ensure that the database can be recovered quickly in the event of a failure.
1. Adjust the memory configuration
Memory configuration optimization is a key part of improving SQL Server performance. By increasing the maximum memory of SQL Server, it is guaranteed that query operations will not cause performance problems due to disk I/O bottlenecks.
```sql
-- Check the current maximum memory configuration
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)';

-- Set the maximum memory to 16 GB
EXEC sp_configure 'max server memory (MB)', 16384;  -- 16 GB
RECONFIGURE;
```
In the above code, we set the maximum memory of SQL Server to 16 GB. Properly configuring memory can improve query performance and reduce disk access.
2. Set the maximum parallelism
SQL Server can use multiple CPU cores for parallel query processing. By reasonably setting the parallelism, the processing capability of large queries can be improved.
```sql
-- View the current maximum degree of parallelism
EXEC sp_configure 'max degree of parallelism';

-- Set the maximum degree of parallelism to 4 (suits a 4-core CPU)
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```
With this setup, SQL Server can process in parallel with up to 4 CPU cores when querying. If your server has more cores, you can adjust this parameter according to the actual situation.
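The server-wide setting can also be overridden for a single statement with a query hint, which is useful when only certain heavy queries benefit from parallelism (a sketch; the query itself is illustrative):

```sql
-- Let this aggregation use up to 4 cores regardless of the server default
SELECT CustomerID, SUM(TotalAmount) AS Total
FROM Orders
GROUP BY CustomerID
OPTION (MAXDOP 4);
```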
3. Adjust transaction log and recovery mode
For e-commerce platforms, the optimization of transaction logs is crucial. Ensure that log files can be processed efficiently when performing large-scale transaction operations and that recovery mode meets business needs.
```sql
-- Check the database's recovery model
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'VGDB';

-- Switch to the simple recovery model
ALTER DATABASE VGDB SET RECOVERY SIMPLE;
```
For databases that do not require a full backup, using simple recovery mode can reduce the growth of log files and reduce disk I/O pressure.
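To verify the effect on the log, you can watch log usage before and after switching models:

```sql
-- Report log size and the percentage in use for every database
DBCC SQLPERF (LOGSPACE);
```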
4. Configure automatic database optimization
Ensure that the database can perform automatic optimization tasks regularly, such as rebuilding indexes, updating statistics, etc. Regular optimization can improve the query performance of the database and avoid fragmentation problems.
```sql
-- Enable automatic statistics updates (a database-level setting,
-- not an sp_configure server option)
ALTER DATABASE VGDB SET AUTO_UPDATE_STATISTICS ON;

-- Enable automatic statistics creation
ALTER DATABASE VGDB SET AUTO_CREATE_STATISTICS ON;
```
By enabling automatic updates and automatic creation of statistics, you can ensure that SQL Server can use the latest execution plan when executing queries, reducing the burden on the query optimizer.
5. Configure disk and storage
Make sure that SQL Server's data files, log files, and temporary files are stored on different disks, especially on high-speed disks such as SSDs.
```sql
-- Data file (.mdf): disk A (SSD)
-- Log file (.ldf): disk B (SSD)
-- Temporary database files (.ndf): disk C (SSD)
```
By storing data files, log files and temporary files on different disks, disk I/O competition can be avoided and the overall performance of the database can be improved.
6. Enable database compression
For e-commerce platforms that need to store large amounts of data, enabling data compression can reduce storage space and improve query performance, especially on disk I/O.
```sql
-- Enable table compression
ALTER TABLE Orders REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);

-- Enable index compression
ALTER INDEX ALL ON Orders REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);
```
By enabling data compression, we can effectively save storage space, reduce disk I/O operations, and improve query speed.
7. Configure automatic maintenance tasks
SQL Server provides automatic maintenance tasks, such as index reconstruction, database defragmentation, etc., which can automatically execute these tasks through SQL Server Agent timing tasks to keep the database running efficiently.
```sql
-- Create a recurring job that rebuilds indexes
EXEC sp_add_job
    @job_name = 'RebuildIndexes',
    @enabled = 1;

EXEC sp_add_jobstep
    @job_name = 'RebuildIndexes',
    @step_name = 'RebuildIndexStep',
    @subsystem = 'TSQL',
    @command = 'ALTER INDEX ALL ON Orders REBUILD',
    @retry_attempts = 3,
    @retry_interval = 5;

-- Schedule the job: run at 2:00 a.m. every day
EXEC sp_add_schedule
    @schedule_name = 'RebuildIndexSchedule',
    @enabled = 1,
    @freq_type = 4,              -- daily
    @freq_interval = 1,
    @active_start_time = 20000;  -- 02:00:00

EXEC sp_attach_schedule
    @job_name = 'RebuildIndexes',
    @schedule_name = 'RebuildIndexSchedule';
```
This job runs at 2 a.m. every day and rebuilds all indexes on the `Orders` table, preventing index fragmentation from degrading query performance.
8. Enable regular log backups
For production environments, especially e-commerce platforms, it is crucial to ensure timely execution of log backups. Enable log backups ensures rapid recovery in the event of a database failure.
```sql
-- Back up the transaction log
BACKUP LOG VGDB TO DISK = 'D:\Backups\YourDatabase_log.trn';
```
By performing transaction log backups regularly, you can ensure that the database can be restored to the latest state in the event of a failure.
9. Enable database caching
SQL Server caches query results and data pages, optimizing performance by adjusting cache policies.
```sql
-- Inspect statistics for an index (DBCC SHOW_STATISTICS takes a table
-- and a target; idx_OrderDate is the index created earlier)
DBCC SHOW_STATISTICS ('Orders', idx_OrderDate);

-- Force-clear caches (only for testing)
DBCC FREEPROCCACHE;      -- clears the plan cache
DBCC DROPCLEANBUFFERS;   -- clears clean pages from the buffer pool
```
In daily operations, we do not recommend clearing the cache frequently, but can clear the cache when needed to test performance optimization.
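If the goal is to see how much of the buffer pool each database occupies (the original comment mentions cached pages), a DMV query along these lines gives that picture:

```sql
-- Count buffer-pool pages held per database
SELECT DB_NAME(database_id) AS DatabaseName,
       COUNT(*) AS CachedPages
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY CachedPages DESC;
```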
Let's summarize
By optimizing the configuration and parameters of SQL Server, the database performance of the e-commerce platform can be significantly improved. Key optimization measures include adjusting memory and parallelism, optimizing disk storage and log configuration, enabling data compression, performing automatic database optimization tasks regularly, configuring database compression and regular backups, etc. Properly configure the database according to business needs and hardware resources to ensure that the database can operate stably and efficiently in a high-concurrency and high-load environment.
7. Batch data processing
- Batch insert/update operations: When processing large amounts of data, batch insertion or update operations can be used instead of line by line. This can significantly improve the loading speed of data.
- Avoid large transactions: For bulk modifications, avoid one huge transaction; it can cause lock contention and runaway log growth. Operate in small batched transactions instead.
Batch data processing is unavoidable in large-scale applications; business scenarios such as e-commerce platforms and financial systems routinely process orders and user data in bulk. Batch operations improve throughput significantly, but they must be designed carefully to preserve performance and stability.
Business scenarios:
Suppose that in an e-commerce platform, order information needs to be processed in batches, such as batch update of order status, batch deletion of failed orders, batch insertion of order data, etc. By designing appropriate batch operations, the number of database accesses for a single operation can be effectively reduced and the system's response capabilities can be improved.
Optimization solution:
- Batch insert: Use `BULK INSERT` or multi-row `INSERT INTO` statements to avoid the overhead of many individual inserts.
- Batch update: Use a single `UPDATE` to modify many records at once.
- Batch delete: Delete expired orders or invalid user records in bulk.
The following are specific code cases for batch data processing of SQL Server.
1. Batch insert data
Batch insertion reduces the overhead of many individual inserts by writing multiple rows in one `INSERT INTO` statement.
Example: Bulk insertion of order data
```sql
-- Assume the Orders table has: OrderID INT, CustomerID INT, OrderDate DATETIME, OrderStatus VARCHAR(20)
DECLARE @OrderData TABLE (OrderID INT, CustomerID INT, OrderDate DATETIME, OrderStatus VARCHAR(20));

-- Load the order data into the table variable
INSERT INTO @OrderData (OrderID, CustomerID, OrderDate, OrderStatus)
VALUES
    (1, 101, '2024-11-01', 'Pending'),
    (2, 102, '2024-11-02', 'Shipped'),
    (3, 103, '2024-11-03', 'Delivered'),
    (4, 104, '2024-11-04', 'Cancelled');

-- Insert the batch into the Orders table
INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderStatus)
SELECT OrderID, CustomerID, OrderDate, OrderStatus
FROM @OrderData;
```
In this example we first load the rows into the table variable `@OrderData`, then insert them into the `Orders` table with one `INSERT INTO ... SELECT` statement, greatly reducing the number of round trips to the database.
2. Batch update data
Batch update operations are often used to modify certain fields in multiple records to avoid multiple individual updates.
Example: Batch update order status
Assuming that all unshipped orders need to be updated in batches to "Shipped", it can be implemented through the following SQL:
```sql
-- Batch-update the order status
UPDATE Orders
SET OrderStatus = 'Shipped'
WHERE OrderStatus = 'Pending'
  AND OrderDate < '2024-11-01';
```
This operation will update all records that meet the criteria at once to avoid performance problems caused by multiple individual update operations.
3. Batch delete data
In some scenarios, we need to batch delete certain expired or invalid data. For example, delete an expired order that was 30 days ago.
Example: Batch Delete Expired Orders
```sql
-- Delete expired orders
DELETE FROM Orders
WHERE OrderDate < DATEADD(DAY, -30, GETDATE())
  AND OrderStatus = 'Completed';
```
In this example, we delete all orders that have been completed and have an order date of more than 30 days. This batch deletion operation is much more efficient than deleting one by one.
4. Batch processing logic optimization
Sometimes the amount of data in batch operations is very large, and direct processing may lead to performance issues or database lock contention. It is possible to consider performing operations in batches to reduce the burden on the system.
Example: Process order data by batch
```sql
DECLARE @BatchSize INT = 1000;

-- Update in batches of 1000 until no matching rows remain.
-- Each pass re-selects 'Pending' rows, so no cursor position is needed.
WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) Orders
    SET OrderStatus = 'Shipped'
    WHERE OrderStatus = 'Pending'
      AND OrderDate < '2024-11-01';

    IF @@ROWCOUNT = 0 BREAK;  -- all matching rows processed
END
```
By processing in batches (1000 records are processed at a time), performance bottlenecks or database lock problems caused by processing large amounts of data at one time can be avoided. Suitable for situations where large amounts of records need to be updated in batches.
5. Use transactions to ensure data consistency
For batch operations, transactions are usually required to ensure data consistency, that is, either all succeed or all fail.
Example: Bulk insertion of orders and use transactions
```sql
BEGIN TRANSACTION;
BEGIN TRY
    -- Assume the Orders table has: OrderID INT, CustomerID INT, OrderDate DATETIME, OrderStatus VARCHAR(20)
    DECLARE @OrderData TABLE (OrderID INT, CustomerID INT, OrderDate DATETIME, OrderStatus VARCHAR(20));

    -- Stage the order data
    INSERT INTO @OrderData (OrderID, CustomerID, OrderDate, OrderStatus)
    VALUES
        (5, 105, '2024-11-05', 'Pending'),
        (6, 106, '2024-11-06', 'Pending');

    INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderStatus)
    SELECT OrderID, CustomerID, OrderDate, OrderStatus
    FROM @OrderData;

    -- Commit the transaction
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Roll back on error
    ROLLBACK TRANSACTION;
    PRINT 'Error occurred: ' + ERROR_MESSAGE();
END CATCH;
```
In this example, the batch insertion operation is included in a transaction, ensuring the atomicity of the insertion operation, i.e. either all succeed or all fail. If an error occurs during execution, the transaction will be rolled back to avoid data inconsistencies.
Let's summarize
Batch data processing is an effective means to improve SQL Server performance, especially in business scenarios such as e-commerce platforms with huge data volumes. By rationally using batch insertion, batch update and batch deletion operations, the database processing efficiency can be greatly improved and the number of I/O operations and lock competition in the database can be reduced. When performing batch operations, remember to ensure data consistency through transactions, and batch processing can further optimize the processing performance of large-scale data.
8. Clean up useless data
- Delete expired data: Regularly clean up expired or no longer needed data, reducing the size of the database and the complexity of query.
- Clean up database fragments: As data is added and deleted, the fragmentation of tables and indexes will increase, affecting performance. Rebuild or reorganize indexes regularly to reduce fragmentation.
Cleaning up useless data is a common task in database maintenance, especially when dealing with historical data, expired records, or redundant data. Regular cleaning of useless data not only saves storage space, but also improves database performance and avoids unnecessary impacts on queries, indexes, etc.
Business scenarios:
Suppose that on an e-commerce platform, user orders add a large number of records every year. To keep the order table from growing without bound and to stop stale orders (say, those from more than 3 years ago) from occupying storage, we clean up expired order data regularly.
Optimization solution:
- Delete expired data: Regularly delete order data that exceeds a certain period of time (such as orders 3 years ago).
- Archive expired data: Move expired order data to a historical table or external storage, retaining the necessary historical information.
Code Example
1. Regularly delete expired data
Assume our `Orders` table has an `OrderDate` field recording when each order was created and an `OrderStatus` field identifying its status. Each month we can clean up orders completed or cancelled more than 3 years ago.
```sql
-- Delete orders completed or cancelled more than 3 years ago
DELETE FROM Orders
WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
  AND OrderStatus IN ('Completed', 'Cancelled');
```
Here `DATEADD(YEAR, -3, GETDATE())` computes the date 3 years before today; every order before that date whose status is `'Completed'` or `'Cancelled'` is deleted.
2. Regularly archive expired data
If outright deletion does not meet business needs, archive the data instead; for example, move orders older than 3 years into the `ArchivedOrders` table.
```sql
-- Move completed or cancelled orders older than 3 years into ArchivedOrders
INSERT INTO ArchivedOrders (OrderID, CustomerID, OrderDate, OrderStatus)
SELECT OrderID, CustomerID, OrderDate, OrderStatus
FROM Orders
WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
  AND OrderStatus IN ('Completed', 'Cancelled');

-- Delete the archived orders from the main table
DELETE FROM Orders
WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
  AND OrderStatus IN ('Completed', 'Cancelled');
```
First insert the qualifying rows into `ArchivedOrders`, then delete them from `Orders`. This keeps the main table lean, reduces storage pressure, and preserves the history.
3. Use triggers to automatically clean useless data
To automate cleanup, a trigger can check for expired data whenever rows are inserted or updated and remove it on the spot. Note that a trigger fires on DML, not on a schedule, and adds overhead to every write; for genuinely periodic cleanup, the scheduled job in section 5 below is usually preferable.
```sql
-- Trigger that cleans up expired orders after inserts/updates on Orders
CREATE TRIGGER CleanOldOrders
ON Orders
AFTER INSERT, UPDATE
AS
BEGIN
    -- Delete orders completed or cancelled more than 3 years ago
    DELETE FROM Orders
    WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
      AND OrderStatus IN ('Completed', 'Cancelled');
END;
```
This trigger fires on every insert or update against the `Orders` table, automatically checking for and removing expired orders.
4. Clean useless data in batches
If the order data volume is very large, deleting it directly may cause performance bottlenecks or database locking issues. In this case, data can be deleted in batches to reduce the load on a single deletion operation.
```sql
DECLARE @BatchSize INT = 1000;

-- Delete in batches of 1000 until no expired rows remain
WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) FROM Orders
    WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
      AND OrderStatus IN ('Completed', 'Cancelled');

    IF @@ROWCOUNT = 0 BREAK;  -- all expired rows have been removed
END
```
By processing deletion operations in batches, deleting a small number of records at a time, reducing the impact on database performance and avoiding long-term locking of tables.
5. Use the job scheduler to clean useless data regularly
If you are using SQL Server, you can perform cleanup tasks regularly using the job scheduler (SQL Server Agent). First, you can create a stored procedure to perform data cleaning operations.
```sql
CREATE PROCEDURE CleanOldOrders
AS
BEGIN
    DELETE FROM Orders
    WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
      AND OrderStatus IN ('Completed', 'Cancelled');
END;
```
Then, set up periodic jobs in SQL Server Management Studio (for example, running the stored procedure every day at midnight) to ensure that useless data is cleaned up regularly.
Let's summarize
Cleaning up useless data not only helps save storage space, but also improves database performance. Depending on actual business needs, we can choose to delete, archive or batch processing to clean up the data. Especially for tables with large data volumes, batch cleaning and regular job scheduling can effectively reduce the burden on the system.
9. Use Cache
- Cache common query results: For high-frequency queries, the query results can be cached into memory to avoid searching in the database for each query.
- Application layer cache: Use cache systems such as Redis or Memcached to cache some common data in memory, thereby reducing the frequency of database access.
In actual business, caching is a common means to improve system performance, especially for hotspot data accessed at high frequency. By storing it in the cache, the number and pressure of database queries can be reduced and the response speed can be improved.
Business scenarios
Suppose we have an e-commerce platform where users frequently query the basic information of the product (such as price, inventory, description, etc.) when browsing product details. Since product information changes less and query requests are frequent, caching of product information can effectively improve the performance of the system.
We use Redis as a cache database. The common practice is: when querying a certain product, first check whether the product's details exist in the cache. If it exists, directly return the data in the cache; if it does not exist, query it from the database and store the query results in the cache for the next time.
Solution
- Use Redis to store product information.
- Set an appropriate expiration time (TTL, Time To Live) so stale entries age out of the cache.
- Use an appropriate cache-update policy (for example, refresh the cache whenever product information is updated).
Code Example
1. Set up Redis cache
First, connect to the Redis service using a client library such as `redis-py`. Assume the product table is `Products`, with fields `ProductID`, `ProductName`, `Price`, `Stock`, and `Description`.
```bash
# Install the Redis client
pip install redis
```
2. Product query and cache logic
```python
import json

import mysql.connector
import redis

# Connect to Redis
redis_client = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)


# Connect to the MySQL database
def get_db_connection():
    return mysql.connector.connect(
        host="localhost",
        user="root",
        password="password",
        database="ecommerce"
    )


# Get product details
def get_product_details(product_id):
    # Check the cache first
    cached_product = redis_client.get(f"product:{product_id}")
    if cached_product:
        print("Got product information from the cache")
        return json.loads(cached_product)  # deserialize the cached JSON

    # Cache miss: query the database
    print("Got product information from the database")
    connection = get_db_connection()
    cursor = connection.cursor(dictionary=True)
    cursor.execute("SELECT * FROM Products WHERE ProductID = %s", (product_id,))
    product = cursor.fetchone()

    # If the product exists, cache it in Redis for 1 hour
    # (default=str keeps DECIMAL/DATETIME columns JSON-serializable)
    if product:
        redis_client.setex(f"product:{product_id}", 3600, json.dumps(product, default=str))

    cursor.close()
    connection.close()
    return product


# Update product information and refresh the cache
def update_product_details(product_id, name, price, stock, description):
    # Update the database
    connection = get_db_connection()
    cursor = connection.cursor()
    cursor.execute("""
        UPDATE Products
        SET ProductName = %s, Price = %s, Stock = %s, Description = %s
        WHERE ProductID = %s
    """, (name, price, stock, description, product_id))
    connection.commit()
    cursor.close()
    connection.close()

    # Refresh the cache with the new values (1 hour TTL)
    updated_product = {
        "ProductID": product_id,
        "ProductName": name,
        "Price": price,
        "Stock": stock,
        "Description": description,
    }
    redis_client.setex(f"product:{product_id}", 3600, json.dumps(updated_product))


# Example: query product 101
product_info = get_product_details(101)
print(product_info)

# Example: update product 101
update_product_details(101, "New Product Name", 199.99, 50, "Updated description")
```
Code description
- Connect to Redis and MySQL: use `redis-py` to connect to Redis and `mysql-connector-python` to connect to the MySQL database.
- Query a product: in `get_product_details`, we first check the Redis cache. On a hit, the cached data is returned directly; on a miss, the product is read from the MySQL database and the result is written back to Redis.
- Update product information: when product data changes (name, price, stock, and so on), we update the database first and then refresh the Redis cache, so the cache always holds the latest data.
- Set an expiration time on the cache: the `setex` method caches the product in Redis with a TTL, so stale entries expire automatically instead of persisting indefinitely.
Further optimization
- Cache penetration: besides checking whether a key is cached, add a guard against cache penetration: when a queried product does not exist in the database, cache a `None`/empty marker for a short time, so repeated lookups for non-existent keys do not all fall through to the database (see the sketch after this list).
- Cache eviction policy: Redis offers several eviction policies (such as LRU and LFU); configure the instance's policy according to the business so that hotspot data stays in the cache.
- Asynchronous cache updates: in high-concurrency scenarios, updating the cache synchronously can become a bottleneck; a queue plus asynchronous processing can be used to batch or defer cache refreshes.
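The following is a minimal sketch of that cache-penetration guard. It reuses the `redis_client` and `get_db_connection` helpers from the example above; the function name and the 60-second TTL for "missing" markers are illustrative choices, not a fixed recipe:

```python
import json

NOT_FOUND_TTL = 60  # seconds; short TTL for "missing" markers (illustrative value)

def get_product_details_guarded(product_id):
    # Reuses redis_client and get_db_connection from the example above
    key = f"product:{product_id}"
    cached = redis_client.get(key)
    if cached is not None:
        # An empty string marks a product we already know does not exist
        return None if cached == "" else json.loads(cached)

    connection = get_db_connection()
    cursor = connection.cursor(dictionary=True)
    cursor.execute("SELECT * FROM Products WHERE ProductID = %s", (product_id,))
    product = cursor.fetchone()
    cursor.close()
    connection.close()

    if product is None:
        # Cache the miss briefly so repeated lookups for a non-existent
        # product do not all fall through to the database
        redis_client.setex(key, NOT_FOUND_TTL, "")
        return None

    redis_client.setex(key, 3600, json.dumps(product, default=str))
    return product
```

The short TTL on the "missing" marker matters: if the product is created later, the marker expires quickly and the next lookup repopulates the cache from the database.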
Let's summarize
By using a Redis cache, an e-commerce platform can effectively speed up product-information queries and reduce the load on the database. Depending on business needs, the caching strategy and update mechanism can be optimized further.
10. Parallel query and concurrency
- Enable parallel queries: SQL Server can use multiple CPU cores to process a single query in parallel. Tuning the parallelism settings (such as `max degree of parallelism`) can improve query performance, especially when processing large amounts of data.
- Optimize the locking strategy: make sure the database's locking strategy is reasonable and avoid long-lived lock contention; row-level locks instead of table-level locks can reduce blocking.
In high-concurrency scenarios, parallel queries can significantly speed up data retrieval. The core idea of a parallel query is to split a complex query into multiple subtasks and let multiple CPU cores process them simultaneously, improving overall query performance. Concurrency, by contrast, interleaves multiple tasks so that the CPU is used more efficiently; in some scenarios, running several query tasks concurrently also delivers high performance.
Business scenarios
Suppose we have an e-commerce platform that stores a large amount of order data. When users query orders, the query may involve joining multiple tables and filtering on several conditions. To improve performance, we can break the work into separate query tasks and optimize them through parallelism and concurrency.
For example, when querying order data with conditions such as order status, order date range, and user ID, we can split the work into several parallel queries, one per condition, and then merge the results before returning them.
Solution
- Parallel query:Split the query task into multiple subtasks, and use multiple threads or multiple processes to execute each subtask in parallel.
- Concurrent query:Use asynchronous IO or thread pool to perform multiple query operations concurrently.
We will use Python's `concurrent.futures` library to implement the parallel queries, with a MySQL database executing the actual query operations.
Code Example
1. Parallel query
We divide the query conditions into several parts and run the queries in parallel. For example, fetch the orders whose status is `Completed` and those whose status is `Pending` in two parallel queries.
```bash
# Install the MySQL client library
pip install mysql-connector-python
```
```python
import mysql.connector
from concurrent.futures import ThreadPoolExecutor
import time

# Connect to the MySQL database
def get_db_connection():
    return mysql.connector.connect(
        host="localhost",
        user="root",
        password="123123",
        database="VGDB"
    )

# Execute a query: fetch the orders that have the given status
def query_orders_by_status(status):
    connection = get_db_connection()
    cursor = connection.cursor(dictionary=True)
    query = "SELECT * FROM Orders WHERE OrderStatus = %s"
    cursor.execute(query, (status,))
    result = cursor.fetchall()
    cursor.close()
    connection.close()
    return result

# Run the queries in parallel
def fetch_orders():
    statuses = ['Completed', 'Pending']  # the order statuses we need to query

    # Use ThreadPoolExecutor to run the queries in parallel
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Submit one query task per status
        futures = [executor.submit(query_orders_by_status, status) for status in statuses]
        # Collect the query results
        results = [future.result() for future in futures]

    return results

# Example: run the parallel query
if __name__ == "__main__":
    start_time = time.time()
    orders = fetch_orders()
    print("Query results:", orders)
    print(f"Query time: {time.time() - start_time} seconds")
```
Code description
- `query_orders_by_status`: runs a database query that returns the orders with the specified status.
- `fetch_orders`: uses `ThreadPoolExecutor` to execute multiple query tasks in parallel; here the `Completed` and `Pending` statuses are submitted to the thread pool as separate tasks.
- `ThreadPoolExecutor`: we create a thread pool with at most two worker threads and use `submit` to enqueue the query tasks; each query runs in its own thread.
- `future.result()`: blocks until the corresponding parallel task finishes and returns its result.
2. Concurrent query
We can also run queries concurrently using asynchronous IO or multi-threading, which suits queries that do not depend on each other.
```python
import asyncio
import mysql.connector
import time

# get_db_connection is defined as in the parallel example above

# Run a blocking database query asynchronously
async def query_orders_by_status_async(status, loop):
    # run_in_executor(None, ...) moves the blocking query onto the default
    # thread pool so several queries can be in flight concurrently
    result = await loop.run_in_executor(None, query_orders_by_status, status)
    return result

# Execute a query: fetch the orders that have the given status
def query_orders_by_status(status):
    connection = get_db_connection()
    cursor = connection.cursor(dictionary=True)
    query = "SELECT * FROM Orders WHERE OrderStatus = %s"
    cursor.execute(query, (status,))
    result = cursor.fetchall()
    cursor.close()
    connection.close()
    return result

# Run the queries concurrently
async def fetch_orders_concurrently():
    loop = asyncio.get_running_loop()
    statuses = ['Completed', 'Pending', 'Shipped']  # query orders in several statuses
    tasks = [query_orders_by_status_async(status, loop) for status in statuses]
    orders = await asyncio.gather(*tasks)  # wait for all tasks to complete
    return orders

# Example: run the concurrent query
if __name__ == "__main__":
    start_time = time.time()
    orders = asyncio.run(fetch_orders_concurrently())
    print("Query results:", orders)
    print(f"Query time: {time.time() - start_time} seconds")
```
Code description
- `query_orders_by_status_async`: uses `loop.run_in_executor` to move the blocking database query onto a worker thread. Even though each individual query blocks, several of them can now run concurrently.
- `asyncio.gather`: combines multiple asynchronous tasks and waits for all of them to complete before returning the results.
- `asyncio.run`: starts the event loop and executes the asynchronous queries (a more compact Python 3.9+ variant is sketched below).
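On Python 3.9 and later, the same pattern can be written more compactly with `asyncio.to_thread`, which hides the event-loop and executor plumbing. This is a sketch under the assumption that `query_orders_by_status` is defined as in the example above:

```python
import asyncio

# Requires Python 3.9+; reuses query_orders_by_status from the example above
async def fetch_orders_concurrently_compact():
    statuses = ['Completed', 'Pending', 'Shipped']
    # asyncio.to_thread runs each blocking query in the default thread pool
    tasks = [asyncio.to_thread(query_orders_by_status, s) for s in statuses]
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    orders = asyncio.run(fetch_orders_concurrently_compact())
    print("Query results:", orders)
```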
Further optimization
- Thread pool size: adjust the `max_workers` parameter of `ThreadPoolExecutor` according to the workload. With many tasks a larger pool can help, but too many threads will hurt overall system performance.
- Connection pool: use a database connection pool to manage connections, so that a new connection does not have to be established for every query (see the sketch after this list).
- Paging queries: if the result sets are very large, page the queries to reduce the amount of data fetched per query and further improve performance.
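A minimal sketch of the connection-pool and paging ideas, using the pooling support built into `mysql-connector-python`; the pool size, credentials, and page size are illustrative values:

```python
import mysql.connector.pooling

# One pool per process; pool_size and the credentials are illustrative values
db_pool = mysql.connector.pooling.MySQLConnectionPool(
    pool_name="orders_pool",
    pool_size=5,
    host="localhost",
    user="root",
    password="123123",
    database="VGDB",
)

def query_orders_by_status_paged(status, page=1, page_size=100):
    # get_connection() borrows a connection from the pool instead of
    # opening a new one; close() returns it to the pool
    connection = db_pool.get_connection()
    try:
        cursor = connection.cursor(dictionary=True)
        cursor.execute(
            "SELECT * FROM Orders WHERE OrderStatus = %s LIMIT %s OFFSET %s",
            (status, page_size, (page - 1) * page_size),
        )
        return cursor.fetchall()
    finally:
        connection.close()

# Example: fetch the first page of completed orders
# print(query_orders_by_status_paged('Completed', page=1))
```

For very deep pages, keyset pagination (filtering on the last seen order ID instead of using a large OFFSET) scales better, since the database does not have to skip over all the preceding rows.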
Summarize
- Parallel query: splitting a query task into multiple subtasks and processing them in parallel can significantly improve query performance.
- Concurrent query: suitable when multiple query tasks can run at the same time; instead of waiting for each query to finish one by one, the overall query time is shortened.
By combining parallel and concurrent query strategies, we can significantly improve the query response speed of an e-commerce platform or other business system, and keep it efficient even in highly concurrent environments.
11. SQL Server instance optimization
- Restart SQL Server instances periodically: an instance that has been running for a long time may accumulate problems such as bloated caches or memory leaks; a periodic restart can release resources and restore performance.
- Enable compression: SQL Server provides data compression, which can save storage space and improve query performance, especially for read-heavy workloads.
SQL Server instance optimization is an important aspect of improving the overall performance of the database. In large business systems, the performance of SQL Server often directly affects the response speed and stability of the entire application. Instance optimization includes the rational configuration of hardware resources, the optimization of SQL Server configuration parameters, memory and I/O management, query optimization, and monitoring.
Suppose we have an online e-commerce platform with heavy traffic that stores large amounts of product, order, and user data. We need to optimize the SQL Server instance to ensure efficient query performance, stable transaction processing, and fast data reads.
1. Hardware configuration optimization
The performance of SQL Server instances depends to a large extent on the configuration of the underlying hardware, especially memory, CPU, disk and other resources.
- Memory: SQL Server is a memory-intensive application. The larger the memory, the higher the cache hit rate and the better the query performance.
- CPU: More CPU cores can handle more concurrent requests.
- Disk: SSDs deliver far better I/O performance than traditional hard disks, especially for the read and write patterns of large databases.
2. SQL Server configuration optimization
SQL Server provides many configuration parameters to adjust the behavior of an instance, which can be used to optimize performance.
Configuration Parameter Example
- max degree of parallelism: Controls the parallelism of SQL Server queries. By reasonably setting the parallelism, the query efficiency of multi-core CPU systems can be improved.
- max server memory: Limit the maximum amount of memory used by SQL Server to prevent SQL Server from taking up too much memory and causing operating system performance to decline.
- cost threshold for parallelism: Sets the cost threshold for query execution. SQL Server will only use parallel execution when the cost of the query exceeds this value.
3. Index optimization
Indexing is key to query performance. Create indexes for frequently queried fields based on the business scenario, but remember that too many indexes slow down insert, update, and delete operations, so a balance between query performance and maintenance cost needs to be found.
4. Query optimization
For large business systems, query optimization is particularly important. Optimizing queries can reduce the burden on the database and improve response speed.
Business scenarios
Suppose the e-commerce platform processes a large volume of order data and queries often join multiple tables, such as fetching all orders of a given user within a certain time period. We can improve query speed by optimizing the SQL.
Code Example
1. Set SQL Server instance configuration parameters
In the SQL Server instance, we can set some basic optimization parameters through the following T-SQL statements:
```sql
-- These are advanced options, so enable them first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Set the maximum memory usage to 16 GB (unit: MB)
EXEC sp_configure 'max server memory', 16384;
RECONFIGURE;

-- Set the maximum degree of parallelism to 8 (for an 8-core CPU)
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;

-- Set the cost threshold for parallelism to 10
EXEC sp_configure 'cost threshold for parallelism', 10;
RECONFIGURE;
```
2. Query optimization
To improve query performance, you can use the following tips when querying:
- Avoid `SELECT *`; select only the fields you need.
- Use JOIN instead of subqueries to avoid unnecessary nested queries.
- Create appropriate indexes to speed up queries.
- Use paging queries to reduce the amount of data in a single query.
Here is an optimized query example:
```sql
-- Suppose we need to query a user's orders in a date range: optimized SQL query
SELECT o.OrderID, o.OrderDate, o.TotalAmount, u.UserName
FROM Orders o
JOIN Users u ON o.UserID = u.UserID
WHERE o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
  AND o.UserID = 12345
ORDER BY o.OrderDate DESC;
```
3. Index optimization
To support this query, we can create indexes on the `UserID` and `OrderDate` columns of the `Orders` table:
```sql
-- Create an index on the UserID column
CREATE INDEX idx_user_id ON Orders(UserID);

-- Create an index on the OrderDate column
CREATE INDEX idx_order_date ON Orders(OrderDate);

-- Create a composite index on UserID and OrderDate
CREATE INDEX idx_user_order_date ON Orders(UserID, OrderDate);
```
4. Database backup and maintenance
Regular backups and maintenance of databases ensure that the system remains efficient under high loads. Regular database optimization tasks include:
- Back up the data.
- Update statistics.
- Rebuild the index.
Here is an example of periodic rebuilding of indexes:
```sql
-- Rebuild all indexes on the Orders and Users tables
ALTER INDEX ALL ON Orders REBUILD;
ALTER INDEX ALL ON Users REBUILD;
```
5. Use SQL Server's performance monitoring tool
SQL Server provides performance monitoring tools that help identify performance bottlenecks. For example, SQL Server Profiler and Dynamic Management Views (DMVs) let us monitor an instance's performance in real time and tune it based on actual conditions.
```sql
-- Check the current resource usage of the SQL Server instance
SELECT * FROM sys.dm_exec_requests;

-- Check the memory usage of the SQL Server instance
SELECT * FROM sys.dm_os_memory_clerks;

-- Check the disk I/O usage of the SQL Server instance
SELECT * FROM sys.dm_io_virtual_file_stats(NULL, NULL);
```
Let's summarize
- Hardware optimization: Rationally configure CPU, memory and disk to improve the performance of SQL Server instances.
- Instance configuration optimization: Optimize performance by configuring SQL Server parameters, such as memory limits, parallelism, etc.
- Index optimization: Reasonably design the index structure and improve query efficiency.
- Query optimization: Use efficient SQL query statements to avoid unnecessary calculations and I/O operations.
- Regular maintenance and backup: Regularly carry out database maintenance and backup to ensure stable system operation.
By optimizing SQL Server instances, the performance of the database can be significantly improved, ensuring that e-commerce platforms can still maintain efficient responses under high concurrency and high load conditions.
Finally
The above 11 optimization solutions are offered for your reference. Optimizing SQL Server performance requires working on many fronts, including hardware configuration, database structure, query optimization, index management, partitioned tables, and parallel processing. With reasonable indexing, query optimization, data partitioning, and related techniques, good performance can be maintained as data volumes grow. At the same time, maintain and clean the database regularly to keep it running efficiently. Follow Brother Wei Loves Programming; Brother V is your technical companion.