I received a request today to rewrite the logic of a report. After the modification, I ran it again and found that the run timed out.
Due to special reasons, we cannot access the client's server, we cannot view the SQL execution plan, and we cannot know the index status in the table, so we try to optimize from the rewriting of the statement.
1. The original statement is as follows:
select isnull(vv.customer_id,v.customer_id) as customer_id, isnull(vv.business_date,replace(v.business_date,'-','')) as business_date, v.prod_id, , vv.visit_count, v.all_sales from ( SELECT a.customer_id , max(month)+'-01' as business_date, a.PROD_ID , SUM(CAST(VALUE AS NUMERIC(38, 3))) sales, sum(SUM(CAST(VALUE AS NUMERIC(38, 3)))) over(partition by a.customer_id) as all_sales FROM TB_IMPORT_SALES a WHERE a.customer_id IS NOT NULL AND a.PROD_ID IS NOT NULL and ='2016-11' GROUP BY a.customer_id , a.PROD_ID )v full join ( SELECT customer_id, max(a.business_date) as business_date, COUNT(*) AS VISIT_COUNT FROM TB_CALL_STORE a WITH(NOLOCK) inner join TB_TIME d on a.business_date = d.t_date where ='2016-11' GROUP BY customer_id )vv on v.customer_id = vv.customer_id
It turned out to be a left join. Although the query is slow, it can be found in 2 minutes. Now, according to the business requirements, you need to see all the data, so it has been changed to a full join. After the change, the result cannot be found in 5 minutes.
2. Rewritten code
select v.customer_id, replace(max(v.business_date),'-','') as business_date, v.prod_id, max(v.sales_volume) sales_volume , max(v.visit_count) visit_count, max(v.all_sales_volume) all_sales_volume from ( SELECT a.customer_id , max(biz_month)+'-01' as business_date, a.PROD_ID , SUM(CAST(VALUE1 AS NUMERIC(38, 8))) sales_volume, sum(SUM(CAST(VALUE1 AS NUMERIC(38, 8)))) over(partition by a.customer_id) as all_sales_volume, null as visit_count FROM TB_IMPORT_SALES a WHERE a.customer_id IS NOT NULL AND a.PROD_ID IS NOT NULL and ='2016-11' GROUP BY a.customer_id , a.PROD_ID union all SELECT customer_id, max(a.business_date) as business_date, p.prod_id, null, null, COUNT(*) AS VISIT_COUNT FROM TB_CALL_STORE a WITH(NOLOCK) cross apply ( select top 1 prod_id from TB_PRODUCT with(nolock) )p inner join TB_TIME d on a.business_date = d.t_date where ='2016-11' GROUP BY customer_id,p.prod_id )v group by v.customer_id, v.prod_id
Since the code itself is relatively simple, it is impossible to further simplify it. Since the server cannot be connected, other methods cannot be used, and it is even impossible to analyze what is causing it to run so slowly.
After thinking about it, full join is essentially a 2-time left join+union, which is nothing more than merging data. So I tried using union all to directly merge the data. Now I change it to unoin all. Finally, there is no need for full join.
However, considering that the second code does not have the prod_id field, so the second code here adds the cross apply to the id of a product, so that the prod_id field can be merged.
After modification, the speed dropped to more than 10 seconds.
This is the article about SQL optimization practice. Changing full join to left join +union all (down from 5 minutes to 10 seconds). For more related left join +union all content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!