SoFunction
Updated on 2025-04-07

sql optimization practice changes full join to left join +union all (download from 5 minutes to 10 seconds)

I received a request today to rewrite the logic of a report. After the modification, I ran it again and found that the run timed out.

Due to special reasons, we cannot access the client's server, we cannot view the SQL execution plan, and we cannot know the index status in the table, so we try to optimize from the rewriting of the statement.

1. The original statement is as follows:

select isnull(vv.customer_id,v.customer_id) as customer_id,
		isnull(vv.business_date,replace(v.business_date,'-','')) as business_date,
		v.prod_id,
		,
		vv.visit_count,
    v.all_sales
from 
(
  SELECT a.customer_id ,
	    max(month)+'-01' as business_date,
      a.PROD_ID ,
      SUM(CAST(VALUE AS NUMERIC(38, 3))) sales,
      sum(SUM(CAST(VALUE AS NUMERIC(38, 3)))) over(partition by a.customer_id) as all_sales
							
  FROM  TB_IMPORT_SALES a 
  WHERE  a.customer_id IS NOT NULL
      AND a.PROD_ID IS NOT NULL
			and ='2016-11'
  GROUP BY a.customer_id ,
      a.PROD_ID
)v
full join
(
  SELECT customer_id, 
	    max(a.business_date) as business_date,
      COUNT(*) AS VISIT_COUNT 
	FROM TB_CALL_STORE a WITH(NOLOCK)
	inner join TB_TIME d
	on a.business_date = d.t_date 
	where  ='2016-11'
	GROUP BY customer_id
)vv
on v.customer_id = vv.customer_id

It turned out to be a left join. Although the query is slow, it can be found in 2 minutes. Now, according to the business requirements, you need to see all the data, so it has been changed to a full join. After the change, the result cannot be found in 5 minutes.

2. Rewritten code

select v.customer_id,
		replace(max(v.business_date),'-','') as business_date,
		v.prod_id,
		max(v.sales_volume) sales_volume ,
		max(v.visit_count) visit_count,
        max(v.all_sales_volume) all_sales_volume
from 
(
  SELECT a.customer_id ,
	    max(biz_month)+'-01' as business_date,
      a.PROD_ID ,
      SUM(CAST(VALUE1 AS NUMERIC(38, 8))) sales_volume,
      sum(SUM(CAST(VALUE1 AS NUMERIC(38, 8)))) over(partition by a.customer_id) as all_sales_volume,
			null as visit_count
							
  FROM  TB_IMPORT_SALES a 
  WHERE  a.customer_id IS NOT NULL
      AND a.PROD_ID IS NOT NULL
			and ='2016-11'
  GROUP BY a.customer_id ,
       a.PROD_ID
  union all
 
  SELECT customer_id, 
	    max(a.business_date) as business_date,
		  p.prod_id,
		  null,
		  null,
      COUNT(*) AS VISIT_COUNT 
	FROM TB_CALL_STORE a WITH(NOLOCK)
	cross apply
	(
		select top 1 prod_id from TB_PRODUCT with(nolock)
	)p
	inner join TB_TIME d
	on a.business_date = d.t_date 
	where  ='2016-11'
	GROUP BY customer_id,p.prod_id
)v
group by v.customer_id,
     v.prod_id

Since the code itself is relatively simple, it is impossible to further simplify it. Since the server cannot be connected, other methods cannot be used, and it is even impossible to analyze what is causing it to run so slowly.

After thinking about it, full join is essentially a 2-time left join+union, which is nothing more than merging data. So I tried using union all to directly merge the data. Now I change it to unoin all. Finally, there is no need for full join.

However, considering that the second code does not have the prod_id field, so the second code here adds the cross apply to the id of a product, so that the prod_id field can be merged.

After modification, the speed dropped to more than 10 seconds.

This is the article about SQL optimization practice. Changing full join to left join +union all (down from 5 minutes to 10 seconds). For more related left join +union all content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!