SoFunction
Updated on 2025-03-02

Interpreting the problem of using NULL conditions in GROUP BY and HAVING clauses in SQL

Use NULL conditions in GROUP BY and HAVING clauses in SQL

  • SQL :
SELECT COUNT(1), tid, sap_do, batch, skucode,batch 
FROM lt 
WHERE IFNULL(sap_do, '') != '' 
AND created_time >= 1722441600 
GROUP BY tid, sap_do, skucode 
HAVING COUNT(1) > 1 
AND batch IS NULL; 

The query result is empty; the actual database has batch IS NULL

  • After modification:
SELECT COUNT(1), tid, sap_do, skucode, batch 
FROM lt 
WHERE IFNULL(sap_do, '') != '' 
AND created_time >= 1722441600 
GROUP BY tid, sap_do, skucode 
HAVING COUNT(1) > 1 
AND SUM(CASE WHEN batch IS NULL THEN 1 ELSE 0 END) > 0;

The core of the problem lies in how the GROUP BY and HAVING clauses work, and the handling rules of NULL

1. HAVING batch IS NULL Reasons for No Data

When you use GROUP BY and use batch IS NULL directly in the HAVING clause, SQL will look for whether the batch value is NULL in the grouped results. However, since GROUP BY aggregates all data within the same group, the batch field may contain a mixture of NULL and non-NULL values.

For GROUP BY, it does not simply retain a specific value of the batch field, so if the batch value in a group has both NULL and non-NULL, the batch IS NULL condition will be invalid because SQL cannot determine whether the batch value of this group is NULL or non-NULL.

Direct use of HAVING batch IS NULL usually only takes effect if the batches of the entire group are all NULL.

In short, HAVING batch IS NULL cannot filter out results when the batch column is not all NULL after grouping.

2. HAVING SUM(CASE WHEN batch IS NULL THEN 1 ELSE 0 END) > 0 Reason for the correct result

The logic of SUM(CASE WHEN batch IS NULL THEN 1 ELSE 0 END) is to calculate the number of records in a group whose batch is NULL through the CASE statement.

SUM processes every record in the entire group, so even if there is a non-NULL value of the batch in the group, as long as the batch of any record is NULL, SUM can correctly calculate the number of NULL values ​​in this group.

explain:

  • SUM(CASE WHEN batch IS NULL THEN 1 ELSE 0 END) > 0: Through this SUM, we calculate how many records in the current group have NULL batches. If the result is greater than 0, it means that there is a batch with a NULL value in this group.
  • HAVING batch IS NULL: Only when the value of batch after grouping is completely NULL, it is impossible to deal with groups containing NULL and non-NULL values.

Summarize

HAVING batch IS NULL can only handle the case where batches in the entire group are NULL, which will result in empty results in packets with mixed values.

HAVING SUM(CASE WHEN batch IS NULL THEN 1 ELSE 0 END) > 0 can handle cases where NULL and non-NULL are included in groups, so it correctly returns a group containing NULL values.

This SUM(CASE WHEN…) logic provides finer granular control to ensure that even if there are multiple different batch values ​​in the group, those containing NULLs can be correctly identified.

Replenish

If you query the id field, the SQL result above will find that the id result is not the row ID with the batch of null we want

The SQL can be modified to execute:

SELECT id, tid, sap_do, skucode, batch 
FROM lt
WHERE batch IS NULL
AND (tid, sap_do, skucode) IN (
    SELECT tid, sap_do, skucode
    FROM lt 
    WHERE IFNULL(sap_do, '') != ''
    AND created_time >= 1725120000
    GROUP BY tid, sap_do, skucode
    HAVING COUNT(1) > 1
    AND SUM(CASE WHEN batch IS NULL THEN 1 ELSE 0 END) > 0
);

The above is personal experience. I hope you can give you a reference and I hope you can support me more.